Thursday, October 28, 2010

Scraping Made Easy

Found this cool tool, pykhtml. It is a Python library that lets you programmatically access the DOM of a web page. The best part is that it understands JavaScript, which text browsers and command-line fetchers (lynx, curl, wget) don't execute at all.

Great stuff. This is going to be VERY useful.
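
To make the JavaScript point concrete, here is a small sketch of what a script-blind tool sees. It does not use pykhtml at all (I'm not showing its API here); it only uses Python's standard html.parser, and the page markup and the TagCounter / count_tags names are made up for illustration. The list items on this page only exist after the script runs, so a parser that just reads the raw markup never finds them.

```python
from html.parser import HTMLParser

# Hypothetical page: the <li> items are created by the script at load time,
# so they are not present anywhere in the raw markup.
PAGE = """<html><body>
<ul id="items"></ul>
<script>
  var list = document.getElementById("items");
  ["a", "b", "c"].forEach(function (name) {
    var li = document.createElement("li");
    li.textContent = name;
    list.appendChild(li);
  });
</script>
</body></html>"""


class TagCounter(HTMLParser):
    """Counts start tags in the raw markup without running any script."""

    def __init__(self):
        super().__init__()
        self.counts = {}

    def handle_starttag(self, tag, attrs):
        self.counts[tag] = self.counts.get(tag, 0) + 1


def count_tags(html):
    """Return a dict mapping tag name to the number of start tags seen."""
    parser = TagCounter()
    parser.feed(html)
    parser.close()
    return parser.counts


static_counts = count_tags(PAGE)
print(static_counts.get("li", 0))  # prints 0: the script never ran
```

A browser engine that executes the script would end up with three list items under the ul. That gap is what a JavaScript-aware DOM library is supposed to close.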
