I've been fortunate to speak at DEFCON seven times (including
the upcoming DEFCON XXIII). I've also covered DEFCON for Computer World Magazine.
DEFCON XVII, Las Vegas, NV, Aug 2010: “Screen Scraper Tricks, Difficult Cases”
Screen scrapers and data mining bots often encounter problems when extracting data from modern websites. Obstacles like AJAX discourage many bot writers from completing screen scraping projects. The good news is that you can overcome most challenges if you learn a few tricks.
This session describes the (sometimes mind-numbing) roadblocks that can come between you and your ability to apply a screen scraper to a website. You'll discover simple techniques for extracting data from websites that freely employ DHTML, AJAX, complex cookie management, and other techniques. You will also learn how "agencies" create large-scale CAPTCHA solutions.
All the tools discussed in this talk are available for free, offer complete customization and run on multiple platforms.
Quoted from the DEFCON XVII program