Official Reference Website   Michael Schrenk

  No Starch Press,
  San Francisco

  What's Inside? (Chapter List)
Webbots, Spiders, and Screen Scrapers is designed to teach you not only how to write webbots and spiders, but also why to write these automated agents. As you will learn, there's more to writing webbots than downloading and parsing web pages.

Chapter Name   Highlights
Introduction In the introduction, you'll learn how I started writing webbots and spiders in 1996, what to expect from the book, the tools you'll need (all open source), and the book's coding standards.
Part I: Fundamental Concepts and Techniques
#1 What's in It for You? Describes how webbots can uncover the Internet's true potential.
#2 Ideas for Webbot Projects Where do ideas for webbots come from? Read a sample chapter at the No Starch Press website.
#3 Downloading Web Pages Explores techniques for downloading web pages with PHP's built-in functions and PHP/CURL.
#4 Parsing Techniques Teaches how to effectively parse data from web pages.
#5 Automating Form Submission Explains how to write webbots that automatically fill out forms and upload data to remote web servers.
#6 Managing Large Amounts of Data Describes how to organize and store large amounts of data with compression, tag removal, and thumbnailing.
Part II: Projects
#7 Price-Monitoring Webbots Shows how to write webbots that monitor prices at online stores.
#8 Image-Capturing Webbots Describes a project that downloads all the images from a web page.
#9 Link-Verification Webbots Explores a project that verifies all the links on a web page.
#10 Anonymous Browsing Webbots Introduces a conceptual project for using webbots to create an anonymous browsing environment.
#11 Search-Ranking Webbots Explores a webbot that determines the search engine ranking of a web page.
#12 Aggregation Webbots Explains how to write webbots that combine information from multiple sources, including RSS feeds.
#13 FTP Webbots Explains how webbots can use FTP as an online resource.
#14 NNTP News Webbots Explains what NNTP newsgroups are and how webbots access them.
#15 Webbots That Read Email Describes methods webbots can use to read email from POP3 mail servers.
#16 Webbots That Send Email Explores methods webbots can use to send email through SMTP mail servers.
#17 Converting a Website into a Function Identifies ways to convert an online service into a PHP function your webbots can call.
Part III: Advanced Technical Considerations
#18 Spiders A study of spider theory, with a simple spider project.
#19 Procurement Webbots and Snipers Explores how webbots automatically buy things from online stores and how snipers bid on online auctions.
#20 Webbots and Cryptography Learn how to communicate with websites that use encryption.
#21 Authentication Discover various authentication methods and how webbots can automatically authenticate to various websites.
#22 Advanced Cookie Management Master reading and writing cookies with webbots.
#23 Scheduling Webbots and Spiders Learn how to make webbots and spiders launch and run automatically.
Part IV: Larger Considerations
#24 Designing Stealthy Webbots and Spiders Learn when and why it's important for your webbots to run without detection, and then learn how to achieve stealth with your webbots.
#25 Writing Fault-Tolerant Webbots Discover how to write webbots and parse routines that are "less affected" by changes to the web pages you target.
#26 Designing Webbot-Friendly Websites Master search engine optimization as well as methods for communicating data with websites, including lightweight interfaces and SOAP.
#27 Killing Spiders Gain an understanding of techniques web developers use to discourage the use of automated browsing agents.
#28 Keeping Webbots out of Trouble Uncover the dangers of writing disreputable webbots and spiders.
Appendixes
A PHP/CURL Reference A handy reference for using PHP/CURL.
B Status Codes A list of HTTP and NNTP status codes.
C SMS Email Addresses Addresses and tips for sending text messages through email.

 
home chapter list downloads target addresses updates purchase author contact
Copyright 2024, Michael Schrenk