Official Reference Website
Michael Schrenk
No Starch Press, San Francisco

   
  Libraries
The code libraries used by this book are governed by the W3C Software Notice and License.
Library | Purpose | Chapters (where referenced)
LIB_download_images.php | Binary-safe downloads, directory preparation, downloading all images for a specific web page | 8, 18
LIB_exclusion_list.php | An example of an exclusion list, used by a spider | 18
LIB_http.php | PHP/CURL routines for downloading web pages and automating form submission | 3, 4, 5, 6, 7, 9, 11, 12, 17, 18, 19, 20, 21, 22, 24, 25, 26, Appendix B
LIB_http_codes.php | An array that defines HTTP status codes | 9
LIB_mail.php | Various routines for reading email from POP3 mail servers | 13, 16
LIB_mysql.php | A general-purpose MySQL interface | 6, 16
LIB_nntp.php | A general-purpose NNTP (newsgroup) interface | 14
LIB_parse.php | A collection of parsing routines | 4, 7, 9, 12, 15, 17, 18, 19, 21, 25, 26
LIB_pop3.php | A collection of routines that connect to a POP3 mail server to send email | 15, 23
LIB_resolve_addresses.php | A variety of routines that resolve addresses and determine domains and "page base" addresses | 9, 18, 25
LIB_rss.php | A library used with the webbot aggregation project; contains RSS parsing routines | 12
LIB_simple_spider.php | Various routines that harvest, exclude, and archive links and resolve domains | 18
LIB_thumbnail.php | Defines a function that creates thumbnail images | 6
Download Libraries
All of the book's libraries are contained in this zip file: WebbotsSpidersScreenScraper_Libraries_REV2_0.zip
You can get your copy of the file by clicking here
   

  Example Scripts
The example scripts (used in the book) are covered by the W3C Software Notice and License.
NOTE: THESE SCRIPTS ARE FOR DEMONSTRATION PURPOSES ONLY! They are not suitable for any use other than demonstrating the concepts presented in Webbots, Spiders and Screen Scrapers. Do not use these scripts in a production environment where reliability is a priority.
Download Example Scripts
These scripts are individually downloadable by clicking on the script names.
Please note that small, easy-to-enter scripts are not available for download. This may change, depending on demand.
Script | Version | Chapter | Comment | Libraries Required
LISTING_3_1.php | 1.0 | 3 | (Very) simple web page download using PHP's fopen() and fgets() functions | n/a
LISTING_3_2.php | 1.0 | 3 | (Very) simple web page download using PHP's file() function | n/a
LISTING_3_6.php | 1.0 | 3 | Web page download script using LIB_http | LIB_http
LISTING_4_2.php | 1.0 | 4 | LIB_parse demo: using split_string() | LIB_parse, LIB_http
LISTING_4_4.php | 1.0 | 4 | LIB_parse demo: using return_between() to parse the page title from a web page | LIB_parse
LISTING_4_6.php | 1.0 | 4 | LIB_parse demo: using parse_array() to parse meta tags from www.fbi.gov | LIB_parse, LIB_http
LISTING_5_10.php | 1.0 | 5 | Form analysis script (to be run on a web server) | n/a
LISTING_6_12.php | 1.0 | 6 | Demo of image thumbnail creation script | LIB_http, LIB_thumbnail
price_monitoring_bot.php | 1.0 | 7 | Download and parse prices from sample store website | LIB_http, LIB_parse
image_capture_bot.php | 1.0 | 8 | Download images in a duplicate directory structure | LIB_download_images (which includes LIB_http, LIB_parse, LIB_resolve_addresses)
link_verification_bot.php | 1.0 | 9 | Validate links on a web page | LIB_http, LIB_parse, LIB_resolve_addresses, LIB_http_codes
anonymous_browsing_proxy.php | 1.0 | 10 | Creates a (simplified) anonymous browsing environment proxy. NOTICE: RUN THIS SCRIPT ON A WEB SERVER. Read Chapter 10 for additional information. | LIB_http, LIB_parse, LIB_resolve_addresses
LISTING_11_1.php | 1.0 | 11 | An example webbot that calculates search engine rankings | LIB_http, LIB_parse
aggregation_bot.php | 1.0 | 12 | An example of a simple aggregation webbot using RSS feeds | LIB_http, LIB_parse, LIB_rss
ftp_bot.php | 1.0 | 13 | A simple demonstration of using FTP. NOTICE: REQUIRES ACCESS TO FTP SERVERS & ADDITIONAL CONFIGURATION. Read Chapter 13 for additional information. | n/a
LISTING_14_1.php | 1.0 | 14 | Download news group names from your NNTP news server. NOTICE: REQUIRES ACCESS TO NNTP SERVERS & ADDITIONAL CONFIGURATION. This script may take several minutes to complete. For debugging purposes, you might want to uncomment the diagnostic line in read_nntp_buffer() in LIB_nntp. Read Chapter 14 for additional information. | LIB_nntp
LISTING_14_4.php | 1.0 | 14 | Download news group names from your NNTP news server. NOTICE: REQUIRES ACCESS TO NNTP SERVERS & ADDITIONAL CONFIGURATION. Read Chapter 14 for additional information. | LIB_nntp
LISTING_14_6.php | 1.0 | 14 | Download news group names from your NNTP news server. NOTICE: REQUIRES ACCESS TO NNTP SERVERS & ADDITIONAL CONFIGURATION. Please note that you should substitute $article with the identifier of the article you want to access. Read Chapter 14 for additional information. | LIB_nntp
email_reading_bot.php | 1.0 | 15 | Download email from a POP3 mail server. NOTICE: REQUIRES ACCESS TO A POP3 MAIL SERVER & ADDITIONAL CONFIGURATION. Read Chapter 15 for additional information. | LIB_pop3, LIB_parse
website_to_function_bot.php | 1.0 | 17 | Example of how to turn a website into a PHP function | n/a
simple_spider.php | 1.0 | 18 | Simple spider project that downloads images from a website | LIB_http, LIB_parse, LIB_resolve_addresses, LIB_exclusion_list, LIB_simple_spider, LIB_download_images
LISTING_21_1.php | 1.0 | 21 | Webbot that auto-authenticates to a website that uses BASIC authentication | n/a
LISTING_21_3.php | 1.0 | 21 | Webbot that auto-authenticates to a website that uses cookie authentication | LIB_http, LIB_parse
LISTING_21_4.php | 1.0 | 21 | Webbot that auto-authenticates to a website that uses query authentication | LIB_http, LIB_parse
LISTING_25_5.php | 1.0 | 25 | Example of detecting "meta tag" redirection | LIB_http, LIB_parse, LIB_resolve_addresses
LISTING_25_9.php | 1.0 | 25 | Fault-tolerant parsing of form values | LIB_http, LIB_parse, LIB_resolve_addresses
LISTING_26_8.php | 1.0 | 26 | Example of downloading and parsing an XML file | LIB_http, LIB_parse
LISTING_26_12.php | 1.0 | 26 | Example of a webbot using a lightweight data exchange interface | LIB_http
Please contact me if there's another script from the book that you'd like to have.
Copyright 2024, Michael Schrenk