03.22.06

Scraping data from similar tables

Posted in Tips at 5:52 pm by Todd Wilson

Astute screen-scraper Fred came up with a scenario that arises from time to time: you’ve got a page containing two or more HTML tables, all of which are nearly identical in structure. You want to pull the data from each table, but need to be able to tell which row came from which table. Ordinary extractor patterns won’t do the job: they’ll match every row in every table, which destroys the link between each row and the table it came from.

Fortunately, there are a couple of ways of handling such a scenario, which I’ve just outlined in this FAQ. Neither is too complicated, though both are a bit more involved than just using a standard extractor pattern.
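
To make the problem concrete, here is a minimal sketch of one general way to keep rows tied to their tables: match each table first, then match rows only inside that table's markup. This is not necessarily either of the approaches from the FAQ, and it uses plain Python regular expressions rather than screen-scraper's extractor patterns. The sample HTML, the `id` attributes, and the names `TABLE_RE`, `ROW_RE`, and `rows_by_table` are all made up for illustration.

```python
import re

# Hypothetical sample page: two tables with identical row structure.
SAMPLE_HTML = """
<table id="east">
  <tr><td>Widget</td><td>4</td></tr>
  <tr><td>Gadget</td><td>7</td></tr>
</table>
<table id="west">
  <tr><td>Widget</td><td>9</td></tr>
</table>
"""

# Outer pattern: one match per table, capturing its id and its inner markup.
TABLE_RE = re.compile(
    r'<table id="([^"]*)">(.*?)</table>', re.DOTALL | re.IGNORECASE
)

# Inner pattern: one match per two-cell row. It is only ever applied to the
# markup of a single table, so every row it finds belongs to that table.
ROW_RE = re.compile(
    r"<tr>\s*<td>(.*?)</td>\s*<td>(.*?)</td>\s*</tr>", re.DOTALL | re.IGNORECASE
)


def rows_by_table(html):
    """Return one record per row, each labeled with the table it came from."""
    records = []
    for table_match in TABLE_RE.finditer(html):
        table_id, table_body = table_match.groups()
        for name, qty in ROW_RE.findall(table_body):
            records.append(
                {"table": table_id, "name": name.strip(), "qty": qty.strip()}
            )
    return records


if __name__ == "__main__":
    for record in rows_by_table(SAMPLE_HTML):
        print(record)
```

Applying the row pattern to the whole page at once would return the same three rows with no way of knowing where each came from; scoping it to one table at a time is what preserves the link. The non-greedy table pattern assumes tables are not nested, so a page with nested tables would need a real HTML parser instead.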

