Tips Archives - Page 3 of 3 - screen-scrapeable

How to stop phpBB spam

January 2, 2007 (updated January 9, 2007) by Todd Wilson

Well, I sure wish someone would have told us about this a while ago, so I’m doing the world a favor and talking about it here. Hopefully this blog posting gets picked up by Google so that others who are new to phpBB can learn how to stop spam up front. We’ve been battling spam …

Scraping CAPTCHA forms (you know, those HTML forms with the wavy text)

October 18, 2006 by Todd Wilson

Alert screen-scraper yipa posted an excellent question to our forum this morning: One of the pages I want to scrape is behind a login with image verification (i.e., you need to enter some text generated in an image to log in). Is there a way to work around this? Maybe something like SS load the …

Scraping a Date Range

October 12, 2006 (updated May 14, 2008) by jason

Much of the time in scraping, one wants to fill in a web form and grab the results, and many of the forms want the user to fill in a date range. It’s not a daunting prospect if you just want to scrape the form once, but for jobs where you want to run a scrape …
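The excerpt above is cut off before it describes a method, so the following is only a rough sketch of one way a date range might be prepared for repeated form submissions, not the approach the post itself takes. It splits a long range into consecutive windows that could each be submitted as a form's start and end dates. The function name `date_windows`, the seven-day window size, and the `MM/DD/YYYY` field format are all assumptions made for illustration.

```python
from datetime import date, timedelta


def date_windows(start, end, days=7):
    """Yield (window_start, window_end) pairs covering start..end inclusive.

    Hypothetical helper: the window size and the idea of chunking the
    range are illustrative assumptions, not taken from the original post.
    """
    if days < 1:
        raise ValueError("days must be at least 1")
    current = start
    while current <= end:
        # Each window spans `days` days, clipped to the overall end date.
        window_end = min(current + timedelta(days=days - 1), end)
        yield current, window_end
        current = window_end + timedelta(days=1)


windows = list(date_windows(date(2006, 10, 1), date(2006, 10, 20)))

# Format each window the way a date-range form might expect it
# (MM/DD/YYYY is an assumed format; real forms vary).
form_values = [
    (w_start.strftime("%m/%d/%Y"), w_end.strftime("%m/%d/%Y"))
    for w_start, w_end in windows
]
```

Each pair in `form_values` would then be fed to the form's two date fields on successive runs of the scrape.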

Extracting data from PDF files

August 2, 2006 by Todd Wilson

Periodically people ask if screen-scraper can extract data from PDF files, as well as HTML. We’ve never had a very good answer for this (it can’t, out of the box), but lately we’ve been forced to come up with a solution, as a project we’ve been working on has required it. When I initially researched …

Extracting data from Java applets, ActiveX controls, and Adobe Flash movies

July 31, 2006 by Todd Wilson

This is a question we get from time to time, so I finally decided to add it to our FAQ. If anyone else has experience with this kind of thing feel free to post a comment. I’m unaware of many packages that can do this. Here’s the posting from the FAQ: The short answer to …

Scraping data from similar tables

March 22, 2006 by Todd Wilson

Astute screen-scraper Fred came up with a scenario that arises from time to time: you’ve got a page containing one or more HTML tables, all of which are nearly identical in structure. You want to pull the data from each table, but need to be able to distinguish which row came from which table. Standard old extractor …

Adding numbers to session variables

March 7, 2006 by Todd Wilson

Up till now it’s been a pretty big pain to add a number to a session variable. Oftentimes you’ll have something like a page number that you need to increment as you loop through search results pages. The page number is usually stored as a String, and to increment it you normally have to cast …
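To make the pain point concrete, here is a minimal sketch of the convert, add, and convert-back cycle the excerpt describes. It uses a plain Python dict as a stand-in for session variables; it is not screen-scraper's own API, and the names `session_vars`, `add_to_variable`, and `PAGE` are hypothetical.

```python
def add_to_variable(session_vars, name, amount=1):
    """Add `amount` to a numeric value that is stored as a string.

    `session_vars` is an ordinary dict standing in for a scraping
    session's variables (an assumption for illustration only).
    """
    # The stored value is text, so convert it to an integer first.
    current = int(session_vars[name])
    # Store the result back as a string, matching how it was held before.
    session_vars[name] = str(current + amount)
    return session_vars[name]


session_vars = {"PAGE": "1"}

# Step through three search results pages, incrementing after each one.
pages_visited = []
for _ in range(3):
    pages_visited.append(session_vars["PAGE"])
    add_to_variable(session_vars, "PAGE")
```

Wrapping the conversion in one helper keeps the loop body free of repeated parsing, which is the kind of convenience the post's title suggests.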