07.31.06

Extracting data from Java applets, ActiveX controls, and Adobe Flash movies

Posted in Tips at 10:04 am by Todd Wilson

This is a question we get from time to time, so I finally decided to add it to our FAQ. If anyone else has experience with this kind of thing, feel free to post a comment. I’m unaware of many packages that can do this.

Here’s the posting from the FAQ:

The short answer to this one is, “Sometimes.” Almost all widgets (applets, etc.) that communicate with their server via HTTP can be scraped by screen-scraper. Oftentimes, however, they’ll use a proprietary protocol. Most of the time Adobe Flash movies use HTTP when they need to communicate with a server, but Java applets and ActiveX controls don’t always. The easiest way to find out is to use screen-scraper’s proxy server when interacting with a page containing one of these elements. Take a close look at the HTTP requests and responses passing between the web browser and the server. If you see text in there (often XML or URL-encoded lists of parameters), then the chances are good that screen-scraper can extract the information being passed between the client and server. Note, however, that the widget may display text that never gets passed between the client and server. Unfortunately, in such cases, screen-scraper is unable to extract that information. The only utility we’re aware of that may allow for scraping that type of information is IBM’s Rational Robot software.
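If you'd rather not eyeball every request and response, the "is there text in there?" check can be roughed out in a few lines of Python. This is only a sketch and isn't part of screen-scraper: the function name `classify_body` and its labels are made up for illustration, and it assumes you've already saved a request or response body from the proxy as raw bytes.

```python
from urllib.parse import parse_qsl
import xml.etree.ElementTree as ET


def classify_body(body: bytes) -> str:
    """Guess what kind of payload a captured HTTP body holds.

    Returns one of: "empty", "binary", "xml", "url-encoded", "text".
    (Hypothetical helper; the labels are our own.)
    """
    # Bytes that aren't valid UTF-8 suggest a proprietary/binary format.
    try:
        text = body.decode("utf-8")
    except UnicodeDecodeError:
        return "binary"

    stripped = text.strip()
    if not stripped:
        return "empty"

    # Control characters (other than tab/CR/LF) also suggest binary data.
    if any(ord(ch) < 32 and ch not in "\t\r\n" for ch in stripped):
        return "binary"

    # Well-formed XML is a good sign the data can be extracted.
    if stripped.startswith("<"):
        try:
            ET.fromstring(stripped)
            return "xml"
        except ET.ParseError:
            pass

    # A URL-encoded parameter list looks like key=value&key=value, no spaces.
    if "=" in stripped and not any(ch.isspace() for ch in stripped):
        try:
            if parse_qsl(stripped, strict_parsing=True):
                return "url-encoded"
        except ValueError:
            pass

    return "text"
```

Anything that comes back as "xml", "url-encoded", or "text" is worth a closer look in the proxy; "binary" usually means the widget is speaking something that's much harder to scrape.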

07.20.06

Version 2.7.2.9a of screen-scraper available

Posted in Updates at 4:31 pm by Todd Wilson

Phew. Well, sorry it’s been so long. We’ve been swamped with work lately, but I’m happy to report that we’ve recently carved out enough time to get a new alpha version of screen-scraper out the door. For the impatient, here are some quick install instructions (which differ from the usual):

  1. Back up your scraping sessions (check here for help on that).
  2. Ensure screen-scraper isn’t currently running (close the workbench and server, if running).
  3. Download this file, and unzip it.
  4. Copy the contents of the zip file on top of your existing files in the screen-scraper install folder. For example, the zip file contains a “screen-scraper.jar” file which should be copied on top of your existing “screen-scraper.jar” file.
  5. Edit your “resource\conf\screen-scraper.properties” file in a text editor. Change the “Version” property to “2.7.2.9a”.
  6. Launch the screen-scraper workbench.
  7. If all of your scraping sessions have disappeared, don’t panic!
  8. Close the screen-scraper workbench.
  9. Re-open the screen-scraper workbench.

At this point everything should be hunky-dory and you should see a few new features (such as folders).
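If you have several installs to patch, the property edit in step 5 could be scripted. Here's a minimal Python sketch, assuming the properties file uses ordinary `key=value` (or `key:value`) lines; we haven't shown the file's exact layout here, so treat that as an assumption. The helper name `set_property` and the sample contents are hypothetical.

```python
import re


def set_property(properties_text: str, key: str, value: str) -> str:
    """Return properties_text with the given key set to value.

    Assumes simple "key=value" or "key:value" lines (an assumption about
    the file format). Raises KeyError if the key isn't present.
    """
    pattern = re.compile(
        r"^([ \t]*" + re.escape(key) + r"[ \t]*[=:][ \t]*)[^\r\n]*",
        re.MULTILINE,
    )
    # A function replacement avoids backslash-escape surprises in the value.
    new_text, count = pattern.subn(lambda m: m.group(1) + value, properties_text)
    if count == 0:
        raise KeyError(key)
    return new_text


if __name__ == "__main__":
    # Made-up sample contents, for illustration only.
    sample = "Name=demo\nVersion=2.7.2.8a\nPort=8778\n"
    print(set_property(sample, "Version", "2.7.2.9a"))
```

To use it for real, you'd read “resource\conf\screen-scraper.properties” into a string, pass it through `set_property(text, "Version", "2.7.2.9a")`, and write the result back, after backing up the original file first.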

As to why you need to update screen-scraper in this odd way, we discovered a bug in the updater. It’s surprising that it’s never surfaced before, but hopefully we’ve permanently squashed it so that you can easily update via the standard “Check for updates” menu item in the future.

As always, feel free to send along any feedback. You can post a comment on this blog posting, post a message to our forum, or send us a support request.