09.21.10

screen-scraper Helps Power Oracle OpenWorld Search

Posted in Miscellaneous at 10:23 pm by Todd Wilson

Not to toot our own horn (okay, we will), but our very own screen-scraper software is helping to power the search feature for the currently running Oracle OpenWorld conference.  From the OpenWorld home page, try a search in the box found in the upper-right corner (try something like “SES”).  The search results you see were scraped from their content catalog, keynotes, and blog postings, then aggregated and enriched with information like spatial data (e.g., for demos you can click the location to see on a map exactly where the demo takes place).  The excellent search interface is provided by Oracle Secure Enterprise Search, with which screen-scraper has been integrated.

This is actually a great example of the power of screen-scraping.  Take information from various web sources, dump it all into a single database, then correlate and enrich it and expose it through a searchable interface.  It’s a powerful thing to take disparate pieces and combine them into something that’s much greater than the sum of its parts.
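
For the curious, here is a rough sketch of that aggregate-enrich-search pattern in Python.  It is not how the OpenWorld integration works (that runs on screen-scraper and Oracle Secure Enterprise Search); the sample records, the room coordinates, and the `build_index` and `search` helpers are all made up for illustration, and a plain SQLite table with a `LIKE` query stands in for a real search engine.

```python
import sqlite3

# Hypothetical records standing in for pages that have already been scraped.
SCRAPED = {
    "catalog": [{"title": "SES hands-on demo", "room": "Hall A"}],
    "keynotes": [{"title": "Opening keynote", "room": "Main Stage"}],
    "blog": [{"title": "What is new in SES", "room": None}],
}

# Hypothetical enrichment data: room name -> (latitude, longitude).
ROOM_COORDS = {
    "Hall A": (37.784, -122.401),
    "Main Stage": (37.783, -122.403),
}


def build_index(scraped, coords):
    """Load every source into one table, adding coordinates where known."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE items "
        "(source TEXT, title TEXT, room TEXT, lat REAL, lon REAL)"
    )
    for source, records in scraped.items():
        for rec in records:
            # Enrichment step: look up spatial data for the record's room.
            lat, lon = coords.get(rec["room"], (None, None))
            conn.execute(
                "INSERT INTO items VALUES (?, ?, ?, ?, ?)",
                (source, rec["title"], rec["room"], lat, lon),
            )
    conn.commit()
    return conn


def search(conn, term):
    """Return (source, title, lat, lon) rows whose title contains term."""
    cur = conn.execute(
        "SELECT source, title, lat, lon FROM items "
        "WHERE title LIKE ? ORDER BY source",
        (f"%{term}%",),
    )
    return cur.fetchall()


if __name__ == "__main__":
    index = build_index(SCRAPED, ROOM_COORDS)
    for row in search(index, "SES"):
        print(row)
```

A single search for “SES” now returns matches from both the catalog and the blog, and the catalog hit carries coordinates that a front end could plot on a map.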

09.13.10

Big update

Posted in Updates at 4:48 pm by Todd Wilson

We’ve just released version 5.0.13a of screen-scraper, which includes an updated version of a library that the software relies on heavily.  We’ve moved from version 3 of HttpClient to version 4.  Between the two versions the API changed completely, so this necessitated a pretty big change in the internals of screen-scraper.  We’ve done a fair amount of in-house testing of this version, but we’re still just a bit nervous about pushing it out to the public, even as an alpha version.  As such, if you’re kind enough to help us test the alpha versions, please let us know as soon as possible if you find a flaw in this particular version.  All of the major functionality should be working fine, but there could be some fringe cases (e.g., connecting via an NTLM web proxy) that haven’t been tested as thoroughly.