11.11.09

Screen-Scraper Annotations for Java

Posted in Uncategorized at 1:22 pm by Todd Wilson

One of the primary design goals of screen-scraper from the very beginning has been extensibility. We’ve built in a number of features and tools to make screen-scraping easier, but we also realize that we can’t fit everything in. Features such as the internal scripting engine and the ability to invoke screen-scraper from external applications let developers extend it however they see fit.

Recently, astute scraper Rodney Aiglstorfer came up with an excellent way to link data extracted within screen-scraper to custom-built Java classes. He’s dubbed it “Screen-Scraper Annotations for Java”, and you can find it here: http://code.google.com/p/ssa4j/. Rodney has been good enough to release the library under an open source license, so others can benefit as well.
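To give a rough feel for how annotations can bind extracted data to your own classes, here is a minimal sketch of the general technique using plain Java reflection. It is not ssa4j’s actual API: the `@ExtractedField` annotation, the `Product` class, the `bind` method, and the token names `PRODUCT_NAME` and `PRICE` are all hypothetical, and the extracted data is assumed to arrive as a simple map of token names to string values.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.HashMap;
import java.util.Map;

public class AnnotationMappingSketch {

    // Hypothetical annotation (not part of ssa4j): names the extracted
    // token whose value should be copied into the annotated field.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    @interface ExtractedField {
        String value();
    }

    // Hypothetical custom-built class a developer might want populated.
    static class Product {
        @ExtractedField("PRODUCT_NAME")
        String name;

        @ExtractedField("PRICE")
        String price;
    }

    // Creates an instance of the given type and copies each extracted token
    // into the field annotated with that token's name. Fields whose token is
    // missing from the map are left at their default value (null here).
    static <T> T bind(Map<String, String> tokens, Class<T> type) {
        try {
            T instance = type.getDeclaredConstructor().newInstance();
            for (Field field : type.getDeclaredFields()) {
                ExtractedField mapping = field.getAnnotation(ExtractedField.class);
                if (mapping != null && tokens.containsKey(mapping.value())) {
                    field.setAccessible(true);
                    field.set(instance, tokens.get(mapping.value()));
                }
            }
            return instance;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Could not bind " + type.getName(), e);
        }
    }

    public static void main(String[] args) {
        // Stand-in for data extracted by a scrape (assumed shape).
        Map<String, String> tokens = new HashMap<>();
        tokens.put("PRODUCT_NAME", "Widget");
        tokens.put("PRICE", "9.99");

        Product p = bind(tokens, Product.class);
        System.out.println(p.name + " costs " + p.price); // prints "Widget costs 9.99"
    }
}
```

The appeal of this style is that the mapping lives next to the fields themselves, so adding a new extracted value usually means adding one annotated field rather than another block of hand-written copying code.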


2 Comments

  1. jon rios said,

    December 17, 2009 at 2:15 pm

    I like your dedication to data extraction on the web. I have used several products in the past and just found yours; I’ve been evaluating it for the last 2 days…

    I wanted to bring to your attention this company that sells software that prevents “bots” from scraping pages: http://www.pramana.com

    I assume you are aware of them or similar companies. I just wanted to get your expert advice on their technology and its potential impact on scraping, if any, including on your own product. (I would assume none, since it seems you work the extraction through a saved proxy session and don’t scrape directly from the site. At least, that is my initial understanding of how your functionality differs from others; please correct me if I am wrong.)

    Thanks, and nice work.

  2. Todd Wilson said,

    December 17, 2009 at 5:06 pm

    Hi,

    It seems like we ran across an announcement about this one earlier, but I haven’t yet seen it out in the wild. It’s quite rare that we encounter a site we can’t scrape, though. Not knowing exactly how this utility works, I can’t say for certain how we’d handle it, but at some point we may need to try :)

    Todd
