01.23.08

screen-scraper version 4.0 released!

Posted in Updates at 10:49 am by Todd Wilson

Well, it’s now official.  It’s been just over a full year in development, and we’re now happy to release it to the world.  Thanks to all who have helped in testing alpha versions and provided feedback.

In order to upgrade an existing instance, you’ll need to un-install and re-install.  Take a look at this FAQ for details as to the whys and wherefores.

01.14.08

Version 4.0 of screen-scraper coming soon…

Posted in Updates at 4:50 pm by Todd Wilson

We’re anticipating releasing version 4.0 of screen-scraper quite soon, perhaps as soon as this week.  There will be quite a few changes that come along with this.  Aside from the usual new features and bug fixes, we’ll be adding a new edition: screen-scraper Enterprise Edition.  Essentially what is now the latest pre-release version of screen-scraper Professional Edition will become screen-scraper Enterprise Edition.  The new screen-scraper Professional Edition will simply be the Enterprise Edition with a number of features stripped out.  Additionally, those who license the Enterprise Edition will get phone support, as well as a few other intangibles.

Along with all of this there will be a pricing change.  The Professional Edition will be available for $399 USD, and the Enterprise Edition will cost $2,499.  Those who licensed screen-scraper Professional Edition before the release of the Enterprise Edition will be eligible for a free upgrade to it (though they will not get the phone support that subsequent licensees will get).  In the interest of fairness, I thought it would be a good idea to point this out prior to the release of 4.0.  Those considering licensing screen-scraper Professional Edition right now might want to consider it a bit more seriously, given the price increase that will take place with the new version.  As always, don’t hesitate to drop us a line with any questions.

11.20.07

Anonymization now built in to screen-scraper

Posted in Updates at 6:29 pm by Todd Wilson

If you’re currently (or will be at some point) dealing with sites for which you’d like to anonymize the scraping process, I’m happy to announce the availability of a very slick anonymization feature built right in to screen-scraper. If you upgrade to version 3.0.65a (try this link if you have trouble upgrading), you’ll now find a new section in the “Settings” window, and a new “Anonymization” tab for scraping sessions. Once you’ve done the initial setup to use the anonymization service, which is pretty quick, it can be as simple as checking the “Anonymize this scrape” check box. See this page in our docs for all of the details.

We’ve tried several different methods for anonymization, and this is by far the simplest, fastest, and most reliable. Drop us a line if you’re interested in making use of it in your own scrapes.

11.12.07

Handling scraped data in real time

Posted in Updates, Tips at 12:40 pm by Todd Wilson

Once screen-scraper extracts data from a web site, typically that data is sent somewhere else. Data is probably most commonly written out to a file, but may also be saved to a database or even submitted to another web site. You can always handle the scraped data in screen-scraper scripts, but what if you want to make use of the data in your own application, which invokes screen-scraper?

In the past, when invoking screen-scraper from a remote application, the process has generally meant sending screen-scraper the request to scrape, waiting for extraction to occur, then handling that extracted data in the application that invoked screen-scraper. It’s that second step that can be a bit hard to deal with–the request to scrape is sent, but the scraped data can’t be touched by the calling application until screen-scraper finishes its work. This can be especially troublesome in cases where the scrapes are long and might even get interrupted in the middle. This is at best inconvenient, and at worst may mean loss of scraped data.

I recently had a flash of inspiration as to how to deal with these cases, and implemented a new feature in the latest alpha version of screen-scraper (3.0.63a) that greatly facilitates handling data in a remote application as it is getting scraped. First, to give a contrary example, consider the method we advocate in our fourth tutorial for invoking screen-scraper remotely to extract data from our shopping web site. The process goes basically like this:

  1. An external application starts up (e.g., a Java application or PHP script).
  2. The application invokes screen-scraper, telling it to run the “Shopping Site” scraping session.
  3. The “Shopping Site” scraping session runs.
  4. Once the scraping session completes, control returns to the calling application.
  5. The calling application requests the scraped records from screen-scraper.
  6. The scraped records are output by the calling application.

Now consider this possibility:

  1. An external application starts up (e.g., a Java application or PHP script).
  2. The application invokes screen-scraper, telling it to run the “Shopping Site” scraping session.
  3. While the scraping session runs it sends scraped records back to the calling application, which outputs them as they get scraped.

Hopefully the benefits of the second approach are obvious.

Now on to implementation. Consider this Java class:

import com.screenscraper.scraper.*;
import com.screenscraper.common.*;

public class PollTest
{
    public static void main( String[] args )
    {
        PollTest test = new PollTest();
        test.go();

        System.exit( 0 );
    }

    public void go()
    {
        try
        {
            RemoteScrapingSession remoteScrapingSession = new RemoteScrapingSession( "Shopping Site" );
            remoteScrapingSession.setVariable( "SEARCH", "dvd" );
            remoteScrapingSession.setVariable( "PAGE", "1" );
            // Have screen-scraper send data to us every second.
            remoteScrapingSession.setPollFrequency( 1 );
            // The receiver must be set before scrape() is called.
            remoteScrapingSession.setDataReceiver( new MyDataReceiver() );
            remoteScrapingSession.scrape();
            remoteScrapingSession.disconnect();
        }
        catch( Exception e )
        {
            System.err.println( "Exception: " + e.getMessage() );
            e.printStackTrace();
        }
    }

    // Gets called as screen-scraper sends data back during the scrape.
    class MyDataReceiver implements DataReceiver
    {
        public void receiveData( String key, Object value )
        {
            System.out.println( "Got data from ss." );
            System.out.println( "Key: " + key );
            System.out.println( "Value: " + value );
        }
    }
}

The key is the “MyDataReceiver” class, which implements the “DataReceiver” interface. This interface requires the implementation of just one method: receiveData. When the scraping session is configured correctly, this method will get invoked as data is scraped by screen-scraper, allowing you to handle it in your own code. A few other notes on this class:

  • The “setPollFrequency” method indicates how often (in seconds) data should be sent from screen-scraper to the client. The default is five seconds.
  • The “setDataReceiver” method must be called before “scrape” is called.

The implementation in screen-scraper is quite simple. I took the standard “Shopping Site” scraping session from the tutorial, and added the following script:

session.sendDataToClient( "DR", dataRecord );

The script gets invoked after each product is extracted from the web site. The “sendDataToClient” method will accept most any object, including strings, integers, DataRecords, and DataSets.
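Since “sendDataToClient” can send different kinds of objects under different keys, a receiver on the client side may want to sort out what it gets. Here’s a rough sketch of one way to do that. It’s written so that it runs without screen-scraper installed, so a few things are assumptions made purely for illustration: the “DataReceiver” interface below is a local stand-in for the real one in screen-scraper.jar, records are treated as plain Java Maps, and the “CollectingReceiver” class, the “STATUS” key, and the “TITLE” and “PRICE” fields are all made up.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Local stand-in for screen-scraper's DataReceiver interface, declared here
// only so the sketch compiles without screen-scraper.jar on the classpath.
interface DataReceiver
{
    void receiveData( String key, Object value );
}

// Hypothetical receiver: keeps records sent under the "DR" key and
// stores anything else as a simple "key=value" message.
class CollectingReceiver implements DataReceiver
{
    private final List<Map<?, ?>> records = new ArrayList<>();
    private final List<String> messages = new ArrayList<>();

    public void receiveData( String key, Object value )
    {
        if( "DR".equals( key ) && value instanceof Map )
        {
            // Assumption: a record can be treated as a java.util.Map.
            records.add( (Map<?, ?>) value );
        }
        else
        {
            // Strings, integers, and anything else are kept as text.
            messages.add( key + "=" + value );
        }
    }

    public List<Map<?, ?>> getRecords() { return records; }

    public List<String> getMessages() { return messages; }
}

class CollectingReceiverDemo
{
    public static void main( String[] args )
    {
        CollectingReceiver receiver = new CollectingReceiver();

        // Simulate the calls the driver would make as data arrives.
        Map<String, String> product = new LinkedHashMap<>();
        product.put( "TITLE", "Example DVD" );
        product.put( "PRICE", "9.99" );
        receiver.receiveData( "DR", product );
        receiver.receiveData( "STATUS", "page 1 done" );

        System.out.println( "Records: " + receiver.getRecords() );
        System.out.println( "Messages: " + receiver.getMessages() );
    }
}
```

In a real client you would drop the stand-in interface, implement the one from screen-scraper.jar, and pass the receiver to “setDataReceiver” before calling “scrape”. Because each record is handled as it arrives, whatever has been received so far is still available if a long scrape gets interrupted.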

So far we’ve only implemented this in the Java and PHP drivers, but the others will be forthcoming.

The example source files can be downloaded here, and include both PHP and Java files. If you decide to give this a try, be sure to upgrade to version 3.0.63a of screen-scraper. You’ll want to reference the latest “screen-scraper.jar” or “misc\php\remote_scraping_session.php” files in your code (found inside the folder where screen-scraper is installed).

07.05.07

Version 3.0.31a of screen-scraper available

Posted in Updates at 11:46 am by Todd Wilson

This one’s definitely a recommended upgrade. It contains a few bug fixes that should remedy some obnoxious behavior you might notice in the previous alpha version.

Aside from bug fixes, this version now allows for sub-extractor patterns to be applied in sequence. It’s not something that’s often needed, but once in a while it can be handy (and even necessary).

Feel free to give it a try and let us know of any more trouble. As always, be sure to back up your work before upgrading to an alpha version.

06.11.07

Version 3.0.28a of screen-scraper available

Posted in Updates at 5:12 pm by Todd Wilson

Aside from a few bug fixes and other niceties since the last announced alpha, this one now handles file uploads in the proxy server. This isn’t a really common need, but it’s been a hole in screen-scraper’s functionality that, happily, is now filled.

Feel free to give it a try and let us know of any more trouble. As always, be sure to back up your work before upgrading to an alpha version.

05.21.07

Version 3.0.22a of screen-scraper available

Posted in Updates at 6:05 pm by Todd Wilson

Thank goodness for alpha versions! That last alpha (3.0.21a) contained somewhat of a nasty bug that would wipe out sub-extractor patterns. Apologies to anyone negatively affected by that (but thanks for helping us test the bleeding edge version!). The upside is that we’ve made a fix for that in version 3.0.22a of screen-scraper. Please give it a try and let us know of any more trouble. As always, be sure to back up your work before upgrading to an alpha version.

On a happier note, version 3.0.22a contains a number of changes that Mac users should love. I recently migrated to a Mac as my primary machine (one of the best decisions I’ve made, by the way), and I’m especially happy about the new changes we’ve made. screen-scraper finally looks, feels and behaves like a Mac app should. This includes menus, look ‘n feel, and keyboard shortcuts. The last little bit we need to take care of is a screen-scraper icon for the dock. Watch for that in a future version.

Speaking of Macs, this is totally unrelated, but I have to share it, only because I’ve been looking for this since I switched. If you haven’t tried it yet, run, do not walk, to binarynights and download ForkLift. This is easily one of the most useful Mac apps I’ve run across.

05.18.07

Version 3.0.21a of screen-scraper available

Posted in Updates at 12:08 pm by Todd Wilson

Well, we’ve been a bit delinquent in announcing new alphas. This most recent one fixes a long-standing but rarely seen bug where extractor patterns could magically appear where they shouldn’t. There are also quite a few other little niceties and bug fixes since 3.0.14a, when we last announced an update.

For those interested, feel free to upgrade in the usual way. As always, this is alpha code, so please back up your work before upgrading.

04.03.07

Version 3.0.14a of screen-scraper available

Posted in Updates at 4:17 pm by Todd Wilson

Yeah, it’s been a while since we announced an alpha version. This most recent one contains some nice fixes and a few new goodies that you should like. We’re using it internally, and it’s shown to be quite stable. The usual caveats apply before upgrading, though (back up your work!).

We’ve also been working on a secret little side project. I won’t tell you exactly what it is, but here’s a hint–start screen-scraper running in server mode. Assuming you haven’t changed the default SOAP port, pop this URL into your favorite browser:

http://localhost:8779/web.htm

03.27.07

Ruby Tuesday

Posted in Updates at 10:30 am by Todd Wilson

We’ve had requests for this in the past, and I’m happy to report that we’re now able to deliver. You can now invoke screen-scraper from a Ruby script, via a driver we’ve just released. This was cooked up by a former employee of ours (thanks Adam!), who needed the functionality, and graciously donated his code. We’re still working a bit on documentation and example code, but if you already know some Ruby it should be pretty straightforward to use. To get started visit our Invoking screen-scraper from Ruby page.
