09.02.09
Posted in Updates at 5:38 pm by Todd Wilson
We’re constantly updating screen-scraper with bug fixes and new features, but haven’t always been good about documenting changes. These newer features are typically only available in our alpha versions. Whereas previously you were on your own to figure out what was new, we’re now going to do our best to document new features here:
Alpha documentation
These docs might not be quite as neat and clean as the others, but if you’re using our alpha versions and want to see what’s new, this is a good page to watch.
08.28.09
Posted in Updates at 5:27 pm by Todd Wilson
Actually, I should probably call it a REST-like API. I have no doubt the purists will point out that it isn’t a REST API at all. How about we call it an “API accessible via GET requests” instead?
With that loquacious introduction, I’m happy to announce that, as of version 4.5.18a, you can access screen-scraper via GET requests. Let me just state right here and now that this is alpha functionality and may very well change before the next public release. Use it at your own risk. As with any of our alpha features the documentation is scant, so I’ll simply provide a long list of examples as to how you might use it. Hopefully you’ll get the idea.
You’ll first need to start up screen-scraper in server mode. Once that’s done you can then access a slew of features you’d normally only be able to access via the web interface. Here they are:
http://localhost:8779/ss/rest?action=get_runnable_scraping_sessions
http://localhost:8779/ss/rest?action=get_scrapeable_sessions
http://localhost:8779/ss/rest?action=run_scraping_session&scraping_session_name=Shopping+Site
http://localhost:8779/ss/rest?action=stop_running_scraping_session&scrapeable_session_id=43
http://localhost:8779/ss/rest?action=stop_all_running_scraping_session
http://localhost:8779/ss/rest?action=remove_scrapeable_session&scrapeable_session_id=29
http://localhost:8779/ss/rest?action=reload_settings
http://localhost:8779/ss/rest?action=peek_scrapeable_session_log&scrapeable_session_id=42&num_lines=50
http://localhost:8779/ss/rest?action=get_scheduled_scraping_sessions
http://localhost:8779/ss/rest?action=disable_enable_scheduled_scraping_session&scheduled_scraping_session_id=110&enable=false
http://localhost:8779/ss/rest?action=remove_scheduled_scraping_session&scheduled_scraping_session_id=0
http://localhost:8779/ss/rest?action=set_scheduled_scraping_session&scheduled_scraping_session_id=3&scraping_session_name=Shopping+Site&timeout=123&schedule_date=08%2F20%2F2009&schedule_time=11:22:33&repeat_days=4&repeat_hours=3&repeat_minutes=2&repeat_seconds=1&threshold_time=21&threshold_record_count=43&settable_session_variables=this%3Dthatx%26foo%3Dbar
http://localhost:8779/ss/rest?action=save_settings&default_timeout=89&default_repeat_days=9&default_repeat_hours=8&default_repeat_minutes=7&default_repeat_seconds=6&default_threshold_time=4&default_threshold_record_count=3
http://localhost:8779/ss/rest?action=set_session_variable_on_scrapeable_session&scrapeable_session_id=3&key=foo&value=bap
http://localhost:8779/ss/rest?action=get_session_variable_from_scrapeable_session&scrapeable_session_id=3&key=foo
http://localhost:8779/ss/rest?action=get_memory_usage
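If you'd rather call these URLs from code than from a browser, here's a minimal Java sketch (Java 11 or later). It assumes screen-scraper is running in server mode at localhost:8779, as in the URLs above. The class and method names (SsRestClient, buildUrl, joinVariables, get) are hypothetical helpers, not part of screen-scraper. The main point is encoding: parameter values are form-encoded, so "Shopping Site" becomes Shopping+Site, and settable_session_variables carries a whole list of name=value pairs encoded as a single value (this%3Dthatx%26foo%3Dbar). The response format isn't documented here, so get just returns the raw body.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical client sketch for screen-scraper's GET-based alpha API.
// Assumes screen-scraper is running in server mode on localhost:8779.
class SsRestClient {
    // Base endpoint taken from the example URLs; change host/port if yours differ.
    static final String BASE = "http://localhost:8779/ss/rest";

    // Builds a request URL: "action" first, then each parameter, all form-encoded.
    static String buildUrl(String action, Map<String, String> params) {
        StringJoiner query = new StringJoiner("&");
        query.add("action=" + enc(action));
        for (Map.Entry<String, String> e : params.entrySet()) {
            query.add(enc(e.getKey()) + "=" + enc(e.getValue()));
        }
        return BASE + "?" + query;
    }

    // Joins session variables into "name=value&name=value"; buildUrl then encodes
    // the whole string as one value, as settable_session_variables does above.
    static String joinVariables(Map<String, String> vars) {
        StringJoiner joined = new StringJoiner("&");
        vars.forEach((k, v) -> joined.add(k + "=" + v));
        return joined.toString();
    }

    private static String enc(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    // Sends the GET request and returns the raw response body.
    static String get(String url) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        return client.send(request, HttpResponse.BodyHandlers.ofString()).body();
    }

    public static void main(String[] args) throws Exception {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("scraping_session_name", "Shopping Site");
        String url = buildUrl("run_scraping_session", params);
        System.out.println(url);
        // Uncomment once screen-scraper is running in server mode:
        // System.out.println(get(url));
    }
}
```

Note that URLEncoder also encodes colons, so a time like 11:22:33 goes out as 11%3A22%3A33; a standard URL decoder treats both forms the same.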
As with any alpha feature we appreciate bug reports and feedback. Please don’t hesitate to drop us a line.
03.09.09
Posted in Updates at 3:53 pm by Todd Wilson
Well, we finally did it. Break out the party hats and the sparkling apple juice (yes, we live in Utah).
We invite everyone and anyone to download or update to version 4.5 of screen-scraper. It is by far the most feature-rich and stable version to date. If you’re interested in checking out what’s new, take a look at the release notes.
Also, for anyone listening, keep an eye on the site if you’re considering purchasing in the near future. We’re about to do a little sale to celebrate the release of the new version…
02.02.09
Posted in Updates at 12:00 pm by Todd Wilson
Here at screen-scraper we’re on pins and needles as we’re about to release another public version of screen-scraper (we’re anticipating calling it 4.5). Our current alpha release is looking to be pretty solid, and we’re planning on giving it just a bit more testing to ensure that there aren’t any bugs left. If you’re interested in helping us test, feel free to upgrade to the latest alpha version. Here’s a FAQ that might help on that: http://community.screen-scraper.com/faq#80n867.
01.23.08
Posted in Updates at 10:49 am by Todd Wilson
Well, it’s now official. It’s been just over a full year in development, and we’re now happy to release it to the world. Thanks to all who have helped in testing alpha versions and provided feedback.
In order to upgrade an existing instance, you’ll need to un-install and re-install. Take a look at this FAQ for details as to the whys and wherefores.
01.14.08
Posted in Updates at 4:50 pm by Todd Wilson
We’re anticipating releasing version 4.0 of screen-scraper quite soon, perhaps as soon as this week. There will be quite a few changes that come along with this. Aside from the usual new features and bug fixes, we’ll be adding a new edition: screen-scraper Enterprise Edition. Essentially, what is now the latest pre-release version of screen-scraper Professional Edition will become screen-scraper Enterprise Edition. The new screen-scraper Professional Edition will simply be the Enterprise Edition with a number of features stripped out. Additionally, those who license the Enterprise Edition will get phone support, as well as a few other intangibles.
Along with all of this there will be a pricing change. The Professional Edition will be available for $399 USD, and the Enterprise Edition will cost $2,499. Those who licensed screen-scraper Professional Edition before the release of the Enterprise Edition will be eligible for a free upgrade to it (though they will not get the phone support that subsequent licensees will get). In the interest of fairness, I thought it would be a good idea to point this out prior to the release of 4.0. Those considering licensing screen-scraper Professional Edition right now might want to consider it a bit more seriously, given the price increase that will take place with the new version. As always, don’t hesitate to drop us a line with any questions.
11.20.07
Posted in Updates at 6:29 pm by Todd Wilson
If you’re currently (or will be at some point) dealing with sites for which you’d like to anonymize the scraping process, I’m happy to announce the availability of a very slick anonymization feature built right into screen-scraper. If you upgrade to version 3.0.65a (try this link if you have trouble upgrading), you’ll now find a new section in the “Settings” window, and a new “Anonymization” tab for scraping sessions. Once you’ve done the initial setup to use the anonymization service, which is pretty quick, it can be as simple as checking the “Anonymize this scrape” check box. See this page in our docs for all of the details.
We’ve tried several different methods for anonymization, and this is by far the simplest, fastest, and most reliable. Drop us a line if you’re interested in making use of it in your own scrapes.
11.12.07
Posted in Tips, Updates at 12:40 pm by Todd Wilson
Once screen-scraper extracts data from a web site, typically that data is sent somewhere else. Data is probably most commonly written out to a file, but may also be saved to a database or even submitted to another web site. You can always handle the scraped data in screen-scraper scripts, but what if you want to make use of the data in your own application, which invokes screen-scraper?
In the past, when invoking screen-scraper from a remote application, the process has generally meant sending screen-scraper the request to scrape, waiting for extraction to occur, then handling that extracted data in the application that invoked screen-scraper. It’s that second step that can be a bit hard to deal with: the request to scrape is sent, but the calling application can’t touch the scraped data until screen-scraper finishes its work. This can be especially troublesome in cases where the scrapes are long and might even get interrupted in the middle. This is at best inconvenient, and at worst may mean loss of scraped data.
I recently had a flash of inspiration as to how to deal with these cases, and implemented a new feature in the latest alpha version of screen-scraper (3.0.63a) that greatly facilitates handling data in a remote application as it is being scraped. First, for contrast, consider the method we advocate in our fourth tutorial for invoking screen-scraper remotely to extract data from our shopping web site. The process goes basically like this:
- An external application starts up (e.g., a Java application or PHP script).
- The application invokes screen-scraper, telling it to run the “Shopping Site” scraping session.
- The “Shopping Site” scraping session runs.
- Once the scraping session completes, control returns to the calling application.
- The calling application requests the scraped records from screen-scraper.
- The scraped records are output by the calling application.
Now consider this possibility:
- An external application starts up (e.g., a Java application or PHP script).
- The application invokes screen-scraper, telling it to run the “Shopping Site” scraping session.
- While the scraping session runs it sends scraped records back to the calling application, which outputs them as they get scraped.
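Hopefully the benefits of the second approach are clear: the calling application gets each record as soon as it’s scraped rather than waiting for the whole scrape to finish, so an interrupted scrape doesn’t mean losing everything scraped up to that point.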
Now on to the implementation. Consider this Java class:
import com.screenscraper.scraper.*;
import com.screenscraper.common.*;

public class PollTest
{
    public static void main( String args[] )
    {
        PollTest test = new PollTest();
        test.go();
        System.exit( 0 );
    }

    public void go()
    {
        try
        {
            // Connect to screen-scraper and set up the "Shopping Site" scraping session.
            RemoteScrapingSession remoteScrapingSession = new RemoteScrapingSession( "Shopping Site" );
            remoteScrapingSession.setVariable( "SEARCH", "dvd" );
            remoteScrapingSession.setVariable( "PAGE", "1" );

            // Check for new data every second (the default is five seconds).
            remoteScrapingSession.setPollFrequency( 1 );

            // The data receiver must be registered before scrape() is called.
            remoteScrapingSession.setDataReceiver( new MyDataReceiver() );

            // Run the scrape; receiveData gets invoked as data comes back.
            remoteScrapingSession.scrape();
            remoteScrapingSession.disconnect();
        }
        catch( Exception e )
        {
            System.err.println( "Exception: " + e.getMessage() );
            e.printStackTrace();
        }
    }

    // Handles each piece of data screen-scraper sends back during the scrape.
    class MyDataReceiver implements DataReceiver
    {
        public void receiveData( String key, Object value )
        {
            System.out.println( "Got data from ss." );
            System.out.println( "Key: " + key );
            System.out.println( "Value: " + value );
        }
    }
}
The key is the “MyDataReceiver” class, which implements the “DataReceiver” interface. This interface requires the implementation of just one method: receiveData. When the scraping session is set up to send data back to the client (described below), this method gets invoked as screen-scraper scrapes the data, allowing you to handle it in your own code. A few other notes on this class:
- The “setPollFrequency” method indicates how often (in seconds) data should be sent from screen-scraper to the client. The default is five seconds.
- The “setDataReceiver” method must be called before “scrape” is called.
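To make the polling idea concrete, here's a purely hypothetical local simulation; it is not how screen-scraper implements setPollFrequency. A background thread stands in for screen-scraper and buffers each record as it's "scraped," while the calling thread drains the buffer every pollMillis milliseconds and hands each item to a receiver shaped like DataReceiver. The names PollSimulation, Receiver, and run are illustrative assumptions.

```java
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical simulation of poll-based delivery; NOT screen-scraper's implementation.
class PollSimulation {
    // Stand-in with the same method shape as screen-scraper's DataReceiver.
    interface Receiver {
        void receiveData(String key, Object value);
    }

    // Simulates a scrape of the given records, polling for new data every pollMillis ms.
    static void run(List<String> records, long pollMillis, Receiver receiver) {
        ConcurrentLinkedQueue<String> buffer = new ConcurrentLinkedQueue<>();

        // Stands in for screen-scraper: buffers each record as soon as it is "scraped".
        Thread scraper = new Thread(() -> {
            for (String record : records) {
                buffer.add(record);
                try {
                    Thread.sleep(5); // pretend each record takes a moment to scrape
                } catch (InterruptedException e) {
                    return;
                }
            }
        });
        scraper.start();

        try {
            // Client side: keep polling until the scrape is done and the buffer is empty.
            while (scraper.isAlive() || !buffer.isEmpty()) {
                String record;
                while ((record = buffer.poll()) != null) {
                    receiver.receiveData("DR", record);
                }
                Thread.sleep(pollMillis);
            }
            scraper.join();
        } catch (InterruptedException e) {
            scraper.interrupt();
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        run(List.of("dvd 1", "dvd 2", "dvd 3"), 10,
            (key, value) -> System.out.println("Key: " + key + ", Value: " + value));
    }
}
```

A shorter interval gets data to the client sooner at the cost of more frequent checks; a longer one does the reverse.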
The implementation in screen-scraper is quite simple. I took the standard “Shopping Site” scraping session from the tutorial, and added the following script:
session.sendDataToClient( "DR", dataRecord );
The script gets invoked after each product is extracted from the web site. The “sendDataToClient” method will accept almost any object, including strings, integers, DataRecords, and DataSets.
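Because sendDataToClient accepts values of different types, a receiver may need to branch on what it gets. Here's a minimal sketch, assuming a DataRecord can be treated as a java.util.Map (check the actual class in screen-scraper.jar before relying on that). TypedReceiver is a hypothetical name; in real code it would implement DataReceiver, as MyDataReceiver does in the class above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical receiver that sorts incoming values by type.
// Assumption: a DataRecord can be handled as a java.util.Map; verify against screen-scraper.jar.
class TypedReceiver {
    final List<String> strings = new ArrayList<>();
    final List<Integer> integers = new ArrayList<>();
    final List<Map<?, ?>> records = new ArrayList<>();
    final List<Object> other = new ArrayList<>(); // e.g. DataSets, or anything unexpected

    // Same signature as DataReceiver.receiveData; real code would declare "implements DataReceiver".
    public void receiveData(String key, Object value) {
        if (value instanceof String) {
            strings.add((String) value);
        } else if (value instanceof Integer) {
            integers.add((Integer) value);
        } else if (value instanceof Map) {
            records.add((Map<?, ?>) value);
        } else {
            other.add(value);
        }
    }

    public static void main(String[] args) {
        TypedReceiver receiver = new TypedReceiver();
        receiver.receiveData("status", "page 1 done");
        receiver.receiveData("count", 42);
        receiver.receiveData("DR", Map.of("TITLE", "Some DVD"));
        System.out.println(receiver.strings.size() + " " + receiver.integers.size() + " " + receiver.records.size());
    }
}
```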
So far we’ve only implemented this in the Java and PHP drivers, but the others will be forthcoming.
The example source files, which include both PHP and Java files, can be downloaded here. If you decide to give this a try, be sure to upgrade to version 3.0.63a of screen-scraper. You’ll want to reference the latest “screen-scraper.jar” or “misc\php\remote_scraping_session.php” files in your code (found inside the folder where screen-scraper is installed).
07.05.07
Posted in Updates at 11:46 am by Todd Wilson
This one’s definitely a recommended upgrade. It contains a few bug fixes that should remedy some obnoxious behavior you might notice in the previous alpha version.
Aside from bug fixes, this version now allows for sub-extractor patterns to be applied in sequence. It’s not something that’s often needed, but once in a while it can be handy (and even necessary).
Feel free to give it a try and let us know of any more trouble. As always, be sure to back up your work before upgrading to an alpha version.
06.11.07
Posted in Updates at 5:12 pm by Todd Wilson
Aside from a few bug fixes and other niceties since the last announced alpha, this one now handles file uploads in the proxy server. This isn’t a really common need, but it’s been a hole in screen-scraper’s functionality that, happily, is now filled.
Feel free to give it a try and let us know of any more trouble. As always, be sure to back up your work before upgrading to an alpha version.