07.14.10

Update recommended for 5.0 users

Posted in Updates at 10:50 am by Todd Wilson

Well, we tested and tested, but it looks like a little bug slipped by us in 5.0, and the fix is probably worth an upgrade.  We’ve just released 5.0.1a, which contains a fix for it, and we recommend that 5.0 users upgrade.  To do that you’ll need to allow upgrading to unstable versions.  Here’s a FAQ on how to go about that and upgrade.  Not to worry about the stability of 5.0.1a: it contains this bug fix as well as a few other minor ones.  It should be completely safe to upgrade.  Anyone who installs 5.0 fresh will get the bug fix.

For those interested, the bug fix addresses an issue where line numbers reported in error messages will be offset by two lines.  That is, if your script has an error on line 100, the error message will report line 102 instead.  The bug fix simply corrects this so that the correct line number of the error is reported.

07.06.10

Version 5.0 of screen-scraper released! And it’s on sale!!!

Posted in Miscellaneous, Updates at 2:54 pm by Todd Wilson

Okay, that was probably too many exclamation points in the title.  It’s with good reason, though.  Version 5.0 represents a major upgrade in screen-scraper’s functionality (take a glance at the release notes to see what I mean).  Not only have we made all kinds of bug fixes, but there are lots of enhancements to the user interface as well as completely new functionality.

Along with this release we’ve also done some significant revamping of our docs, which we’ll continue to do.  We want to make sure screen-scraper is easier to learn for the beginner and quicker to use for the seasoned veteran.

And on top of all of that it’s on sale!  Until August 15 of this year you can get the Enterprise Edition for $800 off and the Professional Edition for $150 off.  We’ll be raising prices back up after that date, so get it while it’s hot.  Interested in purchasing?  Click here to be taken to our registration page.

04.15.10

To Iterate is Human, to Recurse, Divine

Posted in Tips, Updates at 11:08 am by Todd Wilson

Well, that’s actually not always true.  Take a quick look at this blog posting here.  The fundamental issue described by that posting is one of recursion vs. iteration.  When recursion is used (a page calls a page which calls a page…), objects tend to get stacked up and subsequently fill up memory.  When iteration is used, objects are properly cleaned up, so memory doesn’t become a problem.  The trouble is that this condition is often hard to detect, and unless you’re thinking about it when you’re building your scraping session, you may cause it without realizing it.
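To make the difference concrete, here is a minimal sketch in plain Python rather than in an actual screen-scraper script.  The names `pages`, `scrape_recursive`, and `scrape_iterative` are hypothetical stand-ins for a results page that links to a next page; none of them is part of screen-scraper.

```python
# Hypothetical site: each page name maps to the next page (None marks the last).
pages = {"page1": "page2", "page2": "page3", "page3": None}


def scrape_recursive(page, depth=1):
    """Each page 'calls' the next, so every earlier call stays on the stack."""
    next_page = pages[page]
    if next_page is None:
        return depth  # deepest point reached: one pending call per page visited
    return scrape_recursive(next_page, depth + 1)


def scrape_iterative(first_page):
    """A single loop walks the pages; each page's work finishes before the next."""
    page, visited = first_page, 0
    while page is not None:
        visited += 1  # handle this page, then move on without nesting
        page = pages[page]
    return visited


max_depth = scrape_recursive("page1")   # number of calls held at once
pages_seen = scrape_iterative("page1")  # number of pages handled, one at a time
```

Both versions visit the same three pages.  The recursive one is holding three calls (and whatever they reference) by the time it reaches the last page, while the loop only ever holds one.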

An astute screen-scraper user yesterday suggested a solution to this that is both simple and effective.  In the case described in the blog posting you end up with a big stack of scripts, all of which have references to objects, which causes the OutOfMemoryError.  The number of scripts on the stack can be viewed in the breakpoint window, and in version 4.5.45a we added a method that will allow you to see how many scripts are on the stack from within a script:

session.getNumScriptsOnStack()

You can check this number as often as you’d like.  As it grows it could mean trouble, so you can respond appropriately in your scraping session.  We’ve also added a failsafe mechanism inside of screen-scraper that will hopefully save you from an OutOfMemoryError.  If too many scripts are pushed on the stack your scraping session will be stopped and the following message will be output to the log:

ERROR–halting the scraping session because the maximum number of scripts allowed on the stack was reached.

You can control the maximum number of scripts allowed on the stack by invoking this method at any time:

session.setMaxScriptsOnStack( 50 )

Set that number to whatever you’d like.
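As a rough sketch of how a script might respond before the failsafe kicks in, the example below checks the stack count against a warning level of its own.  It is plain Python, not a real screen-scraper script: `FakeSession` is a hypothetical stand-in for the `session` object, and `safe_to_go_deeper` and the `warn_at` value are made up for illustration.  Only the two method names come from this post.

```python
class FakeSession:
    """Hypothetical stand-in for screen-scraper's `session` object (illustration only)."""

    def __init__(self, scripts_on_stack):
        self._scripts_on_stack = scripts_on_stack
        self._max_scripts = None

    def getNumScriptsOnStack(self):
        return self._scripts_on_stack

    def setMaxScriptsOnStack(self, limit):
        self._max_scripts = limit


def safe_to_go_deeper(session, warn_at):
    """Return True while the script stack is still below our own warning level."""
    return session.getNumScriptsOnStack() < warn_at


shallow = FakeSession(scripts_on_stack=5)
deep = FakeSession(scripts_on_stack=45)
shallow.setMaxScriptsOnStack(50)  # the hard stop itself is left to screen-scraper

ok_shallow = safe_to_go_deeper(shallow, warn_at=40)  # plenty of room left
ok_deep = safe_to_go_deeper(deep, warn_at=40)        # time to stop nesting
```

Picking a warning level below the maximum gives the script a chance to wind down cleanly instead of having the whole scraping session halted.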

By design screen-scraper provides a lot of flexibility and power in the data extraction process, but this same power can also result in our shooting ourselves in the foot on occasion.  The inclusion of this new mechanism will hopefully help some to avoid this problem down the road.

04.09.10

Tidy Time

Posted in Updates at 6:04 pm by Todd Wilson

Lately we’ve been experimenting with different tidiers in the latest alpha versions of screen-scraper.  A tidier is the little utility that cleans up malformed HTML, making extraction easier.  For some time we’ve used a library called JTidy to handle this, which has worked quite well but does have a couple of problems.  First, at times it simply fails to tidy the HTML.  If you’ve been using screen-scraper for a while you’ve likely seen a message indicating this in the log.  This isn’t too big of a deal, but can be a bit of a hassle.  Second, in very rare instances we’ve actually found that it will omit portions of an HTML page which are especially malformed.  This is definitely a problem and can make debugging difficult.

To address the issues above we’ve been trying out two other tidiers: NekoHTML and Jericho.  We’ve actually already found issues with NekoHTML, so Jericho looks to be the favorite as of right now.  Both will still require some experimentation, though, so please use them at your own risk for now.  Once we’ve put them both through their paces we’ll likely settle on one as the recommended default.  And not to worry about any scrapeable files that are already using JTidy; they’ll stay just as they are.  At some point, though, for any new scrapeable files, you might notice a different tidier as the default.

09.02.09

Alpha documentation

Posted in Updates at 5:38 pm by Todd Wilson

We’re constantly updating screen-scraper with bug fixes and new features, but haven’t always been good about documenting changes.  These newer features are typically only available in our alpha versions.  Whereas previously you were on your own to figure out what was new, we’re now going to do our best to document new features here:

Alpha documentation

These docs might not be quite as neat and clean as the others, but if you’re using our alpha versions and want to see what’s new, this is a good page to watch.

08.28.09

REST API

Posted in Updates at 5:27 pm by Todd Wilson

Actually, I should probably call it a REST-like API.  I have no doubt the purists will point out that it isn’t a REST API at all.  How about we call it an “API accessible via GET requests”?

With that loquacious introduction, I’m happy to announce that, as of version 4.5.18a, you can access screen-scraper via GET requests.  Let me just state right here and now that this is alpha functionality and may very well change before the next public release.  Use it at your own risk.  As with any of our alpha features the documentation is scant, so I’ll simply provide a long list of examples as to how you might use it.  Hopefully you’ll get the idea.

You’ll first need to start up screen-scraper in server mode.  Once that’s done you can then access a slew of features you’d normally only be able to access via the web interface.  Here they are:


http://localhost:8779/ss/rest?action=get_runnable_scraping_sessions
http://localhost:8779/ss/rest?action=get_scrapeable_sessions
http://localhost:8779/ss/rest?action=run_scraping_session&scraping_session_name=Shopping+Site
http://localhost:8779/ss/rest?action=stop_running_scraping_session&scrapeable_session_id=43
http://localhost:8779/ss/rest?action=stop_all_running_scraping_session
http://localhost:8779/ss/rest?action=remove_scrapeable_session&scrapeable_session_id=29
http://localhost:8779/ss/rest?action=reload_settings
http://localhost:8779/ss/rest?action=peek_scrapeable_session_log&scrapeable_session_id=42&num_lines=50
http://localhost:8779/ss/rest?action=get_scheduled_scraping_sessions
http://localhost:8779/ss/rest?action=disable_enable_scheduled_scraping_session&scheduled_scraping_session_id=110&enable=false
http://localhost:8779/ss/rest?action=remove_scheduled_scraping_session&scheduled_scraping_session_id=0
http://localhost:8779/ss/rest?action=set_scheduled_scraping_session&scheduled_scraping_session_id=3&scraping_session_name=Shopping+Site&timeout=123&schedule_date=08%2F20%2F2009&schedule_time=11:22:33&repeat_days=4&repeat_hours=3&repeat_minutes=2&repeat_seconds=1&threshold_time=21&threshold_record_count=43&settable_session_variables=this%3Dthatx%26foo%3Dbar
http://localhost:8779/ss/rest?action=save_settings&default_timeout=89&default_repeat_days=9&default_repeat_hours=8&default_repeat_minutes=7&default_repeat_seconds=6&default_threshold_time=4&default_threshold_record_count=3
http://localhost:8779/ss/rest?action=set_session_variable_on_scrapeable_session&scrapeable_session_id=3&key=foo&value=bap
http://localhost:8779/ss/rest?action=get_session_variable_from_scrapeable_session&scrapeable_session_id=3&key=foo
http://localhost:8779/ss/rest?action=get_memory_usage
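If you’d rather build these URLs in code than by hand, something like the following sketch might help.  It uses only Python’s standard library; `BASE_URL` and `build_url` are hypothetical names, and the reading of `settable_session_variables` as a query string that gets encoded a second time is an inference from the example above, not documented behavior.

```python
from urllib.parse import urlencode

BASE_URL = "http://localhost:8779/ss/rest"  # host and port as used in the list above


def build_url(action, **params):
    """Build a GET URL for one action; values are form-encoded (a space becomes '+')."""
    query = {"action": action, **params}
    return BASE_URL + "?" + urlencode(query)


run_url = build_url("run_scraping_session", scraping_session_name="Shopping Site")

# Assumption: settable_session_variables is itself a small query string,
# which is then encoded again when it is placed in the outer URL.
inner = urlencode({"this": "thatx", "foo": "bar"})
nested = urlencode({"settable_session_variables": inner})
```

With screen-scraper running in server mode, `run_url` could then be handed to `urllib.request.urlopen`.  The response format isn’t covered in this post, so the sketch stops at building the URL.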

As with any alpha feature we appreciate bug reports and feedback.  Please don’t hesitate to drop us a line.

03.09.09

Version 4.5 released!

Posted in Updates at 3:53 pm by Todd Wilson

Well, we finally did it.  Break out the party hats and the sparkling apple juice (yes, we live in Utah).

We invite everyone and anyone to download or update to version 4.5 of screen-scraper.  It is by far the most feature-rich and stable version to date.  If you’re interested in checking out what’s new, take a look at the release notes.

Also, for anyone listening, keep an eye on the site if you’re considering purchasing in the near future.  We’re about to do a little sale to celebrate the release of the new version…

02.02.09

On the Cusp of a Public Release

Posted in Updates at 12:00 pm by Todd Wilson

Here at screen-scraper we’re on pins and needles as we get ready to release another public version (we’re anticipating calling it 4.5).  Our current alpha release is looking to be pretty solid, and we’re planning on giving it just a bit more testing to catch any remaining bugs.  If you’re interested in helping us test, feel free to upgrade to the latest alpha version.  Here’s a FAQ that might help with that: http://community.screen-scraper.com/faq#80n867.

01.23.08

screen-scraper version 4.0 released!

Posted in Updates at 10:49 am by Todd Wilson

Well, it’s now official.  It’s been just over a full year in development, and we’re now happy to release it to the world.  Thanks to all who have helped in testing alpha versions and provided feedback.

In order to upgrade an existing instance, you’ll need to un-install and re-install.  Take a look at this FAQ for details as to the whys and wherefores.

01.14.08

Version 4.0 of screen-scraper coming soon…

Posted in Updates at 4:50 pm by Todd Wilson

We’re anticipating releasing version 4.0 of screen-scraper quite soon, perhaps as soon as this week.  There will be quite a few changes that come along with this.  Aside from the usual new features and bug fixes, we’ll be adding a new edition: screen-scraper Enterprise Edition.  Essentially what is now the latest pre-release version of screen-scraper Professional Edition will become screen-scraper Enterprise Edition.  The new screen-scraper Professional Edition will simply be the Enterprise Edition with a number of features stripped out.  Additionally, those who license the Enterprise Edition will get phone support, as well as a few other intangibles.

Along with all of this there will be a pricing change.  The Professional Edition will be available for $399 USD, and the Enterprise Edition will cost $2,499.  Those who licensed screen-scraper Professional Edition before the release of the Enterprise Edition will be eligible for a free upgrade to it (though they will not get the phone support that subsequent licensees will get).  In the interest of fairness, I thought it would be a good idea to point this out prior to the release of 4.0.  Those considering licensing screen-scraper Professional Edition right now might want to consider it a bit more seriously, given the price increase that will take place with the new version.  As always, don’t hesitate to drop us a line with any questions.
