We’ve just added several new scraping sessions that demonstrate extracting data from sites in various industries. If you go to our home page and click on one of the industry buttons, you’ll be taken to a page where you can download the corresponding scraping session. The e-commerce section also has a video to walk you through the process, and we’ll be adding videos to the others shortly.
We’ve recently included libraries for Apache Commons Lang. There are a large number of useful things in there, but I find the most use for StringUtils and WordUtils.
For example, some sites you scrape might return results in all caps. You could normalize them like this:
name = "GEORGE WASHINGTON CARVER";
name = StringUtils.lowerCase(name);
name = WordUtils.capitalize(name);
session.log("Name now shows as: " + name);
At the end, the name is formatted as “George Washington Carver”. Most of the methods are already null-safe, and there are a lot of little tools in there to try.
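If it helps to see what those two calls do to the string, here is a minimal plain-Java sketch that approximates them without Commons Lang. The class name `NameFormatter` and the method `toTitleCase` are hypothetical, made up for illustration. The sketch assumes that only whitespace separates words, and it returns null for null input to mirror the null-safe behavior mentioned above. It is not the library's implementation.

```java
import java.util.Locale;

public class NameFormatter {
    // Hypothetical helper (not part of Commons Lang or screen-scraper):
    // lower-cases the input, then title-cases the first character of each
    // whitespace-separated word. Returns null when given null.
    public static String toTitleCase(String input) {
        if (input == null) {
            return null;
        }
        StringBuilder out = new StringBuilder(input.length());
        boolean startOfWord = true;
        for (char c : input.toLowerCase(Locale.ROOT).toCharArray()) {
            if (Character.isWhitespace(c)) {
                // Whitespace ends the current word; keep it unchanged.
                startOfWord = true;
                out.append(c);
            } else if (startOfWord) {
                out.append(Character.toTitleCase(c));
                startOfWord = false;
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // prints "George Washington Carver"
        System.out.println(toTitleCase("GEORGE WASHINGTON CARVER"));
    }
}
```

Inside a screen-scraper script, the Commons Lang calls shown earlier remain the shorter route. The sketch is only meant to make the lower-case-then-capitalize order visible.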
This is our biggest sale in quite a while. Until December 31, 2012, take 40% off Professional Edition licenses and 60% off Enterprise Edition licenses. Click here to take advantage.
A few minor updates in this one, along with a long-awaited global find feature!
We are pleased to announce our new coaching program. To help them get started, new users can receive up to two free hours of one-on-one coaching (click here for details).
Existing users can receive help planning out a project, solving that one tough issue, learning new techniques, and refining current scraping projects. Purchase hours of training by calling our offices at 800-672-0113.
Several small changes in this one:
- Extractor patterns invoked manually can now be tested on a subset of the HTML page.
- Added scrapeableFile.setForcePOST.
- Upgraded internal GWT libraries.
- Prettied up the web UI.
Check the alpha log for a full list of changes.
We recently released a new Quick Guide video. In less than three minutes you can get an idea of what it’s like to use screen-scraper.
Several small changes in this one:
- Upgraded BeanShell to the latest version.
- Searches within a proxy session now include notes.
- Fixed an issue that would cause the workbench to freeze when the breakpoint window was up.
- Now using global proxy settings if no session proxy settings are found.
- Improved cookie handling in the proxy server.
- Fixed a bug that would cause a proxy session to not be completely saved.
- Added sutil.makeGETRequestNoSessionProxy.
Several bug fixes and new features in this release:
- Fixed a bug such that a scrapeable session ID is now being generated even for scraping sessions that will run in the future.
- Fixed a bug where nodes in the tree weren’t being highlighted correctly.
- Scrapeable files can now be added via a URL.
- If the DatabasePort and WebServerShutdownPort properties are omitted from the screen-scraper.properties file they’ll now be automatically set to the value of an open port.
- When screen-scraper is running in server mode, the ProxyPort will now only be tested and used if the AllowProxyScripting property is set to true.
- Added a “Load Response from Clipboard” button to the scrapeable file panel.
- Updated BeanShell to the latest version, disabling unstable Windows scripting in the process (e.g., VBScript).
It’s now official! Precisely one year after our last major release (5.5) we’ve let loose version 6.0. This is undoubtedly our most feature-rich, rock-solid version yet. Take a glance through the release notes for a list of the myriad changes. Better yet, either upgrade your existing instance or download it fresh!