screen-scrapeable - Thoughts, tips, and updates on screen-scraping

HTTPS connection issues

April 29, 2015 (updated October 13, 2015) by jason

We’ve been seeing lots of issues with scrapes connecting to HTTPS sites. Some of the errors include:

ssl_error_rx_record_too_long
An input/output error occurred while connecting to https:// … The message was peer not authenticated.
javax.net.ssl.SSLPeerUnverifiedException: peer not authenticated

The issue came about when the Heartbleed vulnerability necessitated changes to some HTTPS connections. Some connection types aren’t secure anymore, and new versions have come out. Screen-scraper needed two changes to catch up:

Update to use Java 8
Update of HTTPClient to 4.4

Both of these are pretty large changes, so they aren’t in the stable release yet. In some cases, however, they are the only option to make a scrape work, so here are the instructions to get what you need.
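Before upgrading, it can help to see which Java version and which SSL/TLS protocol versions your current runtime enables by default, since an older runtime that can't negotiate the protocol a site requires is a plausible cause of errors like the ones above. The following is a minimal diagnostic sketch that uses only the standard Java library. The class name TlsCheck is made up for illustration and is not part of screen-scraper.

```java
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import javax.net.ssl.SSLContext;

// Hypothetical helper for illustration; not part of screen-scraper.
public class TlsCheck {

    // Returns the SSL/TLS protocol versions the default SSLContext
    // enables for new connections on this Java runtime.
    public static String[] enabledProtocols() {
        try {
            return SSLContext.getDefault()
                    .getDefaultSSLParameters()
                    .getProtocols();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("No default SSLContext available", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("Java version: " + System.getProperty("java.version"));
        System.out.println("Enabled by default: " + Arrays.toString(enabledProtocols()));
    }
}
```

If the list printed does not include the protocol version the target site requires, that would be consistent with the handshake failures described above, though it is not the only possible cause.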

Scraping data from various industries

June 10, 2013 by Todd Wilson

We’ve just added several new scraping sessions that exemplify extracting data from sites in various industries. If you go to our home page and click on one of the buttons corresponding to an industry you’ll be taken to a page where you can download the scraping session. The e-commerce section also has a video to …

Apache Commons

May 28, 2013 by jason

We’ve recently included libraries for Apache Commons Lang. There are a large number of useful things in there, but I find the most use for StringUtils and WordUtils. For example, some sites one might scrape might have the results in all caps. You could: import org.apache.commons.lang.*; name = "GEORGE WASHINGTON CARVER"; name = StringUtils.lowerCase(name); name = …
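To make the all-caps cleanup concrete, here is a small sketch of that kind of transformation written against the standard Java library only, so it runs without Commons Lang on the classpath. The class NameCase and its titleCase method are hypothetical names for illustration, and this is not necessarily how the truncated snippet above continues. With Commons Lang available, WordUtils.capitalizeFully(name) should give a comparable result in one call.

```java
import java.util.Locale;

// Hypothetical helper for illustration; not part of screen-scraper
// or Apache Commons Lang.
public class NameCase {

    // Lower-cases the input, then upper-cases the first character of
    // each whitespace-separated word.
    public static String titleCase(String s) {
        if (s == null) {
            return null;
        }
        StringBuilder out = new StringBuilder(s.length());
        boolean startOfWord = true;
        for (char c : s.toLowerCase(Locale.ROOT).toCharArray()) {
            if (Character.isWhitespace(c)) {
                startOfWord = true;          // next non-space char begins a word
                out.append(c);
            } else if (startOfWord) {
                out.append(Character.toTitleCase(c));
                startOfWord = false;
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // prints "George Washington Carver"
        System.out.println(titleCase("GEORGE WASHINGTON CARVER"));
    }
}
```

A simple rule like this handles straightforward names, but names such as "MCDONALD" or "O'BRIEN" would need extra handling.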

End-of-year sale!

November 29, 2012 by Todd Wilson

This is our biggest sale in quite a while. Until December 31, 2012 take 40% off Professional Edition licenses and 60% off Enterprise Edition licenses. Click here to take advantage.

Version 6.0.18a of screen-scraper Released

October 16, 2012 by Todd Wilson

A few minor updates in this one, along with a long-awaited global find feature!

Let Us Help You Learn screen-scraper

July 19, 2012 by Todd Wilson

We are pleased to announce our new coaching program. To help get started, our new users can receive up to two free hours of one-on-one coaching (click here for details). Existing users can receive help planning out a project or solving that one tough issue, learn new techniques, and refine current scraping projects. Purchase hours of training …

Version 6.0.14a of screen-scraper Released

June 28, 2012 by Todd Wilson

Several small changes in this one: Extractor patterns invoked manually can now be tested on a subset of the HTML page. Added scrapeableFile.setForcePOST. Upgraded internal GWT libraries. Prettied up the web UI. Check the alpha log for a full list of changes.

New Quick Guide video

June 15, 2012 (updated June 16, 2012) by Todd Wilson

We recently released a new Quick Guide video. In less than three minutes you can get an idea of what it’s like to use screen-scraper.

Version 6.0.6a of screen-scraper Released

May 10, 2012 by Todd Wilson

Several small changes in this one: Upgraded BeanShell to the latest version. Searches within a proxy session now include notes. Fixed an issue that would cause the workbench to freeze when the breakpoint window was up. Now using global proxy settings if no session proxy settings are found. Improved cookie handling in the proxy …

Version 6.0.4a of screen-scraper Released

May 1, 2012 by Todd Wilson

Several bug fixes and new features in this release: Fixed a bug such that a scrapeable session ID is now being generated even for scraping sessions that will run in the future. Fixed a bug where nodes in the tree weren’t being highlighted correctly. Scrapeable files can now be added via a URL. If the …