HTTPS connection issues

We’ve been seeing lots of issues with scrapes connecting to HTTPS sites. Some of the errors include

  • ssl_error_rx_record_too_long
  • An input/output error occurred while connecting to https:// … The message was peer not authenticated.
  • javax.net.ssl.SSLPeerUnverifiedException: peer not authenticated

The issue came about when the Heartbleed vulnerability necessitated changes to some HTTPS connections—some of types aren’t secure anymore, and new versions have come out. Screen-scraper needed two changes to catch up, and they are:

  • Update to use Java 8
  • Update of HTTPClient to 4.4

Both of these are pretty large changes, so they aren’t in the stable release yet, however in some cases they are the only option to make a scrape work, therefore here is the instructions to get what you need.

Read moreHTTPS connection issues

Apache Commons

We’ve recently included libraries for Apache Commons Lang. There is a large number of useful things in there, but I find most use for stringUtils and wordUtils. For example, some sites one might scrape might have the results in all caps. You could: import org.apache.commons.lang.*; name = “GEORGE WASHINGTON CARVER”; name = StringUtils.lowerCase(name); name = … Read moreApache Commons

Let Us Help You Learn screen-scraper

We are pleased to announce our new coaching program. To help get started, our new users can receive up to two free hours of one-on-one coaching (click here for details). Existing users, receive help planning out your project, solving that one tough issue, learn new techniques and refine your current scraping projects. Purchase hours of training … Read moreLet Us Help You Learn screen-scraper