One’s first experience with a page full of dynamic content can be pretty confusing. Generally one can request the HTML, but the response is missing the data being sought.
We’ve been seeing lots of issues with scrapes connecting to HTTPS sites. Some of the errors include:
An input/output error occurred while connecting to https:// … The message was peer not authenticated.
javax.net.ssl.SSLPeerUnverifiedException: peer not authenticated
The issue came about when the Heartbleed vulnerability necessitated changes to some HTTPS connections: some connection types aren’t secure anymore, and new versions have come out. Screen-scraper needed two changes to catch up:
Update to use Java 8
Update HTTPClient to 4.4
Both of these are pretty large changes, so they aren’t in the stable release yet. In some cases, however, they are the only option to make a scrape work, so here are the instructions to get what you need. Read the rest of this entry »
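Before upgrading, it can help to see what the Java runtime in question is able to negotiate. The following is a minimal diagnostic sketch, not part of screen-scraper: the class name TlsSupportCheck and its methods are made up for illustration, and it assumes the "peer not authenticated" error stems from a protocol mismatch between the JVM and the server, which may not be the cause in every case. It uses only the standard javax.net.ssl API to print the Java version and the SSL/TLS protocol versions the default context supports and enables.

```java
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.List;
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLParameters;

// Hypothetical helper for illustration; not part of screen-scraper.
public class TlsSupportCheck {

    // Protocol versions the JVM's default SSLContext is able to use at all.
    static List<String> supportedProtocols() throws NoSuchAlgorithmException {
        SSLParameters params = SSLContext.getDefault().getSupportedSSLParameters();
        return Arrays.asList(params.getProtocols());
    }

    // Protocol versions that are switched on by default for new connections.
    static List<String> enabledProtocols() throws NoSuchAlgorithmException {
        SSLParameters params = SSLContext.getDefault().getDefaultSSLParameters();
        return Arrays.asList(params.getProtocols());
    }

    public static void main(String[] args) throws Exception {
        System.out.println("Java version: " + System.getProperty("java.version"));
        System.out.println("Supported protocols: " + supportedProtocols());
        System.out.println("Enabled by default: " + enabledProtocols());
    }
}
```

If the site being scraped only accepts a protocol version that is missing from the "enabled by default" list, that would be consistent with needing the newer Java version described above.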
We’ve just added several new scraping sessions that demonstrate extracting data from sites in various industries. If you go to our home page and click on one of the buttons corresponding to an industry, you’ll be taken to a page where you can download the scraping session. The e-commerce section also has a video to walk you through the process, and we’ll be adding videos to the others shortly.
We are pleased to announce our new coaching program. To help them get started, new users can receive up to two free hours of one-on-one coaching (click here for details).
Existing users can receive help planning out a project, solving that one tough issue, learning new techniques, and refining current scraping projects. Purchase hours of training by calling our offices at 800-672-0113.