Using OCR with screen-scraper

Within screen-scraper you can call outside programs directly from your scripts. The following is an example scraping session that uses Tesseract OCR and ImageMagick to take an image from the internet and attempt to read its text. As is, the scraping session is intended to …
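As a rough illustration of calling outside programs from a script, here is a minimal standalone Java sketch. It is not the session described above, whose details are cut off here. It assumes ImageMagick's `convert` (`magick` on ImageMagick 7) and `tesseract` are installed and on the PATH, and that the image has already been saved to disk. The class name, method names, temporary file names and the 200% resize are illustrative choices, not taken from the original post.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper (not part of screen-scraper's API): shells out to
// ImageMagick and Tesseract, assuming both are installed and on the PATH.
public class OcrSketch {

    // ImageMagick command that enlarges the image and writes a TIFF for Tesseract.
    // On ImageMagick 7, "magick" replaces "convert" as the main command.
    public static List<String> convertCommand(String input, String tiffOut) {
        return Arrays.asList("convert", input, "-resize", "200%", tiffOut);
    }

    // Tesseract command; Tesseract writes its result to outBase + ".txt".
    public static List<String> tesseractCommand(String tiff, String outBase) {
        return Arrays.asList("tesseract", tiff, outBase);
    }

    // Runs an external program and throws if it exits with a non-zero status.
    static void run(List<String> cmd) throws IOException, InterruptedException {
        Process p = new ProcessBuilder(cmd).inheritIO().start();
        int code = p.waitFor();
        if (code != 0) {
            throw new IOException("exit " + code + ": " + String.join(" ", cmd));
        }
    }

    // Converts an image already saved on disk, runs OCR, and returns the text.
    public static String ocr(String imagePath, Path workDir)
            throws IOException, InterruptedException {
        String tiff = workDir.resolve("ocr-input.tif").toString();
        String outBase = workDir.resolve("ocr-output").toString();
        run(convertCommand(imagePath, tiff));
        run(tesseractCommand(tiff, outBase));
        byte[] bytes = Files.readAllBytes(Paths.get(outBase + ".txt"));
        return new String(bytes, StandardCharsets.UTF_8).trim();
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: java OcrSketch <image-file>");
            return;
        }
        System.out.println(ocr(args[0], Files.createTempDirectory("ocr")));
    }
}
```

Building the command as a list and passing it to `ProcessBuilder` avoids shell quoting problems with file names that contain spaces. Enlarging the image first is a common trick for small captcha-style text, but whether it helps depends on the images you are scraping.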

Exporting scraping sessions that use session.executeScript

Many have probably noticed that when a scraping session is exported from screen-scraper, all of the scripts invoked from within that scraping session are exported along with it. All of the scripts, that is, except those invoked via the session.executeScript method. The exporter isn’t quite smart enough to actually parse the text of …
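Because the exporter does not pick up scripts called through `session.executeScript`, one workaround is to scan your script text for those calls and export the named scripts by hand. The sketch below is a hypothetical helper, not a screen-scraper feature. It assumes the script name is passed as a string literal; calls that build the name at runtime (for example from a variable) will not be found.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: lists script names passed as string literals to
// session.executeScript so they can be exported manually.
public class ExecuteScriptScanner {

    // Matches session.executeScript("Some Script" ...), tolerating whitespace.
    private static final Pattern CALL = Pattern.compile(
            "session\\s*\\.\\s*executeScript\\s*\\(\\s*\"([^\"]+)\"");

    // Returns each referenced script name once, in order of first appearance.
    public static Set<String> findScriptNames(String scriptText) {
        Set<String> names = new LinkedHashSet<>();
        Matcher m = CALL.matcher(scriptText);
        while (m.find()) {
            names.add(m.group(1));
        }
        return names;
    }

    public static void main(String[] args) {
        String sample = "session.executeScript(\"Parse details\");\n"
                + "if (done) { session.executeScript( \"Save record\" ); }\n"
                + "session.executeScript(\"Parse details\");";
        System.out.println(findScriptNames(sample));
    }
}
```

Running the scanner over every script in a session gives a checklist of scripts to export alongside it.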

Scraping ASP.NET Sites

Microsoft ASP.NET sites have consistently proven to be some of the most difficult to scrape. This is due to their unconventional nature and the cryptic information passed between your browser and the server. You’ll know you’re on an ASP.NET site when your URLs end in .aspx, your links look like this: javascript:__doPostBack('gvLicensing','Select$0'), and your POSTs look …
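In ASP.NET Web Forms, `__doPostBack(eventTarget, eventArgument)` copies its two arguments into the hidden form fields `__EVENTTARGET` and `__EVENTARGUMENT` and submits the page's form, which also carries hidden state such as `__VIEWSTATE` and, on many sites, `__EVENTVALIDATION`. The following Java sketch turns a postback link into a URL-encoded POST body. It is an illustration under several assumptions: the class name is hypothetical, it uses regular expressions rather than an HTML parser, it assumes each hidden input's `name` attribute appears before its `value`, and it ignores any other form fields (text boxes, dropdowns) the real page may require.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: builds the POST body a browser would send when a
// javascript:__doPostBack(...) link is clicked.
public class PostBackSketch {

    private static final Pattern POSTBACK = Pattern.compile(
            "__doPostBack\\(\\s*'([^']*)'\\s*,\\s*'([^']*)'\\s*\\)");

    // Returns {eventTarget, eventArgument}, or null if href is not a postback link.
    public static String[] parsePostBack(String href) {
        Matcher m = POSTBACK.matcher(href);
        return m.find() ? new String[] {m.group(1), m.group(2)} : null;
    }

    // Reads a hidden field's value; assumes name="..." comes before value="...".
    public static String hiddenField(String html, String name) {
        Pattern p = Pattern.compile(
                "name=\"" + Pattern.quote(name) + "\"[^>]*value=\"([^\"]*)\"");
        Matcher m = p.matcher(html);
        return m.find() ? m.group(1) : null;
    }

    private static String enc(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always supported
        }
    }

    // Combines the postback arguments with the page's hidden state fields.
    public static String buildPostBody(String html, String href) {
        String[] pb = parsePostBack(href);
        if (pb == null) {
            throw new IllegalArgumentException("Not a __doPostBack link: " + href);
        }
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("__EVENTTARGET", pb[0]);
        fields.put("__EVENTARGUMENT", pb[1]);
        for (String name : new String[] {"__VIEWSTATE", "__VIEWSTATEGENERATOR", "__EVENTVALIDATION"}) {
            String value = hiddenField(html, name);
            if (value != null) {
                fields.put(name, value);
            }
        }
        StringBuilder body = new StringBuilder();
        for (Map.Entry<String, String> e : fields.entrySet()) {
            if (body.length() > 0) {
                body.append('&');
            }
            body.append(enc(e.getKey())).append('=').append(enc(e.getValue()));
        }
        return body.toString();
    }

    public static void main(String[] args) {
        String html = "<input type=\"hidden\" name=\"__VIEWSTATE\" id=\"__VIEWSTATE\" value=\"abc+/=\" />";
        System.out.println(buildPostBody(html, "javascript:__doPostBack('gvLicensing','Select$0')"));
    }
}
```

The view state must come from the page you just requested, so each postback generally means requesting the page, extracting its current hidden fields, and only then sending the POST.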