Combining Scraped Data from Multiple Sites

Data sets often become richer when they’re combined. A good example of this is a small study by Streaming Observer on the quality of movies available from the big streaming services: Amazon, Netflix, Hulu, and HBO. The study concluded that, even though Amazon has by far the most movies, Netflix has more quality movies than the other three combined. This was determined by combining data about the movies available from each streaming service with data from Rotten Tomatoes, which rates the quality of movies.
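At its core, that kind of combination is a join on movie title followed by a count. Here is a minimal sketch. The catalogs and scores are hypothetical placeholders, not the study's data. The 0 to 100 score scale and the cutoff of 75 for a "quality" movie are assumptions, because the study's actual criterion isn't stated here. Titles are normalized before matching, since names scraped from different sites rarely line up exactly.

```python
from collections import Counter

# Hypothetical scraped catalogs: service -> titles (placeholders, not real data).
catalogs = {
    "ServiceX": ["Movie A", "Movie B", "Movie C"],
    "ServiceY": ["movie b", "Movie  D"],
}

# Hypothetical scraped ratings keyed by title, on an assumed 0-100 scale.
ratings = {"Movie A": 92, "Movie B": 45, "Movie C": 81, "Movie D": 88}

# Assumed cutoff for a "quality" movie; not the study's actual rule.
QUALITY_THRESHOLD = 75


def normalize_title(title):
    """Lowercase and collapse whitespace so titles from different sites match."""
    return " ".join(title.lower().split())


def quality_counts(catalogs, ratings, threshold=QUALITY_THRESHOLD):
    """Join each catalog with the ratings table and count titles at or above threshold."""
    scores = {normalize_title(t): s for t, s in ratings.items()}
    counts = Counter()
    for service, titles in catalogs.items():
        for title in titles:
            score = scores.get(normalize_title(title))  # inner join: unrated titles are skipped
            if score is not None and score >= threshold:
                counts[service] += 1
    return counts


print(quality_counts(catalogs, ratings))
```

Matching on a normalized title is the simplest possible join key. Real data would likely also need release years or another identifier to separate remakes and titles that are spelled the same.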


Enterprise-Scale Screen-Scraping

One of the main aspects that I think differentiates screen-scraper from many other solutions is its ability to handle large-scale scraping needs. Additionally, it was designed from the ground up to integrate with other systems, so it generally fits nicely into almost any existing setup. If you’re doing a simple one-off data extraction project screen-scraper …

Data Cravings

Yesterday ReadWriteWeb published an article entitled “Overwhelmed Executives Still Crave Big Data, Says Survey”. The gist of it is that data is vital to making business decisions, and many managers feel that they don’t have enough of it. This got me thinking about how screen-scraping plays into all of this. At a basic level, …

Using screen-scraper to automatically test embedded devices

A while back I flew out to Huntsville, AL to work with a government contractor company on automating the testing of embedded devices. To this day I’m not entirely sure what these little machines did, but they each had a web interface that needed testing (much like that of a wireless router, if you’ve worked …

Three common methods for data extraction

Building on my earlier post on data discovery vs. data extraction: in the data extraction phase of the web scraping process, you’ve already arrived at the page containing the data you’re interested in, and you now need to pull it out of the HTML. Probably the most common technique used traditionally to do this …

Data discovery vs. data extraction

Looking at screen-scraping at a simplified level, there are two primary stages involved: data discovery and data extraction. Data discovery deals with navigating a web site to arrive at the pages containing the data you want, and data extraction deals with actually pulling that data off of those pages. Generally when people think of screen-scraping …
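Keeping the two stages separate in code makes the distinction concrete. The following is a minimal sketch, not screen-scraper's own approach. A hypothetical in-memory `SITE` dictionary stands in for real HTTP requests. The page layout (`<h1>` for the name, `<span class="price">` for the price) and the `/item/` URL convention are invented for illustration. Discovery walks links breadth-first until it reaches pages that hold the data. Extraction then pulls fields from each of those pages using simple regular expressions, which is only one of several possible extraction techniques.

```python
import re
from collections import deque
from html.parser import HTMLParser

# Hypothetical site: URL -> HTML, standing in for real HTTP fetches.
SITE = {
    "/": '<a href="/list">Products</a>',
    "/list": '<a href="/item/1">One</a> <a href="/item/2">Two</a>',
    "/item/1": '<h1>Widget</h1><span class="price">$5</span>',
    "/item/2": '<h1>Gadget</h1><span class="price">$7</span>',
}


class LinkCollector(HTMLParser):
    """Collects href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.links.extend(v for k, v in attrs if k == "href" and v)


def discover(start, is_target):
    """Data discovery: follow links breadth-first and return pages that hold the data."""
    seen, queue, targets = {start}, deque([start]), []
    while queue:
        url = queue.popleft()
        if is_target(url):
            targets.append(url)
            continue  # target pages are handed to extraction, not crawled further
        parser = LinkCollector()
        parser.feed(SITE.get(url, ""))
        for link in parser.links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return targets


def extract(html):
    """Data extraction: pull fields out of a single target page."""
    name = re.search(r"<h1>(.*?)</h1>", html)
    price = re.search(r'<span class="price">(.*?)</span>', html)
    return {"name": name.group(1) if name else None,
            "price": price.group(1) if price else None}


pages = discover("/", lambda url: url.startswith("/item/"))
records = [extract(SITE[url]) for url in pages]
print(records)
```

With the stages split this way, a change in site navigation only touches `discover`, while a change in page markup only touches `extract`.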

Data mining vs. screen-scraping

Data mining isn’t screen-scraping. I know that some people in the room may disagree with that statement, but they’re actually two almost completely different concepts. In a nutshell, you might state it this way: screen-scraping allows you to get information, whereas data mining allows you to analyze information. That’s a pretty big simplification, so I’ll …