02.28.06

Version 2.6.0.5a of screen-scraper available

Posted in Updates at 1:48 pm by Todd Wilson

Get it while it’s hot. This could become version 2.7. We’re doing our own internal hammering on this one, but please let me know if any of you out there find bugs we miss. As usual, you can reach me at todd-[at]-screen-scraper.com.

02.23.06

New screen-scraper tutorial on inserting data into databases

Posted in Updates at 7:24 pm by Todd Wilson

I must be on some kind of tutorial rampage. I’ve just written a fifth tutorial on an oft-requested topic: inserting scraped data into databases. You can find it here: http://www.screen-scraper.com/support/tutorials/tutorial5/tutorial_overview.php. For quite a while I mulled over how to approach this, given how many ways there are to go about it. Recently I had something of an epiphany, though, on a relatively simple way to do it using scrapeable files that works independently of the database or programming language you may want to use.

As before, any feedback is appreciated. You can drop me a line at todd-|at|-screen-scraper.com.

02.20.06

New Tutorial

Posted in Updates at 6:55 pm by Todd Wilson

We’ve just added a new tutorial to our site: Tutorial 4: Scraping an E-commerce Site from External Programs. The idea here is to meld stuff from Tutorial 2: Extending Hello World with the e-commerce example from Tutorial 3: Scraping an E-commerce Site. Any feedback would be appreciated…

02.16.06

Version 2.6.0.2a of screen-scraper available

Posted in Updates at 5:33 pm by Todd Wilson

Today we released version 2.6.0.2a of screen-scraper, and would appreciate feedback. It’s a minor update addressing a few bugs, with the primary fix intended to address database corruption we’ve had reports of. It’s a rare issue, but can be a bit catastrophic if someone hasn’t been backing up their work.

If those using the professional edition of screen-scraper wouldn’t mind upgrading we’d be mighty grateful. Ideally you shouldn’t notice any changes in functionality at all, aside from the absence of database corruption of course :)

This is an alpha version of screen-scraper, so you may need to update your settings to allow screen-scraper to update. To do this open the “Settings” dialog box (click the wrench icon) then check the box labelled “Allow upgrading to unstable versions.”

If you encounter any trouble please either drop us a support request, or feel free to email me directly. My email address is my first name at screen-scraper.com.

Data mining vs. screen-scraping

Posted in Thoughts at 4:49 pm by Todd Wilson

Data mining isn’t screen-scraping. I know that some people in the room may disagree with that statement, but they’re actually two almost completely different concepts.

In a nutshell, you might state it this way: screen-scraping allows you to get information, whereas data mining allows you to analyze information. That’s a pretty big simplification, so I’ll elaborate a bit.

The term “screen-scraping” comes from the old mainframe terminal days where people worked on computers with green and black screens containing only text. Screen-scraping was used to extract characters from the screens so that they could be analyzed. Fast-forwarding to the web world of today, screen-scraping now most commonly refers to extracting information from web sites. That is, computer programs can “crawl” or “spider” through web sites, pulling out data. People often do this to build things like comparison shopping engines, archive web pages, or simply download text to a spreadsheet so that it can be filtered and analyzed.

Data mining, on the other hand, is defined by Wikipedia as the “practice of automatically searching large stores of data for patterns.” In other words, you already have the data, and you’re now analyzing it to learn useful things about it. Data mining often involves lots of complex algorithms based on statistical methods. It has nothing to do with how you got the data in the first place. In data mining you only care about analyzing what’s already there.
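To make the split concrete, here is a minimal Python sketch of the two steps kept apart. Everything in it is made up for illustration: the HTML snippet, the product names, and the function names scrape_products and analyze_prices are all assumptions, and it has nothing to do with how screen-scraper itself works. The analysis step is also just a few summary statistics, a very simple stand-in for real data mining, which would typically run pattern-finding algorithms over far larger data sets.

```python
import re
from statistics import mean

# Hypothetical markup standing in for a page a scraper would download.
HTML = """
<ul>
  <li class="product">Widget <span class="price">$4.50</span></li>
  <li class="product">Gadget <span class="price">$12.00</span></li>
  <li class="product">Doohickey <span class="price">$7.50</span></li>
</ul>
"""

# Assumed page structure: one <li class="product"> per item.
PRODUCT_PATTERN = re.compile(
    r'<li class="product">(\w+) <span class="price">\$([\d.]+)</span></li>'
)


def scrape_products(html):
    """Screen-scraping step: pull (name, price) pairs out of raw markup."""
    return [(name, float(price)) for name, price in PRODUCT_PATTERN.findall(html)]


def analyze_prices(products):
    """Analysis step: works only on data we already have.

    It neither knows nor cares that the data came from a web page.
    """
    prices = [price for _, price in products]
    return {"count": len(prices), "mean": mean(prices), "max": max(prices)}


products = scrape_products(HTML)
summary = analyze_prices(products)
print(products)
print(summary)
```

Notice that analyze_prices would work just the same if the list of products came from a spreadsheet or a database rather than a scraped page, which is really the whole point of the distinction.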

The difficulty is that people who don’t know the term “screen-scraping” will try Googling for anything that resembles it. We include a number of these terms on our web site to help such folks. For example, we created pages entitled Text Data Mining, Automated Data Collection, Web Site Data Extraction, and even Web Site Ripper (I suppose “scraping” is sort of like “ripping” :) ). So it presents a bit of a problem–we don’t necessarily want to perpetuate a misconception (i.e., screen-scraping = data mining), but we also have to use terminology that people will actually use.

02.15.06

Welcome!

Posted in Miscellaneous at 12:40 pm by Todd Wilson

I’m ashamed to say that this is my first foray into keeping a blog. I’ve been watching other blogs for some time, so hopefully I can provide something useful here based on what I’ve learned from others.

This blog will have a few purposes. One of them will be to provide a way for people to hear about updates we make to our screen-scraper application. We often create alpha versions of screen-scraper that some early adopters may be interested in trying out, so if that’s you then you may want to keep an eye on this blog. I’ll also post updates we make to our web site, such as new tutorials and FAQs.

I’m also hoping to use this to give tips, hints, and tricks in using screen-scraper that you may not find in our documentation unless you really root around. If you use screen-scraper regularly it may help you to subscribe in case any useful nuggets get posted.

If all goes well this will also be a means for people to provide feedback on screen-scraper. We use screen-scraper every day, and we’re always looking for ways to make it better. Don’t hesitate to post complaints, compliments, or suggestions. Also feel free to post your own tips and such in the comments.

Hopefully no one will get too bored or annoyed by this, but I’m also planning on giving general thoughts on screen-scraping and data extraction. I’ll also likely give some insights into our business, and my personal impressions on being involved in a small company.