10.12.06

Scraping a Date Range

Posted in Tips at 3:53 pm by jason

Much of the time in scraping, one wants to fill in a web form and grab the results, and many of the forms want the user to fill in a date range. It’s not a daunting prospect if you just want to scrape the form once, but for jobs where you want to run a scrape weekly and get a full week’s worth of data, writing a script for that has been challenging. I have therefore developed a simple, generic script that will figure the date a given number of days from today and save it in a session variable.

For the purposes of this post, I’m going to make a script that gives me the date a week from today in the format of a 2-digit month, 2-digit day, and 4-digit year; however, I’ll make those easy to change.

To start, one needs to import some useful Java components:

import java.util.*;
import java.text.*;

These allow us to go ahead and create an instance of “right now”.

Calendar rightNow = Calendar.getInstance();

This gives me a “right now” to which I can add 7 days, thusly:

rightNow.add( Calendar.DATE, 7 );

And all that is left is to format it:

Date endDate = rightNow.getTime();
SimpleDateFormat formatter = new SimpleDateFormat( "MM/dd/yyyy" );
String newDate = formatter.format( endDate );

Now I have a nicely formatted local variable named newDate that I would just need to set as a session variable for the rest of the scrape to run.

session.setVariable( "NEW_DATE", newDate );

That’s enough to make the script work, but in order to make it into a good template, one should make it easy to find and change the things that will have to be set differently in each application. My attempt to do so ended up like this:

import java.util.*;
import java.text.*;

// Set number of days to add to current date.
int addDays = 7;

// Set the format in which the date should be output.
String dateFormat = "MM/dd/yyyy";

// Figure the new date.
Calendar rightNow = Calendar.getInstance();
rightNow.add( Calendar.DATE, addDays );
Date endDate = rightNow.getTime();
SimpleDateFormat formatter = new SimpleDateFormat( dateFormat );
String newDate = formatter.format( endDate );

// Output the new date.
session.setVariable( "NEW_DATE", newDate );

Of course you can use this process to make more than one date for your form if needed; from here it should just be a matter of some minor editing.
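
As a rough sketch of what that editing might look like for a form with both a start and an end date, the same calculation can be pulled into a small helper and called once per date. The class name, the offsetDate helper, and the START_DATE and END_DATE variable names are made up for illustration; this version also prints the results so it can run as plain Java outside of screen-scraper.

```java
import java.util.*;
import java.text.*;

public class DateRangeSketch {

    // Returns the date that falls offsetDays after base, formatted with pattern.
    // The base calendar is cloned so the caller's instance is left untouched.
    static String offsetDate( Calendar base, int offsetDays, String pattern ) {
        Calendar shifted = (Calendar) base.clone();
        shifted.add( Calendar.DATE, offsetDays );
        SimpleDateFormat formatter = new SimpleDateFormat( pattern );
        return formatter.format( shifted.getTime() );
    }

    public static void main( String[] args ) {
        // Set the format in which the dates should be output.
        String dateFormat = "MM/dd/yyyy";

        // Figure both ends of the range from the same "right now".
        Calendar rightNow = Calendar.getInstance();
        String startDate = offsetDate( rightNow, 0, dateFormat );
        String endDate = offsetDate( rightNow, 7, dateFormat );

        // In a screen-scraper script these would be saved with
        // session.setVariable( "START_DATE", startDate ) and
        // session.setVariable( "END_DATE", endDate ) instead of printed.
        System.out.println( "START_DATE=" + startDate );
        System.out.println( "END_DATE=" + endDate );
    }
}
```

Passing a negative number of days gives a date in the past, which is handy when the form wants the week that just ended rather than the week ahead.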

For information on the date formatting, see the Java documentation at: http://java.sun.com/j2se/1.5.0/docs/api/java/text/SimpleDateFormat.html
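
To get a feel for how the pattern letters behave before wiring one into a scrape, a throwaway program along these lines can help. It is only a sketch: the class name, the render helper, and the sample date are invented for the example, and Locale.US is passed in so that month names come out in English no matter how the machine is configured.

```java
import java.util.*;
import java.text.*;

public class DatePatternSketch {

    // Formats the given date with the given SimpleDateFormat pattern.
    static String render( Date date, String pattern ) {
        SimpleDateFormat formatter = new SimpleDateFormat( pattern, Locale.US );
        return formatter.format( date );
    }

    public static void main( String[] args ) {
        // A fixed sample date (noon on October 19, 2006) so runs are repeatable.
        Date sample = new GregorianCalendar( 2006, Calendar.OCTOBER, 19, 12, 0 ).getTime();

        // A few patterns a web form might expect.
        String[] patterns = { "MM/dd/yyyy", "dd/MM/yyyy", "yyyy-MM-dd", "MMM d, yyyy" };

        for ( String pattern : patterns ) {
            System.out.println( pattern + " -> " + render( sample, pattern ) );
        }
    }
}
```

Note that the letters are case sensitive: capital MM is the month, while lowercase mm is the minute.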

And for a trick to make the formatting of dates far easier when you’re in screen-scraper, read up on the reformatDate method that is available in the professional edition.

10.10.06

Version 2.7.2.17a of screen-scraper available

Posted in Updates at 4:21 pm by Todd Wilson

Our to-do list is empty! This version contains all of the bug fixes and features we’ve had planned for the next version of screen-scraper. I suppose you could consider it to be more of a beta, or maybe even a release candidate. There really isn’t anything earth-shatteringly new in this version over 2.7.1.16a–mostly just bug fixes and some clean-up.

The usual caveats apply–this is alpha software, so use it at your own risk. Thanks, though, to anyone willing to help us test.

If you’re currently running version 2.7.2.9a or higher you can upgrade via Options -> Check for updates. If you’re using anything else, follow these instructions (see this page for details on why you need to follow these steps):

  1. Back up your scraping sessions (check here for help on that).
  2. Ensure screen-scraper isn’t currently running (close the workbench and server, if running).
  3. Download this file, and unzip it.
  4. Copy the contents of the zip file on top of your existing files in the screen-scraper install folder. For example, the zip file contains a “screen-scraper.jar” file which should be copied on top of your existing “screen-scraper.jar” file.
  5. Edit your “resource\conf\screen-scraper.properties” file in a text editor. Change the “Version” property to “2.7.2.17a”.
  6. Launch the screen-scraper workbench.
  7. If all of your scraping sessions have disappeared, don’t panic!
  8. Close the screen-scraper workbench.
  9. Re-open the screen-scraper workbench.

You’re done!

09.28.06

Version 2.7.2.16a of screen-scraper available

Posted in Updates at 10:50 am by Todd Wilson

Yeah, we kind of silently released 2.7.2.15a, for those of you who are keeping track. We didn’t do as much testing on it as we normally do, so I didn’t want to make it known too broadly. 2.7.2.16a is performing nicely, so feel free to have at it. There are a few new little doo-dads and minor fixes in this one. The auto-scroll on the log panel seems to be getting high marks around here. Provecho!

The usual caveats apply–this is alpha software, so use it at your own risk. Thanks, though, to anyone willing to help us test.

If you’re currently running version 2.7.2.9a or higher you can upgrade via Options -> Check for updates. If you’re using anything else, follow these instructions (see this page for details on why you need to follow these steps):

  1. Back up your scraping sessions (check here for help on that).
  2. Ensure screen-scraper isn’t currently running (close the workbench and server, if running).
  3. Download this file, and unzip it.
  4. Copy the contents of the zip file on top of your existing files in the screen-scraper install folder. For example, the zip file contains a “screen-scraper.jar” file which should be copied on top of your existing “screen-scraper.jar” file.
  5. Edit your “resource\conf\screen-scraper.properties” file in a text editor. Change the “Version” property to “2.7.2.16a”.
  6. Launch the screen-scraper workbench.
  7. If all of your scraping sessions have disappeared, don’t panic!
  8. Close the screen-scraper workbench.
  9. Re-open the screen-scraper workbench.

You’re done!

09.13.06

Version 2.7.2.14a of screen-scraper available

Posted in Updates at 4:44 pm by Todd Wilson

Not too much new in this one. It more or less just cleans up some annoying bugs that crept into the previous versions.

The usual caveats apply–this is alpha software, so use it at your own risk. Thanks, though, to anyone willing to help us test.

If you’re currently running version 2.7.2.9a or higher you can upgrade via Options -> Check for updates. If you’re using anything else, follow these instructions (see this page for details on why you need to follow these steps):

  1. Back up your scraping sessions (check here for help on that).
  2. Ensure screen-scraper isn’t currently running (close the workbench and server, if running).
  3. Download this file, and unzip it.
  4. Copy the contents of the zip file on top of your existing files in the screen-scraper install folder. For example, the zip file contains a “screen-scraper.jar” file which should be copied on top of your existing “screen-scraper.jar” file.
  5. Edit your “resource\conf\screen-scraper.properties” file in a text editor. Change the “Version” property to “2.7.2.14a”.
  6. Launch the screen-scraper workbench.
  7. If all of your scraping sessions have disappeared, don’t panic!
  8. Close the screen-scraper workbench.
  9. Re-open the screen-scraper workbench.

You’re done!

09.12.06

Using screen-scraper to automatically test embedded devices

Posted in Miscellaneous, Thoughts at 10:49 am by Todd Wilson

A while back I flew out to Huntsville, AL to work with a government contractor company on automating the testing of embedded devices. To this day I’m not entirely sure what these little machines did, but they each had a web interface that needed testing (much like that of a wireless router, if you’ve worked with those before). This isn’t the most common usage for screen-scraper, but it turned out to be just what they needed.

I worked closely with Greg Chapman, one of their engineers, and he recently wrote an article on the experience entitled Testing aerospace UUTs leads to Web solution. Greg’s a smart guy, and has continued to use screen-scraper in ways that I wouldn’t have even considered.

It’s gratifying to see screen-scraper used in so many different ways, but it’s interesting that its versatility has at times almost been a curse to us. Our software can be used for all kinds of purposes, but we’re finding that, from a business standpoint, we’re often better off narrowing our focus to very specific applications. As one marketing expert we consulted with put it, “You guys have plastic.” Plastic is incredibly useful, but it gains value as you craft it into something with a specific purpose. I’m planning on blogging about this idea more later, but it’s interesting to consider the pros and cons of a general-purpose tool, like screen-scraper.

09.07.06

Version 2.7.2.13a of screen-scraper available

Posted in Updates at 11:33 am by Todd Wilson

Come ‘n get it! Several clean-ups in this one, as well as a few new features. screen-scraper will now back up its database automatically. It’s rare, but we still get reports occasionally of corrupted screen-scraper databases (which can often mean loss of work). We use a database called Hypersonic, which can be amazingly fragile if its process crashes for some reason. With the automatic backups, hopefully nobody loses their work (or much of their work) in the event that their database gets hosed.

The usual caveats apply–this is alpha software, so use it at your own risk. Thanks, though, to anyone willing to help us test.

If you’re currently running version 2.7.2.9a or higher you can upgrade via Options -> Check for updates. If you’re using anything else, follow these instructions (see this page for details on why you need to follow these steps):

  1. Back up your scraping sessions (check here for help on that).
  2. Ensure screen-scraper isn’t currently running (close the workbench and server, if running).
  3. Download this file, and unzip it.
  4. Copy the contents of the zip file on top of your existing files in the screen-scraper install folder. For example, the zip file contains a “screen-scraper.jar” file which should be copied on top of your existing “screen-scraper.jar” file.
  5. Edit your “resource\conf\screen-scraper.properties” file in a text editor. Change the “Version” property to “2.7.2.13a”.
  6. Launch the screen-scraper workbench.
  7. If all of your scraping sessions have disappeared, don’t panic!
  8. Close the screen-scraper workbench.
  9. Re-open the screen-scraper workbench.

You’re done!

08.31.06

Version 2.7.2.12a of screen-scraper available

Posted in Updates at 10:34 am by Todd Wilson

I think we’re getting awful close to a public release. This one looks to be pretty stable. Not too many major changes this time around. You’ll discover a few little niceties, along with some bug fixes and clean-ups to minor issues that have probably been annoying you. As always, please let us know of any trouble you encounter.

If you’re currently running version 2.7.2.9a or higher you can upgrade via Options -> Check for updates. If you’re using anything else, follow these instructions (see this page for details on why you need to follow these steps):

  1. Back up your scraping sessions (check here for help on that).
  2. Ensure screen-scraper isn’t currently running (close the workbench and server, if running).
  3. Download this file, and unzip it.
  4. Copy the contents of the zip file on top of your existing files in the screen-scraper install folder. For example, the zip file contains a “screen-scraper.jar” file which should be copied on top of your existing “screen-scraper.jar” file.
  5. Edit your “resource\conf\screen-scraper.properties” file in a text editor. Change the “Version” property to “2.7.2.12a”.
  6. Launch the screen-scraper workbench.
  7. If all of your scraping sessions have disappeared, don’t panic!
  8. Close the screen-scraper workbench.
  9. Re-open the screen-scraper workbench.

Serve chilled and enjoy!

08.24.06

Developing software by the 15% rule

Posted in Miscellaneous, Thoughts at 5:02 pm by Todd Wilson

Writing software on a consulting basis can often be a losing proposition for developers or clients or both. There are too many things that can go wrong, and that ultimately translates into loss of time and money. The “15% rule” we’ve come up with is intended to create a win-win situation for both parties (or at least make it fair for everyone). Clients generally get what they want, and development shops make a fair profit. It’s not a perfect solution, but so far it seems to be working for us.

This may come as a surprise to some, but we make very little money selling software licenses. The vast majority of our revenue comes through consulting services–writing code for hire. Having now done this for several years, we’ve learned some hard lessons. On a few projects the lessons were so hard we actually lost money.

A few months ago I put together somewhat of a manifesto-type document intended to address the difficulties we’ve faced in developing software for clients. I’m pleased to say that it’s made a noticeable difference so far for us. My hope is that this blog entry will be read by others who develop software on a consulting basis, so that they can learn these lessons the easy way rather than the way we learned them.

What follows in this article is a summary of one of the main principles we now follow in developing software–the 15% rule. If you’d like, you’re welcome to read the full “Our Approach to Software Development” document.

For the impatient, the 15% rule goes like this…

Before undertaking a development project we create a statement of work (which acts as a contract and a specification) that outlines what we’ll do, how many hours it will require, and how much it will cost the client. As part of the contract we commit to invest up to the amount of time outlined in the document plus 15%. That is, if the statement of work says that the project will take us 100 hours to complete, we’ll spend up to 115 hours (but no more). As to where-fores and why-tos on how this works, read on.
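
For those who like to see the arithmetic spelled out, here is a minimal sketch of the cap. The class and method names are invented for illustration and are not taken from the “Our Approach to Software Development” document; the only figures that come from the rule itself are the 15% buffer and the 100-hour example.

```java
public class FifteenPercentRule {

    // Buffer on top of the estimate, in whole percent (15 per the rule described above).
    static final int BUFFER_PERCENT = 15;

    // Maximum hours the developer commits to: the estimate plus the buffer.
    static double maxCommittedHours( int estimatedHours ) {
        return estimatedHours + estimatedHours * BUFFER_PERCENT / 100.0;
    }

    // True while the hours spent so far are still covered by the commitment.
    static boolean isWithinCommitment( int estimatedHours, double hoursSpent ) {
        return hoursSpent <= maxCommittedHours( estimatedHours );
    }

    public static void main( String[] args ) {
        int estimate = 100; // the 100-hour example from the statement of work

        System.out.println( "Cap: " + maxCommittedHours( estimate ) + " hours" );
        System.out.println( "110 hours covered? " + isWithinCommitment( estimate, 110 ) );
        System.out.println( "120 hours covered? " + isWithinCommitment( estimate, 120 ) );
    }
}
```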

Those that have developed software for hire know that the end product almost never ends up exactly as the client had pictured. There are invariably tweaks that will need to be made (that may or may not have been discussed up front) in order to get the thing to at least resemble what the client has in mind. And, yes, this can happen even if you spend hours upon hours fine tuning the specification to reflect the client’s wishes. Additionally, technical issues can crop up that weren’t anticipated by the programming team. In theory, the better the programming team the less likely this should be, but it doesn’t always end up that way (Microsoft’s Vista operating system is a sterling example). These two factors, among others, equate to the risk that is inherent in the project. Something isn’t going to go right, and that will almost always mean someone pays or loses more money than originally anticipated. The question is, who should be responsible to account for those extra dollars?

Up until relatively recently, we would shoulder almost all of the risk in our projects. If the app didn’t do what the client had in mind, or if unforeseen technical issues cropped up, it generally came out of our pockets. For the most part it wasn’t a huge problem, but it always seemed to have at least some effect (the extreme cases obviously being when we lost money on a project).

This seems kind of unfair, doesn’t it? The risk inherent to the project isn’t necessarily the fault of either party. It’s just there. We didn’t put it there, and neither did the client. As such, it shouldn’t be the case that one party shoulders it all. That’s where the 15% rule comes in.

The 15% rule allows both parties to share the risk. By following this rule, we’re acknowledging that something probably won’t go as either party intended, so we need a buffer to handle the stuff that spills over. By capping it at a specific amount, though, we’re also ensuring that the buffer isn’t so big that it devours the profits of the developers.

For the most part, the clients with whom we’ve used the 15% rule are just fine with it. It is a pretty reasonable arrangement, after all. We have had the occasional party that squirms and wiggles about it, but, in the end, they’ve gone along with it and I think everyone has benefited as a result.

08.10.06

Version 2.7.2.11a of screen-scraper available

Posted in Updates at 2:19 pm by Todd Wilson

Just took version 2.7.2.11a fresh out of the oven. This one contains some overdue GUI enhancements that I think will delight you. I’d especially recommend the context menus (try right clicking on items in the tree).

If you’re currently running version 2.7.2.9a or higher you can upgrade via Options -> Check for updates. If you’re using anything else, follow these instructions (see this page for details on why you need to follow these steps):

  1. Back up your scraping sessions (check here for help on that).
  2. Ensure screen-scraper isn’t currently running (close the workbench and server, if running).
  3. Download this file, and unzip it.
  4. Copy the contents of the zip file on top of your existing files in the screen-scraper install folder. For example, the zip file contains a “screen-scraper.jar” file which should be copied on top of your existing “screen-scraper.jar” file.
  5. Edit your “resource\conf\screen-scraper.properties” file in a text editor. Change the “Version” property to “2.7.2.11a”.
  6. Launch the screen-scraper workbench.
  7. If all of your scraping sessions have disappeared, don’t panic!
  8. Close the screen-scraper workbench.
  9. Re-open the screen-scraper workbench.

Provecho!

08.03.06

Version 2.7.2.10a of screen-scraper available

Posted in Updates at 2:23 pm by Todd Wilson

For those of you who have noticed some of the annoying quirks in 2.7.2.9a, try 2.7.2.10a. It should clear up a lot of them. Nothing too major in this release; mostly minor bug fixes.

If you’re currently running version 2.7.2.9a you can upgrade via Options -> Check for updates. If you’re using anything else, follow these instructions (see this page for details on why you need to follow these steps):

  1. Back up your scraping sessions (check here for help on that).
  2. Ensure screen-scraper isn’t currently running (close the workbench and server, if running).
  3. Download this file, and unzip it.
  4. Copy the contents of the zip file on top of your existing files in the screen-scraper install folder. For example, the zip file contains a “screen-scraper.jar” file which should be copied on top of your existing “screen-scraper.jar” file.
  5. Edit your “resource\conf\screen-scraper.properties” file in a text editor. Change the “Version” property to “2.7.2.10a”.
  6. Launch the screen-scraper workbench.
  7. If all of your scraping sessions have disappeared, don’t panic!
  8. Close the screen-scraper workbench.
  9. Re-open the screen-scraper workbench.

Enjoy!
