Scraping AMF Sites

Posted in Tips on 11/15/11 by Todd Wilson

Most of the time when extracting information from web sites you’ll deal with HTML, which is generally pretty straightforward to deal with.  Occasionally, though, content will be delivered via something like a Java applet or Flash movie.  Just recently I completed a project that dealt with extracting data from a Flash movie, where the data was delivered from the server via Adobe’s Action Message Format (AMF).  I thought I’d share a bit about my experience here, which will hopefully be useful to others, as well as myself the next time I have to do this 🙂

The main tool you’ll deal with when scraping AMF-based data is Adobe’s Java AMF Client.  It handles most of the heavy lifting for you, though you’ll still need to do a fair amount of coding.  The other tool that is indispensable is Charles proxy, which has a built-in AMF parser.  Without it you’ll be flying blind.

The basic approach you’ll want to take is to proxy the site via Charles with your web browser, pick out the AMF requests that seem relevant, then replicate those in code.  In my case I also had to download PDF files (standard HTTP), so I actually had to run it all in screen-scraper, combining normal screen-scraper stuff with the Java AMF Client stuff.  There was also a login that had to be done outside of AMF.  Anyway, just be aware that you may have to combine both approaches in your own project.

I’m going to be providing some example code below in Interpreted Java (which is just BeanShell) as a screen-scraper script.  You’ll need to do a bit of modification if you want to run this as straight Java.

Digging into the details, here’s the code that sets up the initial AMF connection:

import flex.messaging.io.ArrayCollection;
import flex.messaging.messages.*;
import flex.messaging.io.amf.client.AMFConnection;
import flex.messaging.io.amf.client.exceptions.ClientStatusException;
import flex.messaging.io.amf.client.exceptions.ServerStatusException;
import flex.messaging.util.UUIDUtils;
import flex.messaging.io.amf.ASObject;

// Create the AMF connection.
AMFConnection amfConnection = new AMFConnection();

// Used for debugging...
//Proxy proxy = new Proxy( Proxy.Type.HTTP, new InetSocketAddress( "localhost", 8888 ) );
//amfConnection.setProxy( proxy );

// Connect to the remote url.
url = "http://www.myamfsite.com/messagebroker/amf";
try
{
    amfConnection.connect(url);
}
catch( ClientStatusException cse )
{
    session.logError( cse );
    return;
}

// Set a few headers we'll want throughout the session.
amfConnection.addHttpRequestHeader( "Content-type", "application/x-amf" );
amfConnection.addHttpRequestHeader( "Referer", "http://www.myamfsite.com/media/MyMovie.swf" );

Here we’re setting up an AMF connection to a server whose AMF end point is found at http://www.myamfsite.com/messagebroker/amf.  The commented-out proxy code allows us to send it all through Charles; that way we can compare the requests our code produces with those we record when browsing the web site via our web browser.  Kind of an apples-to-apples comparison that helps to root out bugs.  If your code doesn’t seem to have the desired effect, compare what’s happening via Charles with the requests from your browser.  Ideally they should match as closely as possible.  I also found that I had to add the two request headers that you’ll find at the end.  The referer may or may not be necessary, but it’s likely that the content-type header is, since the Flash server would normally be expecting requests from a Flash movie, which would probably include that header by default.
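If you find yourself commenting the proxy lines in and out a lot, a small helper can make the switch explicit. What follows is only a sketch using the standard java.net classes; the DebugProxy class and its forCharles method are names I made up for illustration, and the port is the 8888 used in the commented-out code above.

```java
import java.net.InetSocketAddress;
import java.net.Proxy;

// Hypothetical helper (not part of the Java AMF Client): builds the proxy
// object used to route requests through a local debugging proxy like Charles.
final class DebugProxy {
    // Returns an HTTP proxy for host:port when enabled, or a direct
    // (no proxy) connection setting when disabled.
    static Proxy forCharles(boolean enabled, String host, int port) {
        if (!enabled) {
            return Proxy.NO_PROXY;
        }
        return new Proxy(Proxy.Type.HTTP, new InetSocketAddress(host, port));
    }
}
```

With that in place the setup code could call amfConnection.setProxy( DebugProxy.forCharles( true, "localhost", 8888 ) ) while debugging. I'm assuming here that setProxy treats Proxy.NO_PROXY as a plain direct connection; if it doesn't, just skip the setProxy call when debugging is off.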

Once you’ve done the initialization you can start adding AMF requests to get the data you’re after.  Again, you’ll want to do this by recording the requests from your browser in Charles, then translate those into code.  Here’s a screen-shot of a recorded AMF request from Charles:

And here’s how I translated the request into code:

// Build a client ping command (Charles shows this operation as "5").
CommandMessage message1 = new CommandMessage( CommandMessage.CLIENT_PING_OPERATION );
Object[] params1 = new Object[]
{
    message1
};
message1.setHeader( "DSId", "nil" );
message1.setMessageId( UUIDUtils.createUUID() );
Object result1 = amfConnection.call( "null", params1 );
session.log( "Result 1: " + result1 );

Based on the request recorded by Charles, it’s obvious that this should be a CommandMessage.  The PING part of it was a bit trickier.  This is the “operation” portion of the request, which you’ll notice is recorded by Charles only as “5”.  This is where I had to do a bit of sleuthing through the Java AMF Client source code (which is fortunately open source and freely downloadable).  If you’ve downloaded that source code you’ll find the CommandMessage class here in the bundle: modules/core/src/flex/messaging/messages/CommandMessage.java.  Notice also in the request how I set the header “DSId” to be “nil”, which is also evident in what Charles recorded.  Again, we’re trying to get our code to match as closely as possible what was recorded by our web browser.  I gave the request a unique ID, then asked the connection to make the call.
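As an alternative to reading the source by hand, reflection can do the number-to-name lookup. The sketch below is plain Java and lists the public static final int fields of a class that hold a given value. ConstantLookup and DemoMessage are hypothetical names for illustration; DemoMessage is only a stand-in so the example runs without the AMF jars, and the only value in it taken from this post is 5 for the ping operation.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: maps a numeric code (such as the "5" Charles shows)
// back to the names of the constants that carry that value.
final class ConstantLookup {
    // Returns the names of all public static final int fields on cls
    // whose value equals wanted.
    static List<String> namesFor(Class<?> cls, int wanted) {
        List<String> names = new ArrayList<>();
        for (Field field : cls.getFields()) {
            int mods = field.getModifiers();
            boolean isIntConstant = Modifier.isStatic(mods)
                    && Modifier.isFinal(mods)
                    && field.getType() == int.class;
            if (!isIntConstant) {
                continue;
            }
            try {
                if (field.getInt(null) == wanted) {
                    names.add(field.getName());
                }
            } catch (IllegalAccessException e) {
                // Field not readable from here; ignore it.
            }
        }
        return names;
    }

    // Stand-in for a message class. OTHER_OPERATION is invented for the demo.
    static final class DemoMessage {
        public static final int OTHER_OPERATION = 0;
        public static final int CLIENT_PING_OPERATION = 5;
    }
}
```

With the AMF jars on the classpath, calling ConstantLookup.namesFor( CommandMessage.class, 5 ) should turn up the ping constant, assuming the operation constants are declared as public static final ints (which is how they appear to be used in the code above).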

The next request I needed was a bit different, but not too difficult to recreate from what Charles recorded:

I’ve blurred out the username I used.  Here’s the corresponding code:

// Authenticate the current user.
RemotingMessage message2 = new RemotingMessage();
message2.setOperation( "getUserByUserName" );
Object[] params2 = new Object[]
{
    message2
};
String[] body2 = new String[]
{
    "myUserName"
};
message2.setBody( body2 );
message2.setDestination( "XYZ" );
message2.setMessageId( UUIDUtils.createUUID() );
Object result2 = amfConnection.call( "null", params2 );
session.log( "Result 2: " + result2 );

Again, you can hopefully see how the pieces in the code correlate to what Charles recorded.

From this point it was simply a matter of adding requests as needed, along with a fair amount of trial and error to ensure that I was matching as closely as possible the original AMF requests.  The only item that tripped me up for a while that’s probably worth mentioning was when Charles recorded the body portion of the request as containing simply an “Object”.  When I did the same in code the server didn’t like it, and it took me a bit before I realized what it actually wanted was an “ASObject”.  So the code I used to create the body looks like this:

Object[] body3 = new Object[]
{
    new ASObject()
};

A few last items that might be helpful:

  • The Java AMF Client download contains quite a few dependency files.  You’ll have to figure out exactly which ones of those you truly need.  In my case, in using this within screen-scraper, I ended up only needing two of the jars from the bundle: flex-messaging-common.jar and flex-messaging-core.jar.
  • As it stands, the Java AMF Client can’t handle HTTPS, let alone HTTPS sites that use an invalid certificate.  I ended up modifying the source for the AMFConnection class in order to add this functionality (in the bundle that class is found here: modules/core/src/flex/messaging/io/amf/client/AMFConnection.java).  You can download a zip file here that contains that modified source file as well as a compiled version of the flex-messaging-core.jar file, which contains that modified class.  If you end up modifying that class further, you can compile it from within the bundle with a simple “ant core” from the command line.  You need not compile the whole thing.
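For anyone attempting a similar HTTPS change themselves, here is a rough sketch of the usual way to make a standard Java HTTPS connection accept an invalid certificate. It is not the code from the modified AMFConnection class; LenientTls and its methods are hypothetical names, and it uses only the standard javax.net.ssl classes. It switches off certificate and host name checks entirely, so it only makes sense for scraping a site you already trust.

```java
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.security.cert.X509Certificate;
import javax.net.ssl.HttpsURLConnection;
import javax.net.ssl.SSLContext;
import javax.net.ssl.TrustManager;
import javax.net.ssl.X509TrustManager;

// Hypothetical helper: relaxes TLS checks on a single connection.
final class LenientTls {
    // Builds an SSLContext whose trust manager accepts any certificate chain.
    static SSLContext trustAllContext() {
        TrustManager[] trustAll = { new X509TrustManager() {
            @Override public void checkClientTrusted(X509Certificate[] chain, String authType) { }
            @Override public void checkServerTrusted(X509Certificate[] chain, String authType) { }
            @Override public X509Certificate[] getAcceptedIssuers() { return new X509Certificate[0]; }
        } };
        try {
            SSLContext context = SSLContext.getInstance("TLS");
            context.init(null, trustAll, new SecureRandom());
            return context;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("Could not build lenient SSL context", e);
        }
    }

    // Applies the lenient context to one connection and skips host name checks.
    static void relax(HttpsURLConnection connection) {
        connection.setSSLSocketFactory(trustAllContext().getSocketFactory());
        connection.setHostnameVerifier((hostname, session) -> true);
    }
}
```

The idea would be to call LenientTls.relax on the connection before any data is sent, and only when the URL's connection turns out to be an HttpsURLConnection. Where exactly that hook belongs inside AMFConnection is something you'd have to work out from the source.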

6 Comments

  1. Paula Fernandes said,

    June 6, 2012 at 8:36 am

    Hello.

    First of all, this was a very good post!

    I’m wondering if you could send me the source code from this example. I’m interested in calling other methods from the Java client, not only the login method.

    att,
    paula

  2. Todd Wilson said,

    June 6, 2012 at 4:29 pm

    Hi,

    Glad you found the post helpful. Unfortunately, the code I can share for this is already in the post. The other code I have (which isn’t a lot) deals more with proprietary information, so I can’t share it.

    Kind regards,

    Todd

  3. Abhishek said,

    August 27, 2013 at 11:18 pm

    Hi,

    Thanks for the wonderful post.

    I am also trying to scrape an AMF website. In my case, the body contains Externalized Object (flex.messaging.io.ArrayCollection). Now when I try to send this request, I get a ClientStatusException stating “Class ‘flex.messaging.io.amf.ASObject’ must implement java.io.Externalizable to receive client IExternalizable instances.” Is it possible to implement this without knowing the server side components?

  4. Todd Wilson said,

    August 28, 2013 at 9:40 am

    Hi,

    I ran into something similar to what you’re describing recently. In my case I was able to send parameters by creating ASObjects on my side, then sending those up. There was also a case where the server was sending down objects that were custom classes that extended ASObject and ArrayCollection. I was able to take those same objects, then send them back up as parameters. I can see where there could also be a case, though, in which the server requires that objects be sent up as parameters that take the form of custom classes. That may be what you’ve encountered here. If that’s the case I’m not quite sure how you’d handle it. You obviously don’t have their class definitions, so you can’t create the objects from scratch. I’d probably still try, though, to send up parameters either as ASObjects or ArrayCollections. You might also see if they’re sending down any objects that are of the type you need, in which case you may be able to manipulate them and send them back up as parameters.

    Best of luck,

    Todd

  5. Mattheuz said,

    January 28, 2016 at 1:54 pm

    Hi, I wanted to ask you something! My Charles proxy does not have the AMF button. What do I do?

  6. jason said,

    February 15, 2016 at 5:18 pm

    Charles is always getting updated, but you’d really need to ask them. I would help you if I knew, but I don’t use it that often.
