Saturday, 30 July 2016

Different behaviour when retrieving a site from browser or programmatically

I am trying to write a basic web crawler, and I thought it would be interesting to try it on a website I visit frequently, http://ift.tt/1grjCUa. When you first visit this website, you need to log in by providing your postcode and your email address. This is done via a POST request to the server, which looks somewhat like the following:

register-ticket=<postcode>&register-email=<email>&login=
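For reference, here is a minimal sketch of how a body in that shape could be assembled in standard C++, assuming the form is submitted as `application/x-www-form-urlencoded` (the usual encoding for HTML login forms, though I have not confirmed it for this site). The helper names `form_encode` and `build_form_body` are my own and are not part of Casablanca.

```cpp
#include <cctype>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: percent-encode one form field value.
// Letters, digits and "*-._" pass through, a space becomes '+',
// and every other byte is written as %XX.
std::string form_encode(const std::string& in) {
    static const char hex[] = "0123456789ABCDEF";
    std::string out;
    for (unsigned char c : in) {
        if (std::isalnum(c) || c == '*' || c == '-' || c == '.' || c == '_') {
            out += static_cast<char>(c);
        } else if (c == ' ') {
            out += '+';
        } else {
            out += '%';
            out += hex[c >> 4];
            out += hex[c & 0x0F];
        }
    }
    return out;
}

// Hypothetical helper: join name/value pairs as name=value&name=value.
std::string build_form_body(
    const std::vector<std::pair<std::string, std::string>>& fields) {
    std::string body;
    for (const auto& f : fields) {
        if (!body.empty()) body += '&';
        body += form_encode(f.first) + '=' + form_encode(f.second);
    }
    return body;
}
```

With made-up values such as the postcode `AB1 2CD` and the address `user@example.com`, this yields `register-ticket=AB1+2CD&register-email=user%40example.com&login=`. If the characters in the real values are not encoded the same way the browser encodes them, the server may see different field contents.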

This works fine in the browser, and returns an HTTP/1.1 200 OK response. However, when I do it programmatically using the Casablanca library, I get an HTTP/1.1 301 Found response, pointing to http://ift.tt/2aleTX2.

Does anyone have any idea what could cause the same request to give two different responses? The library I'm using doesn't seem to have any facility to print the whole HTTP request before sending it, but I wouldn't have thought any of the other header fields would influence the response.
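One way to make the comparison concrete, sketched here under my own assumptions, is to write out the request text that I believe is being sent and compare it line by line with what the browser's developer tools show. The function `render_request` and the `Headers` alias below are hypothetical and only format text; they do not inspect what Casablanca actually puts on the wire.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical alias: header name/value pairs in the order they are sent.
using Headers = std::vector<std::pair<std::string, std::string>>;

// Hypothetical helper: format an HTTP/1.1 request as plain text.
// Content-Length is computed from the body; all other headers must be
// supplied by the caller.
std::string render_request(const std::string& method,
                           const std::string& path,
                           const Headers& headers,
                           const std::string& body) {
    std::string out = method + " " + path + " HTTP/1.1\r\n";
    for (const auto& h : headers) {
        out += h.first + ": " + h.second + "\r\n";
    }
    out += "Content-Length: " + std::to_string(body.size()) + "\r\n";
    out += "\r\n";  // blank line separates headers from the body
    out += body;
    return out;
}
```

Headers worth checking against the browser's version would include `Content-Type`, `Cookie`, `User-Agent` and `Referer`, since a server is free to answer differently depending on any of them. Whether one of these explains the redirect here is only a guess on my part.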

I'm new to HTTP, so forgive me if there is something elementary missing!



