Friday, June 3, 2016

Scrapy getting redirected

I'm trying to scrape data from this Amazon page, but I always get sent back to the main page for HP. I've tried suggestions from other posts, such as changing my USER_AGENT, but that hasn't helped in this case. Also, since Scrapy enables cookies by default, I even tried reaching the page I want by going through other pages first, and that hasn't worked either.
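Because Scrapy manages cookies itself (its CookiesMiddleware is enabled by default), cookies copied from a browser are normally handed to `scrapy.Request` through its `cookies` argument as a dict; with the middleware enabled, a hand-set `Cookie` header is replaced by the cookie jar's own. Below is a minimal sketch of a helper that splits a copied `name=value; name2=value2` header into such a dict. The name `cookie_header_to_dict` is hypothetical, and the sketch assumes a plain request header with no attributes like `Path` or `Expires`.

```python
def cookie_header_to_dict(header: str) -> dict:
    """Split a raw 'name=value; name2=value2' Cookie header into a dict.

    Values are kept exactly as sent, including any surrounding quotes.
    """
    cookies = {}
    for pair in header.split(";"):
        pair = pair.strip()
        if not pair:
            continue  # tolerate empty input or a trailing ';'
        # Split on the first '=' only: base64-style values may end in '=' padding.
        name, _, value = pair.partition("=")
        cookies[name.strip()] = value.strip()
    return cookies


if __name__ == "__main__":
    raw = "session-id=186-8139005-7450816; session-id-time=2082787201l"
    print(cookie_header_to_dict(raw))
    # prints {'session-id': '186-8139005-7450816', 'session-id-time': '2082787201l'}
```

The result could then be passed as `Request(url, cookies=cookie_header_to_dict(raw_header), callback=self.scrape)`. Partitioning on the first `=` matters for values such as `x-wl-uid`, which end in `=` padding.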

Here is my code:

```python
from scrapy import Request

def parse(self, response):
    # Link to the target page in the "refinements" sidebar; urljoin resolves it
    # against the host the listing was served from instead of hard-coding one.
    href = response.xpath('//*[@id="refinements"]/div[2]/ul[1]/li[1]/ul/li[1]/a/@href').extract_first()
    url_to_go = response.urljoin(href)

    headers = {
        'Accept-Encoding': 'gzip, deflate',
        'Accept-Language': 'en-US,en;q=0.8',
        'Connection': 'keep-alive',
        'Content-Type': 'application/x-www-form-urlencoded',
        'Host': 'www.amazon.com',
        'Origin': 'http://www.amazon.com',
        'Referer': 'http://ift.tt/24miskS',
    }
    # Browser cookies go in `cookies=`: with cookies enabled, Scrapy's
    # CookiesMiddleware discards a hand-set 'Cookie' header.
    cookies = {
        'session-token': '"lslMFQ/aZv4uOPOndfqyl4uQo+2j28Ziy3aMBwCCUsVPeFX9xoCsUv6jvR2U+YAnSxlBVTl4PtTpCeaIA13g2/XC1DqNd95tDulSOPeEbxETVBgwS4i/vTIQmUOybv+I5wYP12XCIGOh7QrpGLE+/gGTgAjM+1KaA9Ua6D2lEZoPPyONk8K4MiWAOxbjOVgaV/i5lbEbp1Kfn4PbXl555g=="',
        'x-wl-uid': '1ZDt5hegLdX+sR4SzNbD6q5TZD/tTVmo+y68B5HuediDPf5/oClQ5IbnNGcF0D+ollnxQ1vp63iw=',
        'csm-hit': 's-0AMVHWRT97Q8WVYKN4TA|1464956175963',
        'session-id-time': '2082787201l',
        'session-id': '186-8139005-7450816',
        'ubid-main': '181-6639382-1676153',
    }
    # self.scrape is the callback that handles the page I want to scrape.
    return Request(url_to_go, headers=headers, cookies=cookies,
                   meta={'asinlist': []}, callback=self.scrape)
```

The request I return is for the page I actually want to scrape. Does anyone know how I can scrape data from this page?
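To tell whether a response is the requested page or the HP page it bounced to, one option is to compare the final `response.url` with the URL that was asked for. When an HTTP 3xx redirect is followed, Scrapy's RedirectMiddleware records the earlier URLs in `response.meta['redirect_urls']`. The sketch below is a hypothetical helper, `landed_elsewhere`, built only on `urllib.parse`; treating a different host or path as "sent elsewhere" while ignoring query strings and a leading `www.` is a heuristic assumption, not documented Amazon behaviour.

```python
from urllib.parse import urlsplit


def landed_elsewhere(requested_url: str, final_url: str) -> bool:
    """Heuristic: True if final_url has a different host or path than requested_url.

    Query strings and fragments are ignored, and a leading 'www.' on the
    host is treated as insignificant.
    """
    def key(url: str) -> tuple:
        parts = urlsplit(url)
        host = parts.hostname or ""  # .hostname is already lower-cased
        if host.startswith("www."):
            host = host[4:]
        path = parts.path.rstrip("/") or "/"  # treat '' and '/' alike
        return host, path

    return key(requested_url) != key(final_url)
```

In a callback this might be used as `landed_elsewhere(response.meta.get('redirect_urls', [response.url])[0], response.url)`. If the bounce is an HTTP redirect, sending the request with `meta={'dont_redirect': True, 'handle_httpstatus_list': [301, 302]}` lets the callback inspect the 3xx response and its `Location` header instead of following it.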
