I'm trying to scrape data from this Amazon page, but I always get sent back to the main page for HP. I've tried the suggestions I've seen in other posts about changing my USER_AGENT, and that has not helped in this case. Additionally, because cookies are enabled in Scrapy by default, I have even tried reaching the page I want by going through other pages first, and that has not worked either.
Here is my code:
from scrapy import Request

def parse(self, response):
    # Build the absolute URL of the first refinement link on the page
    url_to_go = "http://amazon.com"+(response.xpath('//*[@id="refinements"]/div[2]/ul[1]/li[1]/ul/li[1]/a/@href').extract()[0])
    # Request headers copied from a browser session (the Cookie header is included here)
    cook = {'Accept-Encoding':'gzip, deflate',
            'Accept-Language':'en-US,en;q=0.8',
            'Connection':'keep-alive',
            'Content-Type':'application/x-www-form-urlencoded',
            'Cookie':'session-token="lslMFQ/aZv4uOPOndfqyl4uQo+2j28Ziy3aMBwCCUsVPeFX9xoCsUv6jvR2U+YAnSxlBVTl4PtTpCeaIA13g2/XC1DqNd95tDulSOPeEbxETVBgwS4i/vTIQmUOybv+I5wYP12XCIGOh7QrpGLE+/gGTgAjM+1KaA9Ua6D2lEZoPPyONk8K4MiWAOxbjOVgaV/i5lbEbp1Kfn4PbXl555g=="; x-wl-uid=1ZDt5hegLdX+sR4SzNbD6q5TZD/tTVmo+y68B5HuediDPf5/oClQ5IbnNGcF0D+ollnxQ1vp63iw=; csm-hit=s-0AMVHWRT97Q8WVYKN4TA|1464956175963; session-id-time=2082787201l; session-id=186-8139005-7450816; ubid-main=181-6639382-1676153',
            'Host':'www.amazon.com',
            'Origin':'http://www.amazon.com',
            'Referer':'http://ift.tt/24miskS'}
    request = Request(url_to_go, headers=cook, meta={'asinlist':[]}, callback=self.scrape)
    return request
The request I return is for the page that I actually want to scrape. Does anyone know how I can scrape data from this page?
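For reference, Scrapy's Request also accepts a `cookies` argument, so the cookies do not have to be sent as a hand-written `Cookie` header. The sketch below is only an illustration of that variation, not a confirmed fix: nothing in this post establishes that it stops the page from sending the spider back. The helper names `cookie_header_to_dict` and `absolute_url` are hypothetical and exist only for this example.

```python
from urllib.parse import urljoin


def cookie_header_to_dict(header):
    """Split a raw 'Cookie' header string into a {name: value} dict."""
    cookies = {}
    for part in header.split(";"):
        # Split on the first '=' only, so values that contain '=' stay intact
        name, sep, value = part.strip().partition("=")
        if sep and name:  # skip empty or malformed fragments
            cookies[name] = value
    return cookies


def absolute_url(page_url, href):
    """Resolve a possibly relative href against the page it came from."""
    return urljoin(page_url, href)


# Inside a spider callback, the pieces might be combined roughly like this
# (assumption: `href` is the link extracted by the XPath in parse):
#   yield Request(response.urljoin(href),
#                 cookies=cookie_header_to_dict(raw_cookie_header),
#                 meta={'asinlist': []}, callback=self.scrape)
```

Note that the helper leaves values untouched, so a value wrapped in double quotes (as `session-token` is above) keeps its quotes.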