Sunday, 28 March 2021

Webscraping with Python Requests and getting Access Denied even after updating headers

This web scraper was working for a while, but the website must have been updated, because it no longer works. Every request now returns an Access Denied error. I have tried adding headers but still get the same issue. This is what the code prints:

<html><head>
<title>Access Denied</title>
</head><body>
<h1>Access Denied</h1>

You don't have permission to access "http://www.jdsports.co.uk/product/white-nike-air-force-1-shadow-womens/15984107/" on this server.<p>
Reference #18.4d4c1002.1616968601.6e2013c
</p></body>
</html>
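Rather than printing the whole soup, the script could check for this block page explicitly before parsing it. A minimal sketch, assuming the block page is signalled by an HTTP 403 status and/or by the "Access Denied" title shown above; is_access_denied and _TitleParser are hypothetical names, and the standard-library html.parser is used so the check does not depend on BeautifulSoup.

```python
from html.parser import HTMLParser


class _TitleParser(HTMLParser):
    """Collects the text inside the first non-empty <title> element."""

    def __init__(self) -> None:
        super().__init__()
        self._in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag names in lowercase
        if tag == "title" and not self.title:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def is_access_denied(status_code: int, html: str) -> bool:
    """Heuristic check for the 'Access Denied' block page.

    Assumption: the block page comes with HTTP 403 and/or has a <title>
    of exactly 'Access Denied', as in the output printed above.
    """
    if status_code == 403:
        return True
    parser = _TitleParser()
    parser.feed(html)
    return parser.title.strip() == "Access Denied"
```

In the scraper it would be called with the status code and text of the Response returned by scraper.get(...), before the HTML is handed to BeautifulSoup; since BeautifulSoup is already in use there, soup.title would give the same title text.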

Here's the part of the code that fetches the HTML:

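import requests
from bs4 import BeautifulSoup

scraper = requests.Session()

headers = {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36',
}

# info[0] (the product URL) and proxy_test (the proxies dict) are defined elsewhere in the script
response = scraper.get(info[0], proxies=proxy_test, headers=headers)
soup = BeautifulSoup(response.text, 'html.parser')

print(soup)
# find_all is the current name for the older findAll alias
stock = soup.find_all("button", {"class": "btn btn-default"})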

What else can I try to fix it? The website I want to scrape is https://www.jdsports.co.uk/
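A frequently suggested next step is to send a fuller, consistent set of browser-like headers from the session rather than a User-Agent alone; the User-Agent in the code above advertises Chrome 39, a version that was several years old by 2021. The sketch below assumes the server's filtering looks at request headers at all. The header values and the make_session name are illustrative assumptions, not values known to work for this site, and the request may still be denied, since bot filters can also look at signals requests cannot control, such as the TLS handshake or JavaScript challenges.

```python
import requests

# Illustrative header set (an assumption): the values resemble what a desktop
# browser sends, but nothing guarantees they satisfy this particular server.
BROWSER_HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/89.0.4389.90 Safari/537.36"
    ),
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-GB,en;q=0.9",
    "Referer": "https://www.jdsports.co.uk/",
}


def make_session() -> requests.Session:
    """Build a Session that sends the same headers, and keeps cookies, on every request."""
    session = requests.Session()
    # Overrides requests' default User-Agent and Accept for all later requests
    session.headers.update(BROWSER_HEADERS)
    return session


# Hypothetical usage, with product_url standing in for info[0]:
# session = make_session()
# session.get("https://www.jdsports.co.uk/", timeout=10)  # collect any cookies set on the landing page
# response = session.get(product_url, timeout=10)
# print(response.status_code)
```

Setting the headers on the Session, rather than passing them to each get() call, keeps them consistent across the landing-page request and the product request, and the Session also carries over any cookies the first response sets.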