Saturday, 28 December 2019

Web scraping with Scrapy / BeautifulSoup -- Authentication required

I am logged in to a webpage that requires authentication. However, when I try to scrape its content, I only get the HTML of the login page. I tried using basic authentication inside the Scrapy shell, like this:

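# Run inside the Scrapy shell, where fetch() is predefined.
from w3lib.http import basic_auth_header
from scrapy import Request

# Placeholder credentials; replace with real values.
auth = basic_auth_header("your_user", "your_password")
req = Request(url="http://example.com", headers={"Authorization": auth})
fetch(req)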

It did not work: I am still redirected to the login page. I can fully inspect the HTML content when I browse the site manually, but I am not able to automate this. Any ideas? Maybe I need the CSRF token? I should note that I am a beginner and do not have much knowledge, so I could be missing something very obvious or important.
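
To make the CSRF idea concrete, here is a minimal sketch of what a form-based login might look like, using only the Python standard library. It assumes the site uses a cookie-based session and an HTML login form rather than HTTP basic authentication, that the form posts back to the login URL itself, and that the fields are named "username" and "password". All of these names, and the helper functions themselves, are hypothetical and would need to be checked against the real login page.

```python
import http.cookiejar
import urllib.request
from html.parser import HTMLParser
from urllib.parse import urlencode


class HiddenInputParser(HTMLParser):
    """Collect name/value pairs of <input type="hidden"> fields."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attr = dict(attrs)
        if (attr.get("type") or "").lower() == "hidden" and attr.get("name"):
            self.fields[attr["name"]] = attr.get("value") or ""


def build_login_payload(login_html, username, password,
                        user_field="username", pass_field="password"):
    """Copy hidden fields (e.g. a CSRF token) and add the credentials.

    The field names are assumptions; inspect the real form to confirm them.
    """
    parser = HiddenInputParser()
    parser.feed(login_html)
    payload = dict(parser.fields)
    payload[user_field] = username
    payload[pass_field] = password
    return payload


def login_and_fetch(login_url, target_url, username, password):
    """Log in through the form, then fetch a protected page.

    Assumes the form is submitted to login_url itself via POST.
    """
    # The cookie jar keeps the session cookie between requests.
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(jar))
    # 1. GET the login page to receive cookies and the hidden token.
    login_html = opener.open(login_url).read().decode("utf-8", "replace")
    # 2. POST the credentials together with the hidden fields.
    payload = build_login_payload(login_html, username, password)
    opener.open(login_url, data=urlencode(payload).encode("utf-8"))
    # 3. The same opener now sends the session cookie automatically.
    return opener.open(target_url).read().decode("utf-8", "replace")
```

Inside Scrapy itself, FormRequest.from_response(response, formdata={...}) pre-fills the hidden fields of a form found in the response, which should cover the same step without manual parsing.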



