I have a web scraping script that worked for months, but today it stopped returning results. The problem shows up in the following code:
from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup
my_url = 'https://www.blocket.se/annonser/hela_sverige/fordon/bilar?cb=40&cbl1=6&cchb=1&ccsc=1&cg=1020&f=c&mye=2017&mys=2013&page=1&sort=date'
uClient = uReq(my_url)
page_html = uClient.read()
uClient.close()
page_soup = soup(page_html, "html.parser")
containers = page_soup.findAll("div", {"class":"styled__Wrapper-sc-1kpvi4z-0 itHtzm"})
print(len(containers))
This should print 40, but instead it prints:
0
The search above used to return all the containers I wanted, but now it finds nothing. By printing the page_soup variable, I found that the second class name had changed from itHtzm to gSWafH. Searching with the new name:
containers = page_soup.findAll("div", {"class":"styled__Wrapper-sc-1kpvi4z-0 gSWafH"})
print(len(containers))
gives the expected result:
40
The same kind of change applied to every class I search for, so I first assumed that the website itself had changed. However, when I inspect the HTML on the website myself, nothing appears to have changed.
Why does the HTML I see when I open the site and view it in the browser differ from the HTML that I read with BS4?
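To narrow this down, one option is to check the raw response without BS4 at all, so that the parser can be ruled out as the source of the difference. The following is only a minimal sketch: the helper name class_variants and the sample_html string are made up for illustration, and it assumes the fetched bytes in page_html would first be decoded to text (for example with page_html.decode("utf-8")).

```python
from html.parser import HTMLParser


class ClassCollector(HTMLParser):
    """Collect every full class string that contains a given stable token."""

    def __init__(self, stable_token):
        super().__init__()
        self.stable_token = stable_token
        self.variants = set()

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value can be None
        for name, value in attrs:
            if name == "class" and value and self.stable_token in value.split():
                self.variants.add(value)


def class_variants(html, stable_token):
    """Return the distinct class strings of tags carrying stable_token (hypothetical helper)."""
    collector = ClassCollector(stable_token)
    collector.feed(html)
    return sorted(collector.variants)


# Hypothetical stand-in for the decoded page; not real output from the site
sample_html = """
<div class="styled__Wrapper-sc-1kpvi4z-0 gSWafH">ad 1</div>
<div class="styled__Wrapper-sc-1kpvi4z-0 gSWafH">ad 2</div>
<div class="unrelated">something else</div>
"""

print(class_variants(sample_html, "styled__Wrapper-sc-1kpvi4z-0"))
```

Running the same helper on the text copied from the browser and on the text fetched by the script would show whether the two sources really carry different suffixes.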
I know that I could change all of the class names in my searches to fix the script; however, it is a rather long script, and I would much prefer to understand the cause of the difference.
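As a stopgap while the cause is unclear, the search could match only the part of the class that stayed the same. This is a sketch under assumptions, not a confirmed fix: it assumes the first token (styled__Wrapper-sc-1kpvi4z-0) stays stable while only the second token changes, which is what I observed for this one class. The function name find_containers and the sample_html string are made up for illustration. BeautifulSoup treats class as a multi-valued attribute, so passing a single token to class_ matches any tag that carries it.

```python
from bs4 import BeautifulSoup


def find_containers(html):
    """Match divs on the stable class token only, ignoring the changing suffix."""
    page_soup = BeautifulSoup(html, "html.parser")
    # class_ with a single token matches tags whose class list contains it
    return page_soup.find_all("div", class_="styled__Wrapper-sc-1kpvi4z-0")


# Hypothetical stand-in for the fetched page; not real output from the site
sample_html = """
<div class="styled__Wrapper-sc-1kpvi4z-0 itHtzm">old suffix</div>
<div class="styled__Wrapper-sc-1kpvi4z-0 gSWafH">new suffix</div>
<div class="unrelated">something else</div>
"""

containers = find_containers(sample_html)
print(len(containers))  # 2: both suffix variants match, the unrelated div does not
```

This does not explain why the suffix differs, but it would keep the searches working regardless of which suffix is served.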