web: BeautifulSoup not reading the same source HTML code

Monday, 21 December 2020

I have a web scraping script that had been working for months, but today it stopped. The problem shows up in this code:

from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup

# Search URL for the listings page (a plain string; no f-string placeholders are needed)
my_url = 'https://www.blocket.se/annonser/hela_sverige/fordon/bilar?cb=40&cbl1=6&cchb=1&ccsc=1&cg=1020&f=c&mye=2017&mys=2013&page=1&sort=date'

# Download the raw HTML of the page
uClient = uReq(my_url)
page_html = uClient.read()
uClient.close()

# Parse the HTML and collect the containers by their full class string
page_soup = soup(page_html, "html.parser")
containers = page_soup.find_all("div", {"class": "styled__Wrapper-sc-1kpvi4z-0 itHtzm"})
print(len(containers))

This should be 40 elements long; however, it is 0.

This call used to return all the wanted containers, but now it finds nothing. By printing the page_soup variable, I found that the class name had changed from itHtzm to gSWafH.

# Same search, but with the new class suffix seen in the fetched HTML
containers = page_soup.find_all("div", {"class": "styled__Wrapper-sc-1kpvi4z-0 gSWafH"})
print(len(containers))

This instead gives the expected 40.

The other classes had changed in the same way, so I first thought that the website itself had changed. However, when I view the HTML in the browser myself, nothing has changed.
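Rather than printing the whole page_soup to spot renamed classes, one option is to tally the class strings that the fetched HTML actually contains. The following is only a minimal sketch: sample_html is a made-up stand-in for page_html, and tally_div_classes is a hypothetical helper name, not something from my script.

```python
from collections import Counter

from bs4 import BeautifulSoup

# Hypothetical stand-in for page_html; the real page is not reproduced here.
sample_html = """
<div class="styled__Wrapper-sc-1kpvi4z-0 gSWafH">ad 1</div>
<div class="styled__Wrapper-sc-1kpvi4z-0 gSWafH">ad 2</div>
<div class="other-class">footer</div>
"""


def tally_div_classes(html):
    """Count each distinct class string found on <div> elements."""
    parsed = BeautifulSoup(html, "html.parser")
    counts = Counter()
    # class_=True keeps only the divs that have a class attribute at all
    for div in parsed.find_all("div", class_=True):
        # bs4 exposes the class attribute as a list of tokens; rejoin them
        counts[" ".join(div["class"])] += 1
    return counts


class_counts = tally_div_classes(sample_html)
for name, count in class_counts.most_common():
    print(count, name)
```

Running the same tally on the real page_html would show which suffix (itHtzm or gSWafH) the server is sending to the script.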

Why does the HTML I see when I open the site in a browser differ from the HTML that the script downloads and parses with BS4?

I know that I could change all of the class names in the searches to fix the script, but it is a rather long script, and I would much prefer to know the cause of the difference.
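For reference, a sketch of a search that does not depend on the changing suffix. It assumes that the first class token, styled__Wrapper-sc-1kpvi4z-0, stays the same while only the second token changes, which is all I have observed so far. The HTML snippets and the find_containers helper are hypothetical and only for illustration.

```python
from bs4 import BeautifulSoup

# First class token from the searches above; assumed (not confirmed) to stay stable.
STABLE_CLASS = "styled__Wrapper-sc-1kpvi4z-0"

# Hypothetical snippets imitating the two variants of the markup.
old_html = '<div class="styled__Wrapper-sc-1kpvi4z-0 itHtzm">ad</div>'
new_html = '<div class="styled__Wrapper-sc-1kpvi4z-0 gSWafH">ad</div>'


def find_containers(html, stable_class=STABLE_CLASS):
    """Return the divs that carry stable_class, whatever other classes they have."""
    parsed = BeautifulSoup(html, "html.parser")
    # A single class name passed to class_ matches any tag whose class list contains it
    return parsed.find_all("div", class_=stable_class)


old_matches = find_containers(old_html)
new_matches = find_containers(new_html)
print(len(old_matches), len(new_matches))
```

This only works around the symptom; it does not explain why the two versions of the HTML differ.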
