Wednesday, 27 May 2020

Python web scraping - HTML from Selenium is not what Inspect Elements shows

For a faculty project I want to scrape some news web pages. I ran into a problem: when I fetch a page's HTML in Python, I get the HTML shown in the page source, which is very different from what the Elements tab of the browser's Inspect tool shows. I have tried BeautifulSoup, requests, and Selenium, and all of them give the same result.

Does anyone have an idea how I could scrape the elements of the page when I cannot get its rendered HTML, or how to get that HTML so I can scrape it?

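from selenium import webdriver

url = 'https://www.24ur.com/novice/korona/v-revozu-znova-zagnali-proizvodnjo.html'

# Start Chrome and load the page
driver = webdriver.Chrome()
driver.get(url)

# Serialize the DOM as it is at this moment, right after get() returns
htmlx = driver.execute_script("return document.documentElement.outerHTML")

print(htmlx)

# Close the browser when done
driver.quit()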

Thank you.
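
One possible explanation, stated as an assumption because I have not verified it for this site: the page fills in its content with JavaScript after the initial load, so reading outerHTML right after driver.get() returns can still give something close to the raw source. A minimal sketch of one workaround is below. The helper name wait_for_stable_html and its timeout and interval parameters are hypothetical, not part of Selenium; the only Selenium call it relies on is driver.execute_script. It polls the live DOM until two consecutive snapshots are identical.

```python
import time


def wait_for_stable_html(driver, timeout=15.0, interval=0.5):
    """Poll the live DOM until its serialized HTML stops changing.

    `driver` is any object exposing Selenium's `execute_script` method.
    Returns the last snapshot taken, even if the timeout expires first.
    (Hypothetical helper for illustration; not a Selenium API.)
    """
    script = "return document.documentElement.outerHTML"
    deadline = time.monotonic() + timeout

    # First snapshot, taken immediately
    previous = driver.execute_script(script)

    while time.monotonic() < deadline:
        time.sleep(interval)
        current = driver.execute_script(script)
        if current == previous:
            # Two identical snapshots in a row: treat the page as rendered
            return current
        previous = current

    # Timed out while the DOM was still changing; return what we have
    return previous
```

With the script from the question, this would be called as html = wait_for_stable_html(driver) after driver.get(url), and the result could then be passed to BeautifulSoup(html, "html.parser"). Pages that keep updating (ads, tickers) may never stabilize, in which case the function simply returns at the timeout. If a specific element is known to appear once the article has rendered, Selenium's WebDriverWait with an expected condition on that element is the more targeted choice.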



