Friday, May 31, 2019

Web scraping help: root div is empty in soup

I am scraping sites like StockX and GOAT for shoe information, but the HTML in the soup I create doesn't include the information I need, which appears to live in the root div under the body of the page. When I inspect the page in the browser, the root div is full of the information I am trying to scrape, but viewing the page source shows an empty root div.

import urllib.request
from bs4 import BeautifulSoup

def scrape(url):
    # Send a browser-like User-Agent with the request
    browser = "Mozilla/5.0 (Windows NT 10.0; WOW64; rv:54.0) Gecko/20100101 Firefox/54.0"
    header = {"User-Agent": browser}
    req = urllib.request.Request(url, headers=header)
    html = urllib.request.urlopen(req).read()
    soup = BeautifulSoup(html, "html.parser")
    # Print every div whose id is "root"
    print(soup.find_all("div", {"id": "root"}))

This yields [<div id="root"></div>] for the search results page of either StockX or GOAT.
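An empty root div in the page source alongside a populated one in the inspector is consistent with the content being filled in by JavaScript after the page loads, although I have not confirmed that for these sites. The following is a minimal sketch of a check for that situation. It uses only the standard library's html.parser so it runs without extra installs; SAMPLE_HTML, RootDivInspector, and root_is_empty are hypothetical names made up for illustration, and the sample markup is a stand-in, not the real response from either site.

```python
from html.parser import HTMLParser

# Hypothetical stand-in for the HTML a server sends before any JavaScript runs.
SAMPLE_HTML = """<html><body>
<div id="root"></div>
<script src="/static/js/main.js"></script>
</body></html>"""

# Tags that never have a closing tag, so they must not change nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class RootDivInspector(HTMLParser):
    """Counts child elements and collects text inside <div id="root">."""

    def __init__(self):
        super().__init__()
        self.depth = 0        # nesting depth inside the root div; 0 = outside
        self.found = False    # whether a div with id="root" was seen
        self.child_tags = 0   # number of elements nested inside it
        self.text = []        # non-whitespace text nested inside it

    def handle_starttag(self, tag, attrs):
        if self.depth > 0:
            self.child_tags += 1
            if tag not in VOID_TAGS:
                self.depth += 1
        elif tag == "div" and dict(attrs).get("id") == "root":
            self.found = True
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth > 0 and tag not in VOID_TAGS:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth > 0 and data.strip():
            self.text.append(data.strip())


def root_is_empty(html):
    """Return True if the HTML has a root div with no elements or text in it."""
    inspector = RootDivInspector()
    inspector.feed(html)
    inspector.close()
    return inspector.found and inspector.child_tags == 0 and not inspector.text


if __name__ == "__main__":
    print(root_is_empty(SAMPLE_HTML))
    print(root_is_empty('<div id="root"><span>Shoe</span></div>'))
```

If root_is_empty returns True for the bytes that urlopen hands back, then no parser will find the data in that response, since an HTML parser does not execute scripts. The sketch assumes well-formed markup with matching closing tags inside the root div.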

If anyone can let me know how I can extract the information I need, I would greatly appreciate it. Thanks!



