Saturday, March 7, 2020

I'm scraping a site with Beautiful Soup, but when I try to find the div element with an id of "shell", find() returns None.

from bs4 import BeautifulSoup
from urllib import request
import csv

# Send a browser-like User-Agent header with the request
headers = {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.77 Safari/537.36'}
# The URL to be scraped
company_page = 'https://www.goodreads.com/list/show/6.Best_Books_of_the_20th_Century?'

# Open the page
page_request = request.Request(company_page, headers=headers)
page = request.urlopen(page_request)

# Parse the HTML using Beautiful Soup
html_content = BeautifulSoup(page, 'html.parser')

# Look up the div whose id is "shell"; this is the call that returns None
title = html_content.find('div', id='shell')
print(title)
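
One way to narrow this down might be to check which div ids are actually present in the HTML the server returned, without going through Beautiful Soup. The sketch below uses only the standard library's html.parser. The names DivIdCollector and list_div_ids and the sample markup are hypothetical, made up for illustration; the sample is not the real Goodreads page.

```python
from html.parser import HTMLParser


class DivIdCollector(HTMLParser):
    """Collects the id attribute of every <div> start tag it sees."""

    def __init__(self):
        super().__init__()
        self.div_ids = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; tag and attribute names arrive lowercased
        if tag == "div":
            for name, value in attrs:
                if name == "id" and value:
                    self.div_ids.append(value)


def list_div_ids(html_text):
    """Return the ids of all <div> elements found in html_text, in document order."""
    collector = DivIdCollector()
    collector.feed(html_text)
    collector.close()
    return collector.div_ids


# Hypothetical sample markup, standing in for the decoded response body
sample = (
    '<html><body><div id="wrapper"><div id="content">x</div></div>'
    '<div class="c"></div></body></html>'
)
ids = list_div_ids(sample)
print(ids)
print("shell" in ids)
```

In the script above, the text to pass in would be the decoded response body, for example page.read().decode('utf-8', errors='replace'). Reading the response consumes it, so the same string would then have to be given to BeautifulSoup instead of page. If "shell" is missing from the list, the element is probably not in the HTML that was served; if it is present, the difference lies in how the document is parsed.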


