Monday, 21 September 2015

Understanding an "invalid literal" error when web scraping

I am trying to scrape the Billboard top 100 for the years 1992 to 2014 from Wikipedia and then clean the data. The code below ends with an "invalid literal" error:

import requests
from bs4 import BeautifulSoup

years = range(1992, 2015)
yearstext = dict()
for year in years:
    t_1992 = requests.get('http://ift.tt/1MnzJDb' % {"year": year})
    soup = BeautifulSoup(t_1992.text, "html.parser")
    yearstext[year] = soup

def parse_year(year, ytextdixt):
    rows = soup.find("table", attrs={"class": "wikitable"}).find_all("tr")[1:]
    cleaner = lambda r: [
        r[0].get_text(), int(r[1].get_text()),
        r[2].get_text(), r[2].find("a").get("href"),
        r[3].get_text(), r[3].find("a").get("href"),
    ]
    fields = ["band_singer", "ranking", "song", "songurl", "titletext", "url"]
    songs = [dict(zip(fields, cleaner(row.find_all("td")))) for row in rows]

ValueError: invalid literal for int() with base 10: 'Pharrell Williams'
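
For what it's worth, the same message can be reproduced without any scraping, which suggests that for at least one row the second cell holds an artist name rather than a ranking number. The sketch below is only an illustration under that assumption: the `cells` values are made up for the example (not taken from the page), and `to_int_or_none` is a hypothetical helper, not part of requests or BeautifulSoup.

```python
# Hypothetical cell texts for one table row. The real values come from
# row.find_all("td") and may be laid out differently from year to year.
cells = ["Happy", "Pharrell Williams"]


def to_int_or_none(text):
    """Return int(text) if text is a whole number, otherwise None."""
    try:
        return int(text.strip())
    except ValueError:
        return None


# Calling int() on non-numeric text raises the same ValueError as above.
try:
    int(cells[1])
except ValueError as exc:
    message = str(exc)

print(message)

# A guarded conversion makes it easy to spot which cells are not numbers.
print([to_int_or_none(c) for c in cells + ["1"]])
```

Printing the raw cell texts for the failing row in the same way should show which index, if any, really contains the ranking.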

Does anyone know why this happens?



