I have been struggling to parse a specific table from a web page with Beautiful Soup. My code is the following:
# -*- coding: cp1252 -*-
import urllib2
from bs4 import BeautifulSoup
page = urllib2.urlopen("http://ift.tt/1NhpRg2").read()
soup = BeautifulSoup(page)
data = []
table = soup.find("table", { "class" : "mytable" })
table_body = table.find('tbody')
rows = table_body.find_all('tr')
for row in rows:
    cols = row.find_all('td')
    cols = [ele.text.strip() for ele in cols]
    data.append([ele for ele in cols if ele]) # Get rid of empty values
print data
It works with other web pages, but not with this one. I get the following error:
table_body = table.find('tbody')
AttributeError: 'NoneType' object has no attribute 'find'
It seems that it cannot find the "tbody" tag, but I have checked and it is present in the page. Another problem is that when the code does work (on other web pages), a "u" appears before every item in the table. I have searched a lot and cannot find the cause. Thanks for your help.
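One reading of the traceback: the exception is raised while evaluating table.find('tbody'), so it is table itself that is None. That would mean the earlier soup.find("table", ...) call matched nothing, not that the "tbody" lookup failed. A browser's inspector can also show a "tbody" that is absent from the raw HTML, so checking the page there may be misleading. The sketch below guards against both cases. It is written for Python 3, and SAMPLE_HTML and extract_rows are hypothetical names standing in for the real page, whose markup is unknown.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the downloaded page; the real markup is unknown.
SAMPLE_HTML = """
<table class="mytable">
  <tr><td> a </td><td></td><td>b</td></tr>
  <tr><td>c</td><td>d</td></tr>
</table>
"""


def extract_rows(html, css_class="mytable"):
    """Return the non-empty cell texts of each row, or [] if no table matches."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", class_=css_class)
    if table is None:
        # find() returns None when nothing matches, which is what
        # produces the AttributeError on the next .find() call.
        return []
    body = table.find("tbody")
    if body is None:
        # The raw HTML may have no <tbody>; fall back to the table itself.
        body = table
    rows = []
    for tr in body.find_all("tr"):
        cells = [td.get_text(strip=True) for td in tr.find_all("td")]
        rows.append([c for c in cells if c])  # drop empty cells
    return rows


rows = extract_rows(SAMPLE_HTML)
missing = extract_rows("<p>no table here</p>")

for row in rows:
    print(", ".join(row))
```

If extract_rows returns an empty list for the real page, it may be worth printing the downloaded HTML to see whether the table and its class name are actually there. As for the "u": in Python 2, printing a whole list shows the repr of each unicode string, which includes the u prefix. Printing the items one by one, as in the final loop, or running under Python 3 avoids it.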