Friday, 14 August 2015

Parsing a table using Beautiful Soup

I have been struggling to parse a specific table from a web page with Beautiful Soup. My code is the following:

# -*- coding: cp1252 -*-
import urllib2

from bs4 import BeautifulSoup

page = urllib2.urlopen("http://ift.tt/1NhpRg2").read()
soup = BeautifulSoup(page)


data = []
table = soup.find("table", { "class" : "mytable" })
table_body = table.find('tbody')

rows = table_body.find_all('tr')
for row in rows:
    cols = row.find_all('td')
    cols = [ele.text.strip() for ele in cols]
    data.append([ele for ele in cols if ele]) # Get rid of empty values

print data

It works with other web pages, but not with this one. I get the following error:

table_body = table.find('tbody')
AttributeError: 'NoneType' object has no attribute 'find'
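
A note on reading this traceback: the failing call is `table.find`, so the `None` object is `table` itself, which is what `soup.find` returns when no element matches. The sketch below shows one way to guard against that. It assumes Python 3, uses a made-up inline HTML snippet in place of the real page, and the helper name `extract_rows` is hypothetical. It also falls back to the table element when there is no `<tbody>`, since browsers typically add that tag to the tree they display even when the raw HTML lacks it.

```python
from bs4 import BeautifulSoup

# Made-up HTML standing in for the real page (assumption for illustration).
SAMPLE_HTML = """
<table class="mytable">
  <tr><td> a </td><td></td><td>b</td></tr>
  <tr><td>c</td><td>d</td></tr>
</table>
"""


def extract_rows(html, css_class):
    """Return the non-empty cell texts of each row, or [] if no table matches."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", {"class": css_class})
    if table is None:
        # find() returns None when nothing matches, so check before chaining.
        return []
    body = table.find("tbody")
    if body is None:
        # The raw HTML may have no <tbody>; search the table directly instead.
        body = table
    rows = []
    for row in body.find_all("tr"):
        cells = [cell.get_text(strip=True) for cell in row.find_all("td")]
        rows.append([text for text in cells if text])  # drop empty cells
    return rows


print(extract_rows(SAMPLE_HTML, "mytable"))
print(extract_rows(SAMPLE_HTML, "no-such-class"))
```

If the guard returns an empty list for the real page, it may be worth printing the downloaded HTML to check whether the table and its class are present in what the server actually sends.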

It seems it does not find the "tbody" tag, but I have checked and it is in the page's HTML. Another problem is that when it works (on other web pages), a "u" appears before every item of the table in the printed output. I have searched a lot and I cannot find the cause. Thanks for your help.
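
On the "u": in Python 2, printing a list shows the `repr()` of each item, and the `repr()` of a unicode string starts with `u`. The prefix is part of the display, not of the data. A minimal sketch, assuming made-up rows in place of the parsed table and a hypothetical helper `format_rows`, that prints the text itself by joining each row into one string:

```python
from __future__ import print_function  # lets the same code run on Python 2

# Made-up rows standing in for the parsed table (assumption for illustration).
data = [[u"Name", u"Value"], [u"Paris", u"42"]]


def format_rows(rows, sep=u" | "):
    """Join each row's cells into one text line."""
    return [sep.join(row) for row in rows]


for line in format_rows(data):
    # Printing a string shows its text; printing a list shows each item's repr().
    print(line)
```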
