Friday, January 29, 2016

How to handle rowspan while scraping a wikitable using Python?

I am trying to scrape the data stored in the table on this Wikipedia page: http://ift.tt/20bqqk1. However, I am unable to scrape all of the data in cells that use rowspan (cells that span several rows). Here's what I have written so far:

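from bs4 import BeautifulSoup
from urllib.request import urlopen

wiki = urlopen("http://ift.tt/20bqqk1")

soup = BeautifulSoup(wiki, "html.parser")

# First table on the page with class "wikitable"
table = soup.find("table", class_="wikitable")
for row in table.find_all("tr"):
    cells = row.find_all("td")

    # Header rows contain only <th> cells, so they are skipped here.
    # Rows below a rowspan cell contain fewer <td> elements, so these fixed
    # indices shift to the wrong column (or raise IndexError).
    if cells:
        name = cells[0].find(string=True)
        pic = cells[1].find("img")
        strt = cells[2].find(string=True)
        end = cells[3].find(string=True)
        pri = cells[6].find(string=True)

        # pic is an <img> Tag, not a string, so use its src attribute
        pic_src = pic["src"] if pic else ""
        z = "\n".join(str(x or "") for x in (name, pic_src, strt, end, pri))
        print(z)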
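One common way to deal with rowspan (a minimal sketch, not taken from the original post) is to expand the table into a rectangular grid first, repeating each spanning cell in every row and column it covers, and only then index columns by position. The helper names `table_to_rows` and `expand_rowspans` are hypothetical. The sketch assumes the `rowspan`/`colspan` attributes hold plain integers and does not try to repair overlapping spans. `table_to_rows` expects a BeautifulSoup `<table>` Tag such as `table` from the code above, but it only calls `find_all` and `get` on it, so bs4 is not imported in the sketch itself.

```python
from typing import Any, List, Tuple

# One parsed cell: (value, rowspan, colspan). The value can be anything,
# e.g. a bs4 Tag or a plain string.
Cell = Tuple[Any, int, int]


def table_to_rows(table) -> List[List[Cell]]:
    """Turn a BeautifulSoup <table> Tag into rows of (cell_tag, rowspan, colspan).

    Assumption: only Tag methods are used (find_all, get), so bs4 is not
    imported here.
    """
    rows = []
    for tr in table.find_all("tr"):
        row = []
        # recursive=False keeps cells of nested tables out of this row
        for cell in tr.find_all(["td", "th"], recursive=False):
            rowspan = int(cell.get("rowspan", 1))
            colspan = int(cell.get("colspan", 1))
            row.append((cell, rowspan, colspan))
        rows.append(row)
    return rows


def expand_rowspans(rows: List[List[Cell]]) -> List[List[Any]]:
    """Return a grid in which a cell with rowspan/colspan > 1 is repeated in
    every position it covers. Positions that nothing covers are None."""
    grid: List[List[Any]] = []
    # pending[col] = (value, rows_left): cells spilling down from earlier rows
    pending = {}
    for row in rows:
        out: List[Any] = []
        col = 0
        i = 0
        while i < len(row) or any(c >= col for c in pending):
            if col in pending:
                # Column still occupied by a rowspan cell from an earlier row
                value, left = pending[col]
                out.append(value)
                if left > 1:
                    pending[col] = (value, left - 1)
                else:
                    del pending[col]
                col += 1
            elif i < len(row):
                # Place the next real cell, repeating it across its colspan
                value, rowspan, colspan = row[i]
                i += 1
                for _ in range(colspan):
                    out.append(value)
                    if rowspan > 1:
                        pending[col] = (value, rowspan - 1)
                    col += 1
            else:
                # No cell left in this row, but a later column is still spanned
                out.append(None)
                col += 1
        grid.append(out)
    return grid


if __name__ == "__main__":
    # Plain strings stand in for Tags: "A" spans two rows, "F" spans two columns
    demo = [
        [("A", 2, 1), ("B", 1, 1), ("C", 1, 1)],
        [("D", 1, 1), ("E", 1, 1)],
        [("F", 1, 2), ("G", 1, 1)],
    ]
    for r in expand_rowspans(demo):
        print(r)
    # ['A', 'B', 'C']
    # ['A', 'D', 'E']
    # ['F', 'F', 'G']
```

With `grid = expand_rowspans(table_to_rows(table))`, position 6 in each grid row refers to the same column whether or not an earlier cell spanned into it, and because the grid holds the cell Tags themselves, calls like `.get_text(strip=True)` or `.find("img")` still work on each entry (after checking for `None`). Note that the header row (`<th>` cells) is included, so the data starts at `grid[1]`. Repeating the spanned cell in each row is one design choice; storing `None` in covered positions would also work if duplicates are unwanted.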



