Friday, January 29, 2016

Wiki scraping using Python

I am trying to scrape the data stored in the table on this Wikipedia page: http://ift.tt/20bqqk1. However, I am unable to scrape the full data. Here's what I have written so far:

from bs4 import BeautifulSoup
import urllib2

wiki = "http://ift.tt/20bqqk1"
header = {'User-Agent': 'Mozilla/5.0'}  # Needed to prevent a 403 error on Wikipedia
req = urllib2.Request(wiki, headers=header)
page = urllib2.urlopen(req)
soup = BeautifulSoup(page, "html.parser")

# Placeholders for the other columns of each row
name = ""
pic = ""
strt = ""
end = ""
pri = ""
x = ""

# Only the first table with class "wikitable" is examined
table = soup.find("table", {"class": "wikitable"})
for row in table.findAll("tr"):
    cells = row.findAll("td")

    # Only rows with exactly 8 data cells are processed
    if len(cells) == 8:
        name = cells[0].find(text=True)  # first text node of the first cell
        print name
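To see which rows the `len(cells) == 8` filter throws away, it can help to print how many `<td>` cells each `<tr>` actually contains. The sketch below does that in Python 3 with the standard library's `html.parser.HTMLParser` rather than BeautifulSoup, so it runs without extra packages. `TableRowParser` and the sample table are hypothetical, and the parser assumes well-formed markup with explicit `</td>` tags and no nested tables.

```python
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect the <td> cells of every <tr> as (text, rowspan) pairs."""

    def __init__(self):
        super().__init__()
        self.rows = []          # one list of (text, rowspan) pairs per <tr>
        self._cell_text = None  # text fragments of the open <td>, else None
        self._rowspan = 1

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag == "td" and self.rows:
            self._cell_text = []
            # A missing or empty rowspan attribute means the cell spans one row.
            self._rowspan = int(dict(attrs).get("rowspan") or 1)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell_text is not None:
            text = "".join(self._cell_text).strip()
            self.rows[-1].append((text, self._rowspan))
            self._cell_text = None

    def handle_data(self, data):
        if self._cell_text is not None:
            self._cell_text.append(data)


# Hypothetical miniature table: "PM X" spans two rows.
SAMPLE = """
<table class="wikitable">
  <tr><th>Name</th><th>Term</th><th>Prime Minister</th></tr>
  <tr><td>Minister A</td><td>1950</td><td rowspan="2">PM X</td></tr>
  <tr><td>Minister B</td><td>1952</td></tr>
</table>
"""

parser = TableRowParser()
parser.feed(SAMPLE)
parser.close()
for row in parser.rows:
    print(len(row), [text for text, _ in row])
```

In this sample the header row has no `<td>` cells, the first data row has 3, and the second data row has only 2 because the `PM X` cell above it spans both rows. A fixed-count test such as `len(cells) == 3` would skip that second row in the same way.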

The output I get is: Jairamdas Daulatram, Surjit Singh Barnala, Rao Birendra Singh

The output should instead be: Jairamdas Daulatram, followed by Panjabrao Deshmukh
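A likely (but not verified against the live page) cause of the missing names is `rowspan`: when a cell such as a Prime Minister or party entry is shared by several consecutive ministers, the rows below it contain fewer `<td>` elements, so `len(cells) == 8` rejects them. A minimal sketch of one fix is to expand row-spanning cells into a rectangular grid before picking columns. The function `expand_rowspans` and the sample rows are hypothetical, and the sketch ignores `colspan`.

```python
from collections import deque


def expand_rowspans(rows):
    """Copy every row-spanning cell into the rows it covers.

    `rows` is a list of table rows; each row is a list of (text, rowspan)
    pairs in document order, i.e. only the <td> cells that actually appear
    in that <tr>. colspan is ignored to keep the sketch short.
    """
    grid = []
    carry = {}  # column index -> (text, number of later rows still covered)
    for row in rows:
        cells = deque(row)
        out = []
        col = 0
        while cells or any(c >= col for c in carry):
            if col in carry:
                # This column is still covered by a cell from an earlier row.
                text, left = carry.pop(col)
                if left > 1:
                    carry[col] = (text, left - 1)
            elif cells:
                text, span = cells.popleft()
                if span > 1:
                    carry[col] = (text, span - 1)
            else:
                text = ""  # malformed row: gap before a carried-down cell
            out.append(text)
            col += 1
        grid.append(out)
    return grid


# Hypothetical rows: "PM X" spans two rows, so the second row has one <td> fewer.
rows = [
    [("Minister A", 1), ("1950", 1), ("PM X", 2)],
    [("Minister B", 1), ("1952", 1)],
    [("Minister C", 1), ("1955", 1), ("PM Y", 1)],
]
grid = expand_rowspans(rows)
for r in grid:
    print(r[0], len(r))  # name column and row width after expansion
```

With BeautifulSoup, each pair could be built from a `td` tag as `(td.get_text(strip=True), int(td.get("rowspan", 1)))`. After expansion every data row has the same width, so `r[0]` would hold the name for each row and the fixed `len(cells) == 8` check would no longer be needed.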