Friday, 4 September 2015

Beautiful Soup Python loop iteration not completing

I have two problems with the code. First, the data is not displaying properly under the field headings; second, the loop is only grabbing part of the data from the HTML. The code attempts to extract the 14 events, which are all on one page of the website. The HTML is identical for every event on the page (i.e. the same markup is simply repeated over and over).

The first problem lies with the resulting data and field headings. I should be getting this:

Fin,Greyhound,Trap,SP,Time/Sec.,Time/Distance,Trainer,Comment

1,Bernies Toughguy,3,7/4F,3.63,23.91,(Trainer: M N Fenwick),"Comment: EP,SnLd"

2,Gentle Kewell,2,7/2,3.70,24.01 (1 1/4),(Trainer: J M Liles),"Comment: MidToRls,RanOn"

3,Tintreach Harry,5,3/1,3.72,24.17 (2),(Trainer: A C B Green),"Comment: BmpRnUp&2,Crd 1/4"

4,Colorado Teegan,4,7/1,3.74,24.33 (2),(Trainer: M N Fenwick),"Comment: Wide,EvCh"

5,Premarket Honey,6,6/1,3.68,24.51 (2 1/4),(Trainer: A C B Green),"Comment: SAw,Crd2"

6,Malbay Roxy,1,7/2,3.81,24.57 (3/4),(Trainer: M N Fenwick),"Comment: EP,SnLd"
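Several of the expected comment values contain commas, so in the CSV they have to be enclosed in double quotes; Python's `csv.writer` does this automatically under its default `csv.QUOTE_MINIMAL` setting. The following is a minimal Python 3 sketch that writes the header row explicitly (rather than relying on header cells scraped from the page) followed by the first two expected rows. It writes to an in-memory `io.StringIO` buffer instead of a file; the names `HEADER` and `rows` are illustrative only.

```python
import csv
import io

# Header written explicitly instead of being scraped from the page.
HEADER = ["Fin", "Greyhound", "Trap", "SP", "Time/Sec.", "Time/Distance",
          "Trainer", "Comment"]

# Sample rows copied from the expected output above.
rows = [
    ["1", "Bernies Toughguy", "3", "7/4F", "3.63", "23.91",
     "(Trainer: M N Fenwick)", "Comment: EP,SnLd"],
    ["2", "Gentle Kewell", "2", "7/2", "3.70", "24.01 (1 1/4)",
     "(Trainer: J M Liles)", "Comment: MidToRls,RanOn"],
]

buf = io.StringIO()
writer = csv.writer(buf)  # QUOTE_MINIMAL: quote only fields that need it
writer.writerow(HEADER)
writer.writerows(rows)
print(buf.getvalue())
```

To write to a real file on Python 3, open it with `open('dogfile.csv', 'w', newline='')`; the `newline=''` argument stops the `csv` module's own line endings from being translated, which otherwise produces blank lines between rows on Windows.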

In the expected output, each piece of data falls correctly beneath its field heading (finishing position, dog name, etc.). However, when I run the program I get this:

Fin,Greyhound,Trap,SP,Time/Sec.,Time/Distance,(Trainer: M N Fenwick),"Comment: EP,SnLd"

1,Bernies Toughguy,3,7/4F,3.63,23.91,(Trainer: J M Liles),"Comment: MidToRls,RanOn"

2,Gentle Kewell,2,7/2,3.70,24.01 (1 1/4),(Trainer: A C B Green),Comment: "BmpRnUp& 1/4"

3,Tintreach Harry,5,3/1,3.72,24.17 (2),(Trainer: A C B Green),"Comment: BmpRnUp&2,Crd 1/4"

4,Colorado Teegan,4,7/1,3.74,24.33 (2),(Trainer: M N Fenwick),"Comment: Wide,EvCh"

5,Premarket Honey,6,6/1,3.68,24.51 (2 1/4),(Trainer: J M Liles),"Comment: SAw,Crd2"

6,Malbay Roxy,1,7/2,3.81,24.57 (3/4),(Trainer: B D O'sullivan),"Comment: EP,SnLd"

Notice that the first line, which should hold the field names, contains only some of them: the last two are replaced by a trainer's name and a comment. This throws the rest of the data out of alignment with the fields.
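This pattern is what `zip` produces when the lists it combines do not line up. `zip` pairs items purely by position, so if some lists pick up an extra leading item and others do not, every later row is offset by one. Judging from the output above, one plausible cause is that the page's column-header row reuses the `fin`, `greyhound`, `trap`, `sp`, `timeSec` and `timeDistance` classes but not the trainer and comment ones; this is an inference from the output, not something checked against the page. A toy illustration with hypothetical data:

```python
# Hypothetical toy columns: "fins" and "names" also picked up the
# header cells, but "trainers" did not.
fins = ["Fin", "1", "2", "3"]
names = ["Greyhound", "Dog A", "Dog B", "Dog C"]
trainers = ["Trainer A", "Trainer B", "Trainer C"]

# zip() pairs items purely by position.
rows = list(zip(fins, names, trainers))
for row in rows:
    print(row)
# ('Fin', 'Greyhound', 'Trainer A')   <- header row gets the first trainer
# ('1', 'Dog A', 'Trainer B')         <- every later trainer is one row early
# ('2', 'Dog B', 'Trainer C')
# Dog C never appears: zip() stopped when `trainers` ran out.
```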

The second problem may have something to do with the loop iteration. As I have already said, the HTML on the page is quite uniform, but for some reason the output stops at the 5th runner (Avenue Bound) in the 6th event (the 11.51) on the card, even though there are 14 events on the card, so the remaining events are never written. The loop seems to be breaking down, but I can't see any tangible reason for it in the HTML.

Below is the code. I have tried many variations of it but can't seem to crack it. I did think I might need to include code to set the number of iterations in the loop, but Python loops are different from C loops and, being new to this, I can't find anything. Any help much appreciated.
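On the loop question: a Python `for` loop over `zip(...)` needs no explicit iteration count. It ends as soon as the shortest input is exhausted, without raising an error, so output that stops part-way through the card usually means one of the scraped lists is shorter than the others rather than that the loop itself failed. (On Python 3.10 and later, `zip(..., strict=True)` raises a `ValueError` instead of silently truncating.) A minimal diagnostic sketch, assuming the lists are gathered into a dict: print each list's length, and use `itertools.zip_longest` so missing values show up as blanks instead of cutting the output short. The helper names `column_lengths` and `combine_columns` and the sample data are hypothetical.

```python
from itertools import zip_longest


def column_lengths(columns):
    """Map each column name to its number of values, so mismatches stand out."""
    return {name: len(values) for name, values in columns.items()}


def combine_columns(columns, fill=""):
    """Combine columns row by row, padding short ones instead of truncating.

    zip() would stop at the shortest column; zip_longest() runs to the end
    of the longest one and fills the gaps with `fill`.
    """
    return list(zip_longest(*columns.values(), fillvalue=fill))


# Hypothetical scraped columns: the trainer column is one value short.
columns = {
    "fin": ["1", "2", "3"],
    "greyhound": ["Dog A", "Dog B", "Dog C"],
    "trainer": ["Trainer A", "Trainer B"],
}

print(column_lengths(columns))            # {'fin': 3, 'greyhound': 3, 'trainer': 2}
print(len(list(zip(*columns.values()))))  # 2 (zip drops the third row)
for row in combine_columns(columns):
    print(row)                            # last row: ('3', 'Dog C', '')
```

Padding only makes the gap visible; it cannot tell which row the missing value really belongs to, so it does not by itself fix misaligned columns.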

import csv
from urllib.request import urlopen  # Python 2: from urllib import urlopen
from bs4 import BeautifulSoup

html = urlopen("http://ift.tt/1UwpzlD")

# Name a parser explicitly so the result doesn't depend on what is installed.
bsObj = BeautifulSoup(html, "html.parser")

# One list of <li> cells per column. A class string containing spaces
# matches the exact value of the class attribute.
one = bsObj.find_all("li", {"class": "first essential fin"})
two = bsObj.find_all("li", {"class": "essential greyhound"})

three = bsObj.find_all("li", {"class": "trap"})
four = bsObj.find_all("li", {"class": "sp"})
five = bsObj.find_all("li", {"class": "timeSec"})
six = bsObj.find_all("li", {"class": "timeDistance"})
seven = bsObj.find_all("li", {"class": "essential trainer"})
eight = bsObj.find_all("li", {"class": "first essential comment"})

# Text of each cell, with surrounding whitespace removed.
firstessentialfin = [a.get_text().strip() for a in one]
essentialgreyhound = [b.get_text().strip() for b in two]
trap = [c.get_text().strip() for c in three]
sp = [d.get_text().strip() for d in four]
timeSec = [e.get_text().strip() for e in five]
timeDistance = [f.get_text().strip() for f in six]
essentialtrainer = [g.get_text().strip() for g in seven]
firstessentialcomment = [h.get_text().strip() for h in eight]

# Python 3 csv: open in text mode with newline=''.
with open('dogfile.csv', 'w', newline='') as csvfile:
    writer = csv.writer(csvfile, delimiter=",")
    # zip() pairs the lists by position and stops at the shortest one.
    for row in zip(firstessentialfin, essentialgreyhound, trap, sp, timeSec,
                   timeDistance, essentialtrainer, firstessentialcomment):
        writer.writerow(row)
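A more robust structure than zipping eight independent lists is to walk the page one runner at a time and read every field relative to that runner, so a missing cell becomes a blank in that row instead of shifting every later row. The sketch below assumes, without having checked the real page, that each runner's `<li>` cells share one parent element and that the column-header row reuses some of the same classes. The sample markup, the `FIELDS` list, and the helpers `cell_text` and `extract_rows` are hypothetical; only the class names come from the code above. It uses BeautifulSoup's `select`, `select_one` and `get_text`.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration only: each runner's cells sit in
# one <ul>, and the header row reuses some of the same classes.
SAMPLE_HTML = """
<ul>
  <li class="first essential fin">Fin</li>
  <li class="essential greyhound">Greyhound</li>
  <li class="trap">Trap</li>
</ul>
<ul>
  <li class="first essential fin">1</li>
  <li class="essential greyhound">Bernies Toughguy</li>
  <li class="trap">3</li>
  <li class="essential trainer">(Trainer: M N Fenwick)</li>
</ul>
<ul>
  <li class="first essential fin">2</li>
  <li class="essential greyhound">Gentle Kewell</li>
  <li class="trap">2</li>
</ul>
"""

# (column name, CSS selector) pairs, using class names from the question.
FIELDS = [
    ("Fin", "li.fin"),
    ("Greyhound", "li.greyhound"),
    ("Trap", "li.trap"),
    ("Trainer", "li.trainer"),
]


def cell_text(runner, selector):
    """Return the stripped text of the first match in `runner`, or ''."""
    cell = runner.select_one(selector)
    return cell.get_text(strip=True) if cell is not None else ""


def extract_rows(html):
    """Build one list of values per runner, keeping its fields together."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for fin in soup.select("li.fin"):
        runner = fin.parent  # the element holding this runner's cells
        row = [cell_text(runner, selector) for _, selector in FIELDS]
        if row[0] == "Fin":  # skip the page's own column-header row
            continue
        rows.append(row)
    return rows


for row in extract_rows(SAMPLE_HTML):
    print(row)
```

With real data, write the header row explicitly and then these rows with `csv.writer`. If the trainer and comment cells turn out to live in a different element from the other cells (the misaligned output above hints they may be collected separately), `fin.parent` would need to become a higher ancestor, for example one found with `fin.find_parent(...)`.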



