Saturday, October 1, 2016

Web Scraping not working?

So I was looking for the thing I like best about software, and then I found out about web scraping. I found it really amazing, so with my Python experience I got hands-on with Beautiful Soup and requests, and here's the code:

import html5lib
import requests
from bs4 import BeautifulSoup as BS

# Print each anchor's string, then the strings of its next siblings
def makeSoup(urls):
    url = requests.get(urls).text
    return BS(url,"html5lib")   

def something(soup):
    for anchor in soup.findAll("a",{"data-type":"externalLink"}):
        print(anchor.string)
        next_sibling = anchor.nextSibling
        water = str(next_sibling.string)
        water = water[0:5]
        while  water != "(202)":
            next_sibling = next_sibling.nextSibling
            if next_sibling == None:
                continue
            if next_sibling.string != None:
                print(next_sibling.string)
                water = str(next_sibling.string)
                water = water[0:5]

soup = makeSoup("http://ift.tt/2dGaCnV")
something(soup)
soup = makeSoup("http://ift.tt/2di8OPk")
something(soup)
soup = makeSoup("http://ift.tt/2dGbqJu")
something(soup)

But sadly, I ran into every programmer's nightmare: errors.

Traceback (most recent call last):
  File "C:\Users\Raj\Desktop\kunal projects\Python\listing_out_all_embassies.py", line 26, in <module>
    something(soup)
  File "C:\Users\Raj\Desktop\kunal projects\Python\listing_out_all_embassies.py", line 17, in something
    next_sibling = next_sibling.nextSibling
AttributeError: 'NoneType' object has no attribute 'nextSibling'

What am I doing wrong? I am a newbie to programming as well as to web scraping, so what are some good practices that I am not following? Anyway, thanks for reading till the end.
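
On the traceback itself: when `next_sibling` becomes `None`, the `continue` jumps straight back to the `while` test, `water` has not changed, and the next pass calls `.nextSibling` on `None`. Below is a minimal sketch of a guarded version of that loop. The names `texts_until` and `print_entries` are hypothetical (they are not part of Beautiful Soup), and stripping whitespace, skipping empty strings, and including the first sibling are assumptions about the wanted output rather than what the original code does.

```python
def texts_until(anchor, stop_prefix="(202)"):
    """Collect the text of the nodes after `anchor`, up to and including
    the first one that starts with `stop_prefix`.

    Ends quietly when the siblings run out instead of raising
    AttributeError on None.
    """
    texts = []
    node = anchor.next_sibling
    while node is not None:
        # Tags with several children report .string as None; skip those.
        text = getattr(node, "string", None)
        if text is not None:
            text = str(text).strip()  # assumption: surrounding whitespace is noise
            if text:
                texts.append(text)
                if text.startswith(stop_prefix):
                    break
        node = node.next_sibling
    return texts


def print_entries(soup):
    """Print each external link's text followed by its trailing sibling text."""
    for anchor in soup.find_all("a", {"data-type": "externalLink"}):
        print(anchor.string)
        for text in texts_until(anchor):
            print(text)
```

It would be called as `print_entries(makeSoup(url))` in place of `something(soup)`. The sketch also compares with `is not None` rather than `!= None`, and uses the `find_all` / `next_sibling` spellings of Beautiful Soup 4 instead of the older `findAll` / `nextSibling` aliases.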



