Wednesday, 27 January 2016

Removing Duplicate Tag Content Using BeautifulSoup

I wrote a script that gets every H1 tag from all 76 pages of a website. In the process, my program prints one specific line, "Current Affairs January 2015", again and again, because that line is present on every page. Can I edit the code so that it prints this line just once?

Here's my code:

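# Python 2 script
from bs4 import BeautifulSoup as bs
import urllib


# Pages 2 through 76 follow the ".../page/<n>" URL pattern.
for i in range(2, 77):
    url1 = "http://ift.tt/1UpFh3f" + "page/" + str(i)
    soup = bs(urllib.urlopen(url1))
    # Print the text of every <h1> on the page, including the repeated one.
    for link in soup.findAll('h1'):
        print link.string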

Here is a screenshot of the output.

Thanks in advance.
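
One possible way to do this, sketched here under some assumptions, is to remember every heading that has already been printed in a set and skip any text seen before. The sketch below is written for Python 3 (the script above is Python 2), and the helper names `unique_in_order`, `h1_texts`, and `print_unique_headings` are hypothetical, not part of BeautifulSoup. It also assumes the built-in "html.parser" backend is good enough for these pages.

```python
from urllib.request import urlopen


def unique_in_order(items):
    """Yield each item the first time it appears, preserving order."""
    seen = set()
    for item in items:
        if item not in seen:
            seen.add(item)
            yield item


def h1_texts(html):
    """Return the stripped text of every <h1> in one HTML document."""
    # Third-party import kept local so unique_in_order has no dependencies.
    from bs4 import BeautifulSoup

    soup = BeautifulSoup(html, "html.parser")
    texts = (tag.get_text(strip=True) for tag in soup.find_all("h1"))
    return [text for text in texts if text]  # drop empty headings


def all_h1_texts(base_url, first, last):
    """Yield the <h1> texts of pages first..last (inclusive), page by page."""
    for i in range(first, last + 1):
        with urlopen(base_url + "page/" + str(i)) as response:
            for text in h1_texts(response.read()):
                yield text


def print_unique_headings(base_url, first=2, last=76):
    """Print every distinct <h1> text once across all pages."""
    for heading in unique_in_order(all_h1_texts(base_url, first, last)):
        print(heading)
```

Calling `print_unique_headings("http://ift.tt/1UpFh3f")` would cover the same pages 2 to 76 as the loop above. Note that this drops every repeated heading, not only "Current Affairs January 2015"; if only that one line should be suppressed, comparing each text against that string would be enough.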
