Tuesday, 21 June 2016

Website scraping script works in Linux but not in Windows 7?

I have written a script that scrapes a URL. It works fine on Linux, but I get an HTTP 503 error when I run it on Windows 7, and the error points at the URL. I am using Python 2.7.11. Please help. Below is the script:

import sys      # Used to add the BeautifulSoup folder to the import path
import urllib2  # Used to read the HTML document

if __name__ == "__main__":
    ### Import Beautiful Soup
    ### Here, I have the BeautifulSoup folder at the same level as this Python script,
    ### so I need to tell Python where to look.
    sys.path.append("./BeautifulSoup")
    from bs4 import BeautifulSoup

    ### Create opener with Google-friendly user agent
    opener = urllib2.build_opener()
    opener.addheaders = [('User-agent', 'Mozilla/5.0')]

    ### Open the output file once, before the loop, so that every page's
    ### results end up in it instead of only the last page's.
    output = open("parseddata.txt", "wb")

    ### Open page & generate soup
    ### The "start" variable is used to iterate through the result pages.
    for start in range(0, 1000):
        url = "http://ift.tt/28Kyste" + str(start*10)
        page = opener.open(url)
        ### Name the parser explicitly so the same one is used on every machine
        soup = BeautifulSoup(page, "html.parser")

        ### Parse and find
        ### Looks like Google contains URLs in <cite> tags.
        ### So for each cite tag on each page (10), print its contents (url)
        for cite in soup.findAll('cite'):
            print cite.text
            ### cite.text is unicode, so encode it before writing bytes
            output.write(cite.text.encode("utf-8") + "\n")

    output.close()

When the script is run on Windows 7, cmd throws an HTTP 503 error indicating a problem with the URL. The same URL works fine on Linux. If the URL is actually wrong, please suggest alternatives.
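For context, a 503 status is sent by the server, so one thing that might be worth trying is to wait and retry when it occurs instead of letting the script stop. Below is a minimal sketch of that idea, not a confirmed fix for the Windows 7 behaviour. The helper name `open_with_retry` and its `retries`, `delay` and `sleep` parameters are hypothetical and not part of the script above. The import is written so that the snippet runs on Python 2.7 as well as Python 3.

```python
import time

try:                      # Python 2, as used in the question
    from urllib2 import HTTPError
except ImportError:       # Python 3
    from urllib.error import HTTPError


def open_with_retry(open_func, url, retries=3, delay=5.0, sleep=time.sleep):
    """Call open_func(url); on HTTP 503, wait and try again.

    Hypothetical helper: the delay doubles after each failed attempt.
    Any other HTTP error, or a 503 on the last attempt, is re-raised.
    """
    for attempt in range(retries):
        try:
            return open_func(url)
        except HTTPError as err:
            # Only 503 is retried; give up once the attempts are used up.
            if err.code != 503 or attempt == retries - 1:
                raise
            sleep(delay)
            delay *= 2
```

In the script above, this would replace the direct call, for example `page = open_with_retry(opener.open, url)`. Pausing between the 1000 requests may also matter, since a server can answer rapid automated requests with 503 regardless of the client's operating system.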



