Monday, 24 August 2015

Python urllib2 web scrape very slow

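import urllib2

# pattern (a compiled regex) and text_file (an open file) are assumed to be
# defined earlier in the script; they are not shown in the post.
for numb in range(50000, 100000):
    # The shortened URL is kept as posted; the real address must contain a
    # placeholder such as %d for the % formatting to work.
    address = 'http://ift.tt/1Pum8KK' % numb
    html = urllib2.urlopen(address).read()
    regex = pattern.findall(html)
    clean = "\n".join(regex)
    text_file.write(clean)
    print numb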

The script runs fine when scraping range(1, 1000), but it becomes very slow when I scrape above 10000. For example, the script above tries to scrape from 50000 to 100000. What could cause this? Note that I can open the website in my browser in less than 1 ms, so it is not a connection problem.
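
One way to narrow this down is to time each step of the loop separately, so it becomes visible whether the download, the regex, or the file write is what slows down at higher numbers. The following is only a rough sketch under some assumptions, not the original script: timed_scrape, url_template and fetch are hypothetical names introduced for illustration, and it is written for Python 3, where urllib2.urlopen is available as urllib.request.urlopen.

```python
import time
from urllib.request import urlopen  # Python 3 home of urllib2.urlopen


def timed_scrape(numbers, url_template, pattern, out, fetch=None):
    """Run the scrape loop and record how long each step takes.

    Returns a list of (numb, fetch_seconds, parse_seconds, write_seconds).
    url_template is a hypothetical URL containing a %d placeholder.
    """
    if fetch is None:
        # Default: a real HTTP request with a timeout, so that one stalled
        # connection cannot block the whole run indefinitely.
        def fetch(address):
            return urlopen(address, timeout=30).read().decode("utf-8", "replace")

    timings = []
    for numb in numbers:
        t0 = time.perf_counter()
        html = fetch(url_template % numb)         # step 1: download
        t1 = time.perf_counter()
        clean = "\n".join(pattern.findall(html))  # step 2: regex
        t2 = time.perf_counter()
        out.write(clean)                          # step 3: write
        t3 = time.perf_counter()
        timings.append((numb, t1 - t0, t2 - t1, t3 - t2))
    return timings
```

Running this over a small slice of the slow range (say a few hundred numbers) and sorting the result by the second column should show whether the time is going into the requests themselves. If it is, the cause is more likely on the network or server side than in the loop.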



