Wednesday, 28 January 2015

Web crawler class not working

Recently, I began building a simple web crawler. My initial code, which just iterated twice, worked perfectly, but when I tried to turn it into a class with exception handling, it stopped running.



    import re, urllib
    class WebCrawler:
        """A Simple Web Crawler That Is Readily Extensible"""
        def __init__():
            size = 1
        def containsAny(seq, aset):
            for c in seq:
                if c in aset: return True
            return False

        def crawlUrls(url, depth):
            textfile = file('UrlMap.txt', 'wt')
            urlList = [url]
            size = 1
            for i in range(depth):
                for ee in range(size):
                    if containsAny(urlList[ee], "http://"):
                        try:
                            webpage = urllib.urlopen(urlList[ee]).read()
                            break
                        except:
                            print "Following URL failed!"
                            print urlList[ee]
                        for ee in re.findall('''href=["'](.[^"']+)["']''',webpage, re.I):
                            print ee
                            urlList.append(ee)
                            size+=1
                            textfile.write(ee+'\n')

    myCrawler = WebCrawler

    myCrawler.crawlUrls("http://ift.tt/1Likcor", 2)


And here is the error it produces.



    Traceback (most recent call last):
      File "C:/Users/Noah Huber-Feely/Desktop/Python/WebCrawlerClass", line 33, in <module>
        myCrawler.crawlUrls("http://ift.tt/1Likcor", 2)
    TypeError: unbound method crawlUrls() must be called with WebCrawler instance as first argument (got str instance instead)
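
For what it's worth, the message points at two things in the code above: `myCrawler = WebCrawler` binds the name to the class itself rather than to an instance (there are no parentheses), and none of the methods take `self` as their first parameter. Below is a minimal sketch of how the class might look with both addressed. It makes several assumptions that are not in the original post: it is written for Python 3 (so `urllib.request.urlopen` and `print()`), the `fetch` parameter is a hypothetical hook added so the class can be exercised without network access, `startswith("http://")` is swapped in for `containsAny`, the early `break` is dropped, and the links are returned as a list instead of being written to `UrlMap.txt`.

```python
import re
from urllib.request import urlopen  # assumption: Python 3 (the post uses Python 2's urllib)

# Same pattern as in the question, compiled once.
HREF_RE = re.compile(r'''href=["'](.[^"']+)["']''', re.I)


class WebCrawler:
    """A Simple Web Crawler That Is Readily Extensible"""

    def __init__(self, fetch=None):
        # `fetch` is a hypothetical hook (not in the original post): any
        # callable mapping a URL to page text. Defaults to a real download.
        self.fetch = fetch or self._download

    @staticmethod
    def _download(url):
        return urlopen(url).read().decode("utf-8", errors="replace")

    def crawlUrls(self, url, depth):
        """Return the start URL plus every link found in `depth` passes."""
        urlList = [url]
        start = 0
        for _ in range(depth):
            end = len(urlList)
            # Visit only the URLs discovered since the previous pass.
            for current in urlList[start:end]:
                if not current.startswith("http://"):
                    continue  # swapped in for containsAny(); skips relative links
                try:
                    webpage = self.fetch(current)
                except Exception:
                    print("Following URL failed!")
                    print(current)
                    continue
                urlList.extend(HREF_RE.findall(webpage))
            start = end
        return urlList


# Note the parentheses: this creates an instance instead of aliasing the class.
myCrawler = WebCrawler()
```

With an instance in hand, `myCrawler.crawlUrls("http://ift.tt/1Likcor", 2)` passes `myCrawler` as `self` automatically, so the string lands in `url` as intended.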



