Tuesday, March 31, 2015

Pull only links and their link text from the lines of a web page and insert them into a dictionary using Python

I am trying to pull only the links and their link text from a web page, line by line, and insert each text and link into a dictionary, without using Beautiful Soup or a regex.


I keep getting this error:

Traceback (most recent call last):
  File "F:/Homework7-2.py", line 13, in <module>
    link2 = link1.split("href=")[1]
IndexError: list index out of range


Code:



import urllib.request
url = "http://www.facebook.com"
page = urllib.request.urlopen(url)
mylinks = {}
links = page.readline().decode('utf-8')


for items in links:
    links = page.readline().decode('utf-8')
    if "a href=" in links:
        links = page.readline().decode('utf-8')
        link1 = links.split(">")[0]
        link2 = link1.split("href=")[1]
        mylinks = link2
        print(mylinks)
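
A likely cause of the IndexError: the code checks one line for "a href=" but then calls readline() again before splitting, so the line being split is not guaranteed to contain "href=", and split("href=") then returns a one-element list. Below is a minimal sketch of one way to build the dictionary without Beautiful Soup or a regex, using the standard library's html.parser instead of splitting lines by hand. Note that this swaps the line-by-line approach for a parser fed with the whole page. The names LinkCollector, extract_links and fetch_links are hypothetical, made up for this illustration, and decoding the page as UTF-8 is an assumption.

```python
from html.parser import HTMLParser
import urllib.request


class LinkCollector(HTMLParser):
    """Hypothetical helper: collects {link text: href} for each <a href=...>."""

    def __init__(self):
        super().__init__()
        self.links = {}
        self._href = None   # href of the <a> element currently open, if any
        self._text = []     # text fragments seen inside that element

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; anchors without href are skipped
        if tag == "a":
            href = dict(attrs).get("href")
            if href is not None:
                self._href = href
                self._text = []

    def handle_data(self, data):
        # Only keep text that appears between <a href=...> and </a>
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Collapse runs of whitespace (including newlines) in the link text
            text = " ".join("".join(self._text).split())
            self.links[text] = self._href
            self._href = None


def extract_links(html):
    """Return a dict mapping link text to href for the given HTML string."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


def fetch_links(url):
    """Download a page and extract its links (assumes UTF-8 content)."""
    with urllib.request.urlopen(url) as page:
        html = page.read().decode("utf-8", errors="replace")
    return extract_links(html)
```

Calling fetch_links("http://www.facebook.com") would return the dictionary for the page from the question. Because the link text is the key, two links with the same text overwrite each other; swapping key and value is one way around that if the hrefs are unique.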



