web: Pull links and their link text from the lines of a web page and insert them into a dictionary using Python

Tuesday, 31 March 2015

I am trying to pull only the links and their link text from a web page, line by line, and insert each text and link into a dictionary, without using Beautiful Soup or a regex.

I keep getting this error:

 Traceback (most recent call last):
 File "F:/Homework7-2.py", line 13, in <module>
 link2 = link1.split("href=")[1]
 IndexError: list index out of range

Code:

import urllib.request
url = "http://www.facebook.com" 
page = urllib.request.urlopen(url)
mylinks = {}
links = page.readline().decode('utf-8')


for items in links:
    links = page.readline().decode('utf-8')
    if "a href=" in links:
        links = page.readline().decode('utf-8')
        link1 = links.split(">")[0]
        link2 = link1.split("href=")[1]
        mylinks = link2
        print(mylinks)
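
A note on the traceback, with a possible fix. The loop iterates over the characters of the first line (links is a string), and after the "a href=" test passes it calls readline() again, so the split runs on the next line rather than the one that was tested. If that line has no "href=" before its first ">", split("href=") returns a one-element list and [1] raises the IndexError. Also, mylinks = link2 rebinds the dictionary to a string instead of adding an entry. The following is only a minimal sketch of one way around this with plain string methods. The function name extract_links is hypothetical, and it assumes each anchor opens and closes on a single line, that the href value is quoted, and that the link text contains no nested tags.

```python
def extract_links(lines):
    """Return {link text: href} for anchors found in an iterable of text lines.

    Sketch only: assumes <a ...>text</a> sits on one line with a quoted href.
    """
    links = {}
    for line in lines:
        pos = 0
        while True:
            start = line.find("<a ", pos)      # next opening anchor tag
            if start == -1:
                break
            tag_end = line.find(">", start)    # end of the opening tag
            if tag_end == -1:
                break                          # tag continues on another line
            close = line.find("</a>", tag_end)
            if close == -1:
                break                          # link text continues on another line
            tag = line[start:tag_end]
            href_at = tag.find("href=")
            if href_at != -1:                  # skip anchors without an href
                rest = tag[href_at + len("href="):]
                quote = rest[:1]
                if quote in ('"', "'"):
                    end_quote = rest.find(quote, 1)
                    if end_quote != -1:
                        text = line[tag_end + 1:close].strip()
                        links[text] = rest[1:end_quote]
            pos = close + len("</a>")
    return links


# Possible usage with the page from the question (needs network access):
# import urllib.request
# with urllib.request.urlopen("http://www.facebook.com") as page:
#     mylinks = extract_links(raw.decode("utf-8", errors="replace") for raw in page)
# print(mylinks)
```

Because the dictionary is keyed by link text, two links with the same text overwrite each other. If that matters, keying by href or storing a list of (text, href) pairs may suit better.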
