Monday, 22 June 2015

Parsing a webpage using Python

I am trying to parse a webpage (forums.macrumors.com) and get a list of all the threads posted.

So I have got this so far:

import urllib2
import re

from BeautifulSoup import BeautifulSoup

address = "http://ift.tt/1K74tbv"

# Download the page once and hand the HTML to BeautifulSoup
# (Python 2 with BeautifulSoup 3, as imported above).
website = urllib2.urlopen(address)
website_html = website.read()
soup = BeautifulSoup(website_html)

Now the webpage source has this code at the start of each thread:

<li id="thread-1880" class="discussionListItem visible sticky WikiPost  "   
data-author="ABCD">

How do I parse this so I can then get to the thread link within this li tag? Thanks for the help.
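
One possible approach, sketched here under a few assumptions: walk the HTML, note each li whose class list contains "discussionListItem", and record the first link found inside it. The sketch uses Python 3's standard-library html.parser rather than BeautifulSoup so it runs without extra installs. The class name ThreadLinkParser, the sample markup inside the li, and the href value are all hypothetical; the real page may put other links (an avatar, for instance) before the thread link, in which case the anchor test would need to be narrowed.

```python
from html.parser import HTMLParser


class ThreadLinkParser(HTMLParser):
    """Record the first href inside each <li> whose classes include discussionListItem."""

    def __init__(self):
        super().__init__()
        self.links = []         # (li id, href) pairs, in document order
        self._depth = 0         # > 0 while we are inside a thread <li>
        self._thread_id = None  # id attribute of the current thread <li>
        self._found = False     # True once a link was recorded for this <li>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "li":
            if self._depth > 0:
                # Nested <li> inside a thread item: just track the nesting.
                self._depth += 1
            elif "discussionListItem" in (attrs.get("class") or "").split():
                self._depth = 1
                self._thread_id = attrs.get("id")
                self._found = False
        elif tag == "a" and self._depth > 0 and not self._found and attrs.get("href"):
            self.links.append((self._thread_id, attrs["href"]))
            self._found = True

    def handle_endtag(self, tag):
        if tag == "li" and self._depth > 0:
            self._depth -= 1


# Hypothetical sample: only the opening <li> tag comes from the page source
# quoted above; the <a> element and its href are assumptions for illustration.
sample_html = """
<ol>
  <li id="thread-1880" class="discussionListItem visible sticky WikiPost  "
      data-author="ABCD">
    <a href="threads/example-thread.1880/">Example thread</a>
  </li>
  <li class="otherItem"><a href="ignored/">Not a thread</a></li>
</ol>
"""

parser = ThreadLinkParser()
parser.feed(sample_html)
thread_links = parser.links
print(thread_links)
```

To run this against the live page, feed the parser the downloaded HTML string instead of sample_html. With BeautifulSoup 4 (the bs4 package), the same idea would probably be soup.find_all("li", class_="discussionListItem") followed by li.find("a") on each result, though that has not been checked against this particular page.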



