I am trying to parse a webpage (forums.macrumors.com) and get a list of all the threads posted on it.
This is what I have so far:
import urllib2
import re
from BeautifulSoup import BeautifulSoup

address = "http://ift.tt/1K74tbv"
# Download the page once and keep the raw HTML.
website = urllib2.urlopen(address)
website_html = website.read()
website.close()
# Build the parse tree from the HTML that was already downloaded.
soup = BeautifulSoup(website_html)
In the page source, each thread starts with an li tag like this:
<li id="thread-1880" class="discussionListItem visible sticky WikiPost "
data-author="ABCD">
How do I parse this so I can then get to the thread link within this li tag? Thanks for the help.
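One possible approach, sketched with Python 3's standard-library html.parser rather than the BeautifulSoup setup above: walk the tags, remember when we are inside an li whose class list contains discussionListItem, and record the first link found there. The class name ThreadLinkParser, the second sample thread, and both href values are made up for illustration. The real page may well put other links (an avatar, for instance) before the thread title, so the "first link" rule is an assumption to check against the actual markup.

```python
from html.parser import HTMLParser


class ThreadLinkParser(HTMLParser):
    """Collect (li id, first href) for every <li class="discussionListItem ...">."""

    def __init__(self):
        super().__init__()
        self.threads = []        # (li id, href) pairs, in page order
        self._depth = 0          # <li> nesting depth inside the current thread item
        self._thread_id = None   # id attribute of the current thread item
        self._have_link = False  # True once a link was recorded for this item

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "li":
            if self._depth > 0:
                # A nested <li> inside a thread item: only track the depth.
                self._depth += 1
            elif "discussionListItem" in (attrs.get("class") or "").split():
                # Entering a new thread item.
                self._depth = 1
                self._thread_id = attrs.get("id")
                self._have_link = False
        elif tag == "a" and self._depth > 0 and not self._have_link:
            href = attrs.get("href")
            if href:
                self.threads.append((self._thread_id, href))
                self._have_link = True

    def handle_endtag(self, tag):
        if tag == "li" and self._depth > 0:
            self._depth -= 1


# Hypothetical sample: only the first opening <li> tag comes from the page
# source quoted above; the links and the second item are invented.
SAMPLE = '''
<ol>
<li id="thread-1880" class="discussionListItem visible sticky WikiPost "
data-author="ABCD">
  <a href="threads/example-thread.1880/">Example thread</a>
</li>
<li id="thread-1881" class="discussionListItem visible" data-author="EFGH">
  <a href="threads/another-thread.1881/">Another thread</a>
</li>
</ol>
'''

parser = ThreadLinkParser()
parser.feed(SAMPLE)
for thread_id, href in parser.threads:
    print(thread_id, href)
```

To run this against the live page, the downloaded HTML would be decoded to text and passed to parser.feed() in place of SAMPLE. Relative href values would then need to be joined with the forum's base URL.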