Friday, June 26, 2015

How do I get numerical data while web scraping?

I'm completely new to web scraping, so any reference sites would be appreciated. I'm confused about how to get at the actual data. When I call print(theText), I get a large block of HTML, which I assume is correct. How exactly do I extract values from it? Do I have to use regular expressions to get the numerical data?

import urllib.request

def getData():
    # Fetch the page and decode the raw bytes (UTF-8 by default) into a string
    request = urllib.request.Request("http://ift.tt/1HliMuc")
    response = urllib.request.urlopen(request)
    the_page = response.read()
    theText = the_page.decode()
    print(theText)
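
One possible way to get from the decoded HTML to numbers, as a rough sketch only: parse the markup with the standard library's html.parser to collect the visible text, then apply a small regular expression to that text rather than to the raw HTML. The names TextCollector, NUMBER and extract_numbers are hypothetical, and the sample string is made up for illustration; the structure of the real page is not known here, so the pattern would likely need adjusting.

```python
import re
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collects visible text nodes, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip = 0  # greater than 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.chunks.append(data.strip())


# Assumed number format: optional minus sign, optional thousands commas,
# optional decimal part. Adjust for the page being scraped.
NUMBER = re.compile(r"-?\d+(?:,\d{3})*(?:\.\d+)?")


def extract_numbers(html_text):
    """Return every number found in the visible text, as floats."""
    parser = TextCollector()
    parser.feed(html_text)
    parser.close()
    numbers = []
    for chunk in parser.chunks:
        for match in NUMBER.findall(chunk):
            numbers.append(float(match.replace(",", "")))
    return numbers


# Made-up sample standing in for theText
sample = (
    "<html><body><table>"
    "<tr><td>Price</td><td>1,234.50</td></tr>"
    "<tr><td>Change</td><td>-3.2</td></tr>"
    "</table><script>var x = 99;</script></body></html>"
)
print(extract_numbers(sample))
```

In getData, theText could be passed to extract_numbers in place of sample. If only certain cells are wanted, handle_starttag can be extended to track the enclosing tag or its attributes; third-party parsers such as Beautiful Soup make that kind of targeted lookup more convenient.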



