Tuesday, 19 January 2016

Parsing and Returning HTML data with Python

I am using Python to retrieve HTML from a webpage and then parse it with my MyHtmlParser class. When I find certain data in the HTML, I want to return it to the main function. Printing the results from inside MyHtmlParser confirms that it finds what I want, but I do not know how to get that data back to main().

import urllib2
from MyHtmlParser import MyHtmlParser


def HtmlRetrieve(url):
    req = urllib2.Request(url, headers={'User-Agent': "Magic Browser"})
    con = urllib2.urlopen(req)
    return con.read()


def main():
    url = "someUrl.com"

    html = HtmlRetrieve(url)

    parser = MyHtmlParser()
    parser.feed(html)
    print parser.links

main()

This is my MyHtmlParser class:

from HTMLParser import HTMLParser


class MyHtmlParser(HTMLParser):
    def __init__(self):
        HTMLParser.__init__(self)  # the unbound base-class call needs self passed explicitly
        self.links = []

    def handle_data(self, data):
        if data == "some text":
            self.links.append(data)

Why is the data not being returned to my main method?
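
For reference, here is a minimal sketch of one way the pattern could work. It is not a confirmed diagnosis. It assumes Python 3, where the module is html.parser rather than HTMLParser, and it guesses that surrounding whitespace may be defeating the exact string comparison. The helper find_matches and the sample_html string are hypothetical names made up for illustration. The parser collects matches on an instance attribute, and the caller reads that attribute after feed() finishes.

```python
from html.parser import HTMLParser  # Python 2 equivalent: from HTMLParser import HTMLParser


class MyHtmlParser(HTMLParser):
    def __init__(self):
        # Calling the base initializer through the class requires passing self.
        HTMLParser.__init__(self)
        self.links = []

    def handle_data(self, data):
        # Text nodes often include surrounding whitespace or newlines,
        # so strip before comparing (an assumption about the real page).
        text = data.strip()
        if text == "some text":
            self.links.append(text)


def find_matches(html):
    # Hypothetical helper: feed the HTML, then hand the collected list back.
    parser = MyHtmlParser()
    parser.feed(html)
    parser.close()  # flush any text still buffered by the parser
    return parser.links


# Made-up sample input standing in for the downloaded page.
sample_html = "<html><body><p> some text </p><p>other</p></body></html>"
matches = find_matches(sample_html)
print(matches)
```

With this sample input the script prints ['some text']. The handler methods themselves cannot return values to the caller, because feed() invokes them internally and discards whatever they return, so storing results on self and reading them afterwards is the usual route.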



