web: Python 3.3.0 Web scraping

Sunday, 21 December 2014

Python 3.3.0 Web scraping - filtering results

I'm new to Python and very new to web scraping, and could use some help. Even though I don't really understand the language yet, I have managed to scrape (pardon the pun) something together. I am trying to scrape prices for certain Steam market items, and this is what I have so far:


import urllib.request
import re

urls = ["http://ift.tt/1o4nPUd"]
# Raw bytes literal so that \s reaches the regex engine unchanged.
pattern = re.compile(rb'<span class="market_listing_price market_listing_price_with_fee">\s+(.+?)\s+</span>')

for url in urls:
    # Download the page as bytes and collect every listed price.
    htmltext = urllib.request.urlopen(url).read()
    titles = pattern.findall(htmltext)
    print(titles)

This gives a result like this:


[b'471,50 p&#1091;&#1073;.', b'CDN&#36; 9.50', b'Rp 103 500.99', b'&#36;8.39 USD', b'&#36;8.40 USD', b'499,99 p&#1091;&#1073;.', b'499,99 p&#1091;&#1073;.', b'6,90&#8364;', b'6,90&#8364;', b'6,90&#8364;']

As you can see, this isn't very readable. What I want is just the price, in USD only, of the cheapest item (in this case b'&#36;8.39 USD'). How can I filter the results so that I only get the lowest price from the list, like this: 8.39 USD?

As I said, I am very new to Python and web scraping, so I may need a little extra help with the code. Any advice would help a lot. Thanks.
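
One possible way to do the filtering, as a rough sketch rather than a definitive answer: decode each bytes entry, keep only those that look like "&#36;8.39 USD" ("&#36;" is the HTML entity for "$"), convert the number to a float, and take the minimum. The names usd_pattern and lowest_usd are made up for this illustration, the sample list is copied from the output above, and the pattern assumes USD prices never contain a thousands separator.

```python
import re

# Sample output copied from the question; in the real script this is `titles`.
titles = [b'471,50 p&#1091;&#1073;.', b'CDN&#36; 9.50', b'Rp 103 500.99',
          b'&#36;8.39 USD', b'&#36;8.40 USD', b'499,99 p&#1091;&#1073;.',
          b'499,99 p&#1091;&#1073;.', b'6,90&#8364;', b'6,90&#8364;',
          b'6,90&#8364;']

# Hypothetical helper pattern: "&#36;" is the entity for "$", so this only
# matches entries of the form "&#36;<number> USD" (assumes no "1,234.56").
usd_pattern = re.compile(r'^&#36;(\d+(?:\.\d+)?) USD$')


def lowest_usd(raw_prices):
    """Return the lowest USD price as a float, or None if there is none."""
    usd = []
    for raw in raw_prices:
        match = usd_pattern.match(raw.decode('utf-8'))
        if match:
            usd.append(float(match.group(1)))
    return min(usd) if usd else None


cheapest = lowest_usd(titles)
if cheapest is not None:
    print('{:.2f} USD'.format(cheapest))
```

Anchoring the pattern at the start of the string is what keeps 'CDN&#36; 9.50' out of the result, since that entry is in Canadian dollars. If the page ever lists no USD price at all, lowest_usd returns None instead of raising an error from min().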
