Sunday, 21 December 2014

Python 3.3.0 Web scraping - filtering results

So I'm new to Python and very new to web scraping and could use some help. Even though I don't really understand the language yet, I have managed to scrape (ignore the pun) something together. I am trying to scrape the prices of certain Steam market items, and this is what I have so far:



import urllib.request
import re

urls = ["http://ift.tt/1o4nPUd"]
# Raw bytes literal, so that \s reaches the regex engine unchanged.
pattern = re.compile(br'<span class="market_listing_price market_listing_price_with_fee">\s+(.+?)\s+</span>')

for url in urls:
    # Download the page as bytes and pull out every listed price.
    htmlfile = urllib.request.urlopen(url)
    htmltext = htmlfile.read()
    titles = re.findall(pattern, htmltext)
    print(titles)


This gives a result like this:



[b'471,50 p&#1091;&#1073;.', b'CDN&#36; 9.50', b'Rp 103 500.99', b'&#36;8.39 USD', b'&#36;8.40 USD', b'499,99 p&#1091;&#1073;.', b'499,99 p&#1091;&#1073;.', b'6,90&#8364;', b'6,90&#8364;', b'6,90&#8364;']


As you can see, this isn't very easy on the eye. What I want is just the price (USD only) of the cheapest item (in this case: b'&#36;8.39 USD'). How can I filter the results so that I get only the lowest USD price from the list, like this: 8.39 USD?


As I said before, I am very new to Python and web scraping, so I may need a little more help with the code. Any advice would help a lot. Thanks.
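
To make the goal concrete, here is a minimal sketch of one possible way to do that filtering, run on the sample output shown above rather than on a live page. It assumes that USD listings always have the form `&#36;<number> USD` (with `&#36;` being the HTML entity for `$`) and that there are no thousands separators. The names `usd_pattern` and `lowest_usd` are made up for this illustration.

```python
import re
from decimal import Decimal

# Sample data copied from the output shown in the question.
titles = [b'471,50 p&#1091;&#1073;.', b'CDN&#36; 9.50', b'Rp 103 500.99',
          b'&#36;8.39 USD', b'&#36;8.40 USD', b'499,99 p&#1091;&#1073;.',
          b'499,99 p&#1091;&#1073;.', b'6,90&#8364;', b'6,90&#8364;',
          b'6,90&#8364;']

# Assumption: a USD price is "&#36;" followed by digits, an optional
# decimal part, and " USD". Anything else (CDN$, Rp, euro, ...) is skipped.
usd_pattern = re.compile(r'^&#36;(\d+(?:\.\d+)?) USD$')


def lowest_usd(raw_titles):
    """Return the lowest USD price as a Decimal, or None if there is none."""
    prices = []
    for raw in raw_titles:
        text = raw.decode('utf-8')  # findall on bytes returns bytes
        match = usd_pattern.match(text)
        if match:
            prices.append(Decimal(match.group(1)))
    return min(prices) if prices else None


lowest = lowest_usd(titles)
if lowest is not None:
    print('{} USD'.format(lowest))  # prints "8.39 USD" for the sample data
```

`Decimal` is used instead of `float` so that the price is printed exactly as it was listed. If the page formats USD prices differently, the pattern would need adjusting.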


Python 3.3.0




