Saturday, May 25, 2019

How to remove HTML Tags and '\n' from Scrapy Python Output

I'm new to Python and web scraping. I wrote the two lines below to extract the title and price from a website. However, the output contains HTML tags and '\n' characters. How can I remove them and get only the text?

product_name = response.css('#productTitle::text')[0].extract().strip('\n')
product_price = response.css('#priceblock_ourprice')[0].extract().strip()


Output

[
    "                \n                    \n                    \n                \n\n                \n                    \n                    \n                        Stainless Steel Food Grinder Attachment fit KitchenAid Stand Mixers Including Sausage Stuffer, Dishwasher Safe,Durable Mixer Accessories as Meat Processor\n                    \n                \n\n                \n                    \n                    \n                \n            ",
    "<span id=\"priceblock_ourprice\" class=\"a-size-medium a-color-price priceBlockBuyingPriceString\">$87.99</span>"
]
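
One possible way to clean both values after extraction is sketched below, using only the standard library. The helper name `clean_text` is hypothetical, the sample title is shortened from the output above, and the regex-based tag removal is an assumption that only holds for small fragments such as this single `<span>`; it is not a general HTML parser.

```python
import re
from html import unescape


def clean_text(raw):
    """Drop HTML tags, decode entities, and collapse whitespace runs."""
    # Remove anything that looks like a tag. This is good enough for a
    # short fragment like one <span>, not for arbitrary HTML.
    without_tags = re.sub(r"<[^>]+>", "", raw)
    # Turn entities such as &amp; back into plain characters.
    decoded = unescape(without_tags)
    # split() with no argument splits on any run of whitespace (spaces,
    # '\n', '\t'), so joining with one space also trims both ends.
    return " ".join(decoded.split())


# Sample inputs modelled on the output above (title shortened for brevity).
raw_name = "   \n    \n   Stainless Steel Food Grinder Attachment\n    \n  "
raw_price = (
    '<span id="priceblock_ourprice" '
    'class="a-size-medium a-color-price priceBlockBuyingPriceString">'
    "$87.99</span>"
)

product_name = clean_text(raw_name)
product_price = clean_text(raw_price)

print(product_name)
print(product_price)
```

On the selector side, the price still carries its tags because the second selector lacks `::text`; `response.css('#priceblock_ourprice::text')` would select only the text node. Also, `.strip('\n')` removes only newline characters at the two ends of the string, so it stops at the leading spaces and changes nothing here, whereas `.strip()` with no argument removes all surrounding whitespace.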



