web: How to remove HTML Tags and '\n' from Scrapy Python Output

Saturday, May 25, 2019

I'm new to Python and web scraping. I wrote the two lines below to extract the title and price from a website. However, the output still contains HTML tags and '\n' characters. How can I remove them and get only the text?

product_name = response.css('#productTitle::text')[0].extract().strip('\n')
product_price = response.css('#priceblock_ourprice')[0].extract().strip()

Output

[
    "                \n                    \n                    \n                \n\n                \n                    \n                    \n                        Stainless Steel Food Grinder Attachment fit KitchenAid Stand Mixers Including Sausage Stuffer, Dishwasher Safe,Durable Mixer Accessories as Meat Processor\n                    \n                \n\n                \n                    \n                    \n                \n            ",
    "<span id=\"priceblock_ourprice\" class=\"a-size-medium a-color-price priceBlockBuyingPriceString\">$87.99</span>"
]
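
One possible fix, sketched under a few assumptions. The title still has its newlines because `strip('\n')` only removes newline characters at the very ends of the string, and this string starts and ends with spaces. The price still has its tags because the selector has no `::text`, so the whole `<span>` element is returned. The helper names `clean_text` and `parse_product` below are made up for illustration, and `response` is assumed to be a regular Scrapy response from a version recent enough to support `.get()`.

```python
def clean_text(value):
    """Collapse every run of whitespace (spaces, '\n', tabs) into one space."""
    # str.split() with no arguments splits on any whitespace and drops empty
    # pieces, so joining the pieces also trims the leading and trailing padding.
    return " ".join(value.split())


def parse_product(response):
    """Hypothetical helper; `response` is assumed to be a Scrapy response."""
    # ::text selects the text node inside the element, so no <span> markup
    # is returned. get(default="") avoids an IndexError when nothing matches.
    name = response.css("#productTitle::text").get(default="")
    price = response.css("#priceblock_ourprice::text").get(default="")
    return {
        "product_name": clean_text(name),
        "product_price": clean_text(price),
    }


# Demonstration on a shortened version of the title string from the output.
raw_title = "   \n     \n   Stainless Steel Food Grinder Attachment\n     \n   "
print(clean_text(raw_title))  # prints: Stainless Steel Food Grinder Attachment
```

If only the padding at the ends matters, a plain `.strip()` with no argument is enough, since it removes all leading and trailing whitespace rather than just '\n'.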
