Thursday, December 3, 2020

Python - Web Scraper - Not Picking up Price

I have coded a (very basic) web scraper to scrape products from the Sam's Club website and then print out each product's name and price.

The problem is that Python prints the same price (the price of the first product on the page) for every other item, even though the name changes correctly.

If I change the page being scraped, the price changes to the first price on that page, and that price is again used for every other item.

I don't understand why everything else works but the product price doesn't.

(Side note: the price variables look inelegant and confusing because Sam's Club breaks its price into three fields on the server side: price = the $ sign, price2 = dollars, price3 = cents.)

Thanks for your help, code is below:

import requests, bs4
from bs4 import BeautifulSoup

#makes each request look like a human request
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.94 Safari/537.36',
        'X-Requested-With': 'XMLHttpRequest',
        'Accept': 'application/json, text/javascript, */*; q=0.01',
        'Cookie': 'localeEditionShown_en=true; permutive-session=^%^7B^%^22session_id^%^22^%^3A^%^22e5386dfb-c58a-4882-b0e1-2cc2d9518982^%^22^%^2C^%^22last_updated^%^22^%^3A^%^222017-11-22T19^%^3A10^%^3A04.522Z^%^22^%^7D; visid_incap_774904=4xMirl1lRNOgrnN+Sm9S1zNx61kAAAAAREIPAAAAAACAsmaAAbBYMBjQTCqLf/D6wOVO4hdnKjIF; incap_ses_151_774904=/LX+SNRqsR8SzJi7p3YYAjKgGloAAAAApdQygw8VYBxbv/wvl7Be7A==; _gat=1; _gat_subdomainTracker=1; _ga=GA1.2.1522498341.1508602188; _gid=GA1.2.1243543827.1511694421'
        }

#defines url and requests/beautifulsoup variables
url = "https://www.samsclub.com/s/gatorade"
page = requests.get(url, headers=headers)
soup = BeautifulSoup(page.content, 'lxml')
productlist = soup.find_all('div',class_='sc-pc-title-medium')


for products in productlist:
    name = soup.find('div',class_='sc-pc-title-medium').text.strip()
    price = soup.find('span', class_='Price-currency').text.strip()
    price2 = soup.find('span', class_='Price-characteristic').text.strip()
    price3 = soup.find('span', class_='Price-mantissa').text.strip()
    productprice = price + price2 + '.' + price3 #need to find out why its not updating
    
    results = {
        'product name': products.text.strip(),
        'product price': productprice
    }

    
    print(results) 
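
A likely cause, sketched here under assumptions: inside the loop, every soup.find(...) call searches the whole document, and find always returns the first match, so the price fields never move past the first product. The name only changes because it is read from the loop variable (products.text). Searching inside each product's own container should fix that. The sample markup below and the "product-card" wrapper class are hypothetical stand-ins, since the real page structure is not shown in the question; only the title and Price-* class names are taken from the code above. The function name extract_products is also made up for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real page. Only the title and
# Price-* class names come from the question; "product-card" is assumed.
SAMPLE_HTML = """
<div class="product-card">
  <div class="sc-pc-title-medium">Gatorade A</div>
  <span class="Price-currency">$</span>
  <span class="Price-characteristic">12</span>
  <span class="Price-mantissa">98</span>
</div>
<div class="product-card">
  <div class="sc-pc-title-medium">Gatorade B</div>
  <span class="Price-currency">$</span>
  <span class="Price-characteristic">15</span>
  <span class="Price-mantissa">48</span>
</div>
"""


def extract_products(html):
    """Return one dict per product, reading the price from that product's own card."""
    soup = BeautifulSoup(html, "html.parser")
    results = []
    for card in soup.find_all("div", class_="product-card"):
        # Call find() on the card, not on soup, so each lookup is
        # limited to the current product's subtree.
        name = card.find("div", class_="sc-pc-title-medium").text.strip()
        currency = card.find("span", class_="Price-currency").text.strip()
        dollars = card.find("span", class_="Price-characteristic").text.strip()
        cents = card.find("span", class_="Price-mantissa").text.strip()
        results.append({
            "product name": name,
            "product price": f"{currency}{dollars}.{cents}",
        })
    return results


for product in extract_products(SAMPLE_HTML):
    print(product)
```

On the real page, the container class would need to be read from the page source, and a product without a price would need a None check before calling .text.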


