I am following a tutorial on how to use Beautiful Soup to write a web scraper.
https://youtu.be/XQgXKtPSzUI?t=1229 Here is the tutorial, with a timestamp at the point where I got stuck.
All was going well: I managed to get the brand name and save it to a variable.
However, when it came to getting the item name, I could not do it. I don't know whether I diverged from the tutorial or whether the structure of the site has changed.
Here is my code:
from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as Soup
my_url = 'https://www.newegg.com/global/uk-en/Desktop-Graphics-Cards/SubCategory/ID-48?nm_mc=KNC-GoogleukAdwords&cm_mmc=KNC-GoogleukAdwords-_-Sitelink-UK-_-VGA-Cards-_-Global&gclid=CjwKCAjwv4_1BRAhEiwAtMDLsjTOkmeuVkXvw4LI45DrrqAEHdpSjqAgYEhh48TO-7kGQiAe0x5VPBoCBYQQAvD_BwE'
#opens connection, grabs page
uClient = uReq(my_url)
#offloads contents into variable
page_html = uClient.read()
#closes connection
uClient.close()
#html parsing
page_soup = Soup(page_html, "html.parser")
#grabs each product
containers = page_soup.findAll("div", {"class": "item-container"})
divWithInfo = containers[0].find("a","item-title")
If I were to print the contents of divWithInfo I would get: <a class="item-title" href="https://www.newegg.com/global/uk-en/gigabyte-radeon-rx-570-gv-rx570gaming-4gd-rev2-0/p/N82E16814932242" title="View Details">GIGABYTE Radeon RX 570 DirectX 12 GV-RX570GAMING-4GD REV2.0 4GB 256-Bit GDDR5 PCI Express 3.0 x16 ATX Video Card</a>
That is as far as I can get. Reading that output, I assumed I needed to look at the title attribute inside that tag. However, I don't know how to get the contents of the title attribute and save them to a variable.
The end result should be that I can print just the item name, i.e.: "GIGABYTE Radeon RX 570 DirectX 12 GV-RX570GAMING-4GD REV2.0 4GB 256-Bit GDDR5 PCI Express 3.0 x16 ATX Video Card"
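For reference, here is a minimal sketch of one way this might be done, assuming the tag looks exactly like the printed output above. In that output the title attribute only holds "View Details", while the item name is the text inside the tag, so the sketch reads both. The names sample_html, sample_soup, item_name and title_attr are made up for illustration, and the markup is a hard-coded copy of the printed tag rather than the live page, which may be structured differently.

```python
from bs4 import BeautifulSoup as Soup

# Hard-coded copy of the tag printed above (an assumption: the live page may differ).
sample_html = (
    '<div class="item-container">'
    '<a class="item-title" '
    'href="https://www.newegg.com/global/uk-en/gigabyte-radeon-rx-570-gv-rx570gaming-4gd-rev2-0/p/N82E16814932242" '
    'title="View Details">GIGABYTE Radeon RX 570 DirectX 12 GV-RX570GAMING-4GD REV2.0 '
    '4GB 256-Bit GDDR5 PCI Express 3.0 x16 ATX Video Card</a>'
    '</div>'
)

sample_soup = Soup(sample_html, "html.parser")

# Same lookups as in the code above, run against the sample markup.
container = sample_soup.find("div", {"class": "item-container"})
title_tag = container.find("a", "item-title")

# The text between <a ...> and </a> is the item name.
item_name = title_tag.get_text(strip=True)

# Attributes are read like dictionary keys; here "title" is just "View Details".
title_attr = title_tag["title"]

print(item_name)
print(title_attr)
```

With the real page, the same two calls would presumably be applied to divWithInfo from the code above, i.e. divWithInfo.get_text(strip=True) for the name and divWithInfo["title"] for the attribute.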
I am fairly new to all this, so any help would be appreciated. If anything needs clarifying, please let me know.