Saturday, 25 April 2020

Web scraping the contents of a title attribute

I am following along with a tutorial on how to use Beautiful Soup to program a web scraper.

https://youtu.be/XQgXKtPSzUI?t=1229 Here is the tutorial with a timestamp to my roadblock.

All was going well: I managed to get the brand name and save it to a variable.

However, when it came to getting the item name, I couldn't do it. I don't know whether I diverged from the tutorial or whether the structure of the site has changed.

Here is my code:




from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as Soup 

my_url = 'https://www.newegg.com/global/uk-en/Desktop-Graphics-Cards/SubCategory/ID-48?nm_mc=KNC-GoogleukAdwords&cm_mmc=KNC-GoogleukAdwords-_-Sitelink-UK-_-VGA-Cards-_-Global&gclid=CjwKCAjwv4_1BRAhEiwAtMDLsjTOkmeuVkXvw4LI45DrrqAEHdpSjqAgYEhh48TO-7kGQiAe0x5VPBoCBYQQAvD_BwE'


# Open the connection and fetch the page
uClient = uReq(my_url)

# Read the page contents into a variable
page_html = uClient.read()

# Close the connection
uClient.close()

# Parse the HTML
page_soup = Soup(page_html, "html.parser")

# Grab each product container
containers = page_soup.find_all("div", {"class": "item-container"})




# Find the product link (an <a> tag with class "item-title") in the first container
divWithInfo = containers[0].find("a", "item-title")

If I were to print the contents of divWithInfo, I would get:

<a class="item-title" href="https://www.newegg.com/global/uk-en/gigabyte-radeon-rx-570-gv-rx570gaming-4gd-rev2-0/p/N82E16814932242" title="View Details">GIGABYTE Radeon RX 570 DirectX 12 GV-RX570GAMING-4GD REV2.0 4GB 256-Bit GDDR5 PCI Express 3.0 x16 ATX Video Card</a>

That is as far as I can get. I read that and assumed I needed to search for the title attribute inside that tag. However, I don't know how to get the contents of the title attribute and save them to a variable.

The end result would be to print just the item name, i.e.: "GIGABYTE Radeon RX 570 DirectX 12 GV-RX570GAMING-4GD REV2.0 4GB 256-Bit GDDR5 PCI Express 3.0 x16 ATX Video Card"
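
For reference, here is a minimal sketch of the two ways to read data from that tag. It assumes the markup looks exactly like the printed output above, so it parses a hard-coded copy of that snippet rather than the live page (the names html, link, title_attr and item_name are made up for illustration). In that output the title attribute holds "View Details", while the item name is the text between the opening and closing <a> tags, so the attribute lookup and the text lookup return different things.

```python
from bs4 import BeautifulSoup

# Hard-coded copy of the printed tag, wrapped in a container div.
# This is an assumption: the live page may differ from this sample.
html = (
    '<div class="item-container">'
    '<a class="item-title" '
    'href="https://www.newegg.com/global/uk-en/gigabyte-radeon-rx-570-gv-rx570gaming-4gd-rev2-0/p/N82E16814932242" '
    'title="View Details">'
    'GIGABYTE Radeon RX 570 DirectX 12 GV-RX570GAMING-4GD REV2.0 4GB 256-Bit '
    'GDDR5 PCI Express 3.0 x16 ATX Video Card</a>'
    '</div>'
)

soup = BeautifulSoup(html, "html.parser")
container = soup.find("div", {"class": "item-container"})
link = container.find("a", "item-title")

# Attributes are read like dictionary keys.
# link["title"] raises KeyError if the attribute is missing;
# link.get("title") returns None instead.
title_attr = link["title"]

# The visible text between <a ...> and </a> is read with get_text().
item_name = link.get_text(strip=True)

print(title_attr)
print(item_name)
```

If the item name is what you are after, item_name is probably the variable you want; title_attr is shown only to illustrate how an attribute is read.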

I am fairly new to all this, so any help would be appreciated. If anything needs clarifying, please let me know.



