In this code I am scraping a website for every "a" tag inside each "div class='image'" on the page, and printing the link held by each of those "a" tags.
from bs4 import BeautifulSoup
import os
import requests  # needed for requests.get below

# edible mushroom scraping
url = 'http://www.mushroom.world/mushrooms/edible?page=0'
r = requests.get(url)
soup = BeautifulSoup(r.text, 'html.parser')
# find all image classes
images = soup.find('div', attrs={'class': 'image'})
#images = soup.select('div.class image')
# find all images within class
for link in images.findAll("a"):
    # get url ending from image a href
    image_url = link.get('href')
    # creates usable url
    image_url = image_url.replace('/../', 'https://www.mushroom.world/')
    print(image_url)
I believe the issue is in this part of the code:
#find all image classes
images = soup.find('div', attrs={'class': 'image'})
When using soup.find, images is set to the first div with class "image", and the rest of the code successfully retrieves the "a" tag inside that first div. However, when I change the code to:
#find all image classes
images = soup.find_all('div', attrs={'class': 'image'})
in order to go through all the "image" divs, the code gives the error:
Exception has occurred: AttributeError
ResultSet object has no attribute 'findAll'. You're probably treating a list of elements like a single element. Did you call find_all() when you meant to call find()?
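As the error message hints, find_all returns a ResultSet (a list of tags) rather than a single tag, so one possible approach is to loop over the divs first and call find_all on each one. The sketch below is only an illustration under assumptions: sample_html is a made-up stand-in for the real page (its markup and file names are hypothetical) so that it runs without a network request, and the 'div.image a' selector is an alternative that was not in the original code.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the real page, so the sketch runs offline.
sample_html = """
<div class="image"><a href="/../data/mushroom-a.jpg">A</a></div>
<div class="image"><a href="/../data/mushroom-b.jpg">B</a></div>
<div class="other"><a href="/ignored">ignored</a></div>
"""

soup = BeautifulSoup(sample_html, 'html.parser')

image_urls = []
# find_all returns a ResultSet, so iterate over it to reach each div
for div in soup.find_all('div', attrs={'class': 'image'}):
    # each div is a single Tag, so find_all can be called on it
    for link in div.find_all('a'):
        href = link.get('href')
        # same URL fix-up as in the question
        image_urls.append(href.replace('/../', 'https://www.mushroom.world/'))

# Alternative: a CSS selector that matches every "a" inside a div.image
selected = [a.get('href') for a in soup.select('div.image a')]

for image_url in image_urls:
    print(image_url)
```

With the real page, soup would be built from r.text as in the question. The nested loop keeps the structure of the original code, while the select version collects the same "a" tags in a single call.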