web: How do I filter a list through a formatted web scraping for loop?

Monday, June 28, 2021

I have a list of basketball players that I want to pass through a web-scraping for loop I've already set up. The list contains the 2011 NBA Draft picks. I want to loop through each player and get their college stats from their final year in college. The problem is that some drafted players did not go to college, so there is no college-stats page at the URL built from their name. As soon as I pass in even one player who did not play in college, the whole script fails with an error. I have tried using "pass" and "continue", but nothing seems to work. This is the closest I have gotten so far:

from bs4 import BeautifulSoup
import requests 
import pandas as pd 
headers = {'User-Agent': 'Mozilla/5.0'}

players = [
   'kyrie-irving','derrick-williams','enes-kanter',
   'tristan-thompson','jonas-valanciunas','jan-vesely',
   'bismack-biyombo','brandon-knight','kemba-walker',
   'jimmer-fredette','klay-thompson'
]
# The full list has 60 players in total; this is just the first handful.

player_stats = []

for player in players:
    url = f'https://www.sports-reference.com/cbb/players/{player}-1.html'
    res = requests.get(url, headers=headers)
    #if player in url:
        #continue
    #else:
        #print("This player has no college stats")
    # Including this if/else statement makes the error say "header is not defined".
    # Without it, the error says "'NoneType' object is not iterable".
    soup = BeautifulSoup(res.content, 'lxml')
    header = [th.get_text() for th in soup.find_all('tr', limit=2)[0].find_all('th')]
    player_stats.append([td.get_text() for td in soup.find('tr', id='players_per_game.2011')])

graph = pd.DataFrame(player_stats, columns = header)
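The two errors in the comment most likely share one root cause. The commented-out check "if player in url" is always true, because url is built from player. As a result, continue runs for every player, header is never assigned, and the DataFrame line fails with "header is not defined". Without the check, soup.find('tr', id='players_per_game.2011') returns None on a page that has no such row, and the list comprehension then tries to iterate over None.

The code below is a minimal sketch of one way to skip those players: it checks the response status and the presence of the row before using either. The helper names (fetch_html, final_season_row, collect_stats) are hypothetical. The sketch makes several assumptions:

- sports-reference answers a missing player page with a non-200 status such as 404. If it served a normal page instead, the missing-row check would still skip the player.
- The stats table has a thead whose last row holds the column names.
- As in the question, the final college season is the row with id players_per_game.2011, so a player whose last season was a different year is also skipped.

It uses the built-in 'html.parser' instead of lxml, which removes one dependency.

```python
from typing import Callable, Iterable, List, Optional, Tuple

import requests
from bs4 import BeautifulSoup

# URL pattern and User-Agent taken from the question; the "-1" suffix is kept as-is.
URL_TEMPLATE = 'https://www.sports-reference.com/cbb/players/{player}-1.html'
HEADERS = {'User-Agent': 'Mozilla/5.0'}


def fetch_html(url: str) -> Optional[str]:
    """Return the page HTML, or None when the status is not 200 (e.g. a 404)."""
    res = requests.get(url, headers=HEADERS, timeout=10)
    if res.status_code != 200:
        return None
    return res.text


def final_season_row(html: str, row_id: str = 'players_per_game.2011'
                     ) -> Optional[Tuple[List[str], List[str]]]:
    """Return (column names, cell values) for the row with row_id, or None."""
    soup = BeautifulSoup(html, 'html.parser')
    row = soup.find('tr', id=row_id)
    if row is None:
        return None  # the page exists but has no row for that season
    # Assumption: the row's table has a <thead> whose last row holds the column names.
    header_row = row.find_parent('table').find('thead').find_all('tr')[-1]
    header = [th.get_text(strip=True) for th in header_row.find_all('th')]
    # The Season cell is a <th>, the stat cells are <td>; keep both in order.
    cells = [cell.get_text(strip=True) for cell in row.find_all(['th', 'td'])]
    return header, cells


def collect_stats(players: Iterable[str],
                  fetch: Callable[[str], Optional[str]] = fetch_html):
    """Scrape each player, skipping those with no page or no matching row."""
    header: Optional[List[str]] = None
    rows: List[List[str]] = []
    skipped: List[str] = []
    for player in players:
        html = fetch(URL_TEMPLATE.format(player=player))
        result = final_season_row(html) if html is not None else None
        if result is None:
            skipped.append(player)  # e.g. a draftee with no college page
            continue  # move straight on to the next player
        header, cells = result
        rows.append(cells)
    return header, rows, skipped
```

With network access, usage would look like: header, rows, skipped = collect_stats(players), then graph = pd.DataFrame(rows, columns=header), and print(skipped) to see who was left out. The fetcher is a parameter only so that the skipping logic can be tried on saved HTML without hitting the site.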
