I am trying to write a web scraper to pull information on Supreme clothing from supremecommunity.com. I made a post about it earlier when it was not working, got some great help, and now it is almost working.
The code works for the most part, but it starts having issues after Fall-Winter '17.
This is the error message I got in my Jupyter notebook.
UnicodeEncodeError                        Traceback (most recent call last)
<ipython-input> in <module>
     24             upvote = card.select_one('.progress-bar-success > span').get_text(strip=True)
     25             downvote = card.select_one('.progress-bar-danger > span').get_text(strip=True)
---> 26             writer.writerow([item_name,item_image,upvote,downvote])
     27             print(item_name,item_image,upvote,downvote)

~\Anaconda3\lib\encodings\cp1252.py in encode(self, input, final)
     17 class IncrementalEncoder(codecs.IncrementalEncoder):
     18     def encode(self, input, final=False):
---> 19         return codecs.charmap_encode(input,self.errors,encoding_table)[0]
     20
     21 class IncrementalDecoder(codecs.IncrementalDecoder):

UnicodeEncodeError: 'charmap' codec can't encode character '\u0392' in position 0: character maps to <undefined>
Any advice would be greatly appreciated.
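For what it's worth, the failure can be reproduced outside the scraper: '\u0392' is the Greek capital letter Beta, which the cp1252 codec cannot represent, so encoding it by hand in a fresh Python session on Windows raises the same error.

>>> '\u0392'.encode('cp1252')
UnicodeEncodeError: 'charmap' codec can't encode character '\u0392' in position 0: character maps to <undefined>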
import csv
import requests
from bs4 import BeautifulSoup

# Image paths on the site are relative, so prepend the domain.
base = 'https://www.supremecommunity.com{}'

links = ['https://www.supremecommunity.com/season/fall-winter2011/overview/',
         'https://www.supremecommunity.com/season/spring-summer2012/overview/',
         'https://www.supremecommunity.com/season/fall-winter2012/overview/',
         'https://www.supremecommunity.com/season/spring-summer2013/overview/',
         'https://www.supremecommunity.com/season/fall-winter2013/overview/',
         'https://www.supremecommunity.com/season/spring-summer2014/overview/',
         'https://www.supremecommunity.com/season/fall-winter2014/overview/',
         'https://www.supremecommunity.com/season/spring-summer2015/overview/',
         'https://www.supremecommunity.com/season/fall-winter2015/overview/',
         'https://www.supremecommunity.com/season/spring-summer2016/overview/',
         'https://www.supremecommunity.com/season/fall-winter2016/overview/',
         'https://www.supremecommunity.com/season/spring-summer2017/overview/',
         'https://www.supremecommunity.com/season/fall-winter2017/overview/',
         'https://www.supremecommunity.com/season/spring-summer2018/overview/',
         'https://www.supremecommunity.com/season/fall-winter2018/overview/',
         'https://www.supremecommunity.com/season/spring-summer2019/overview/',
         'https://www.supremecommunity.com/season/fall-winter2019/overview/']

with open("supremecommunity.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(['item_name', 'item_image', 'upvote', 'downvote'])
    for link in links:
        r = requests.get(link, headers={"User-Agent": "Mozilla/5.0"})
        soup = BeautifulSoup(r.text, "lxml")
        # Each product sits in a card whose class name ends in "d-card".
        for card in soup.select('[class$="d-card"]'):
            item_name = card.select_one('.card__top')['data-itemname']
            item_image = base.format(card.select_one('img.prefill-img').get('data-src'))
            upvote = card.select_one('.progress-bar-success > span').get_text(strip=True)
            downvote = card.select_one('.progress-bar-danger > span').get_text(strip=True)
            writer.writerow([item_name, item_image, upvote, downvote])  # line 26 in the traceback
            print(item_name, item_image, upvote, downvote)
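Update: from what I've read, the problem doesn't seem to be the scraping itself but the CSV file being opened with Windows' default cp1252 codec, which can't represent some characters in the newer item names. I think passing an explicit encoding to open() might be enough, though I haven't confirmed it yet. A minimal sketch of the change I'm considering, assuming UTF-8 output is acceptable:

# Sketch of a possible fix, not confirmed: write the CSV as UTF-8
# so writerow() can encode any character the site returns, instead
# of falling back to Windows' default cp1252.
with open("supremecommunity.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(['item_name', 'item_image', 'upvote', 'downvote'])
    # ... rest of the scraping loop unchanged ...

If the file needs to open cleanly in Excel, encoding="utf-8-sig" might be the safer choice, but I'm not sure which is preferred here.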