I'm building a web scraper in Python 3,
and I need to get the contact info from http://badboyreport.kr/, on every listing page from http://badboyreport.kr/page/1 to http://badboyreport.kr/page/200.
But for some reason (I don't know exactly what it is),
it doesn't work, and all I get is an error page with a CAPTCHA image, like the one in the link here.
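The pages to visit follow a simple numbered pattern, so the full list of URLs can be generated rather than typed out. A minimal sketch, assuming the listing pages really are numbered consecutively from 1 to 200 as described; the names BASE_URL and page_urls are illustrative, not from the original code:

```python
# Hypothetical URL template, assuming pages are numbered /page/1 ... /page/200.
BASE_URL = 'http://badboyreport.kr/page/{}'


def page_urls(first=1, last=200):
    """Return the listing-page URLs from page/first to page/last, inclusive."""
    return [BASE_URL.format(n) for n in range(first, last + 1)]
```

Building the list up front makes it easy to fetch a single page (as the code below does with page 200) while debugging, and then switch to the whole range once one page works.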
Current code:
import requests
from bs4 import BeautifulSoup

url = 'http://badboyreport.kr/page/200'
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:40.0) Gecko/20100101 Firefox/40.1',
    'Accept-Language': 'fr',
    # Session cookie copied from a browser; the PHPSESSID may no longer be valid
    'Cookie': '_ga=GA1.2.245133617.1518492307;_gid=GA1.2.501986226.1518492307;PHPSESSID=44a7kmejhl9lvo4r8cvjg196r5;_gat=1',
    'Host': 'badboyreport.kr',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8',
}

res = requests.get(url, headers=headers)
soup = BeautifulSoup(res.content, 'lxml')
print(soup.title)  # <title> of the returned page
print(soup)        # full HTML, which currently shows the CAPTCHA page
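Before guessing at the cause, it can help to check what the server actually sent back. A minimal sketch of such a check; the status codes and the "captcha" keyword are assumptions about what a block page typically looks like, not something confirmed for this site:

```python
import re

# Status codes that commonly signal rate limiting or an access block.
# (Assumption: this site may use different codes; the set is illustrative.)
BLOCK_STATUS_CODES = {403, 429, 503}


def looks_blocked(status_code, html):
    """Heuristically decide whether a response is a block/CAPTCHA page
    rather than the normal listing page."""
    if status_code in BLOCK_STATUS_CODES:
        return True
    # Challenge pages usually mention "captcha" (e.g. "g-recaptcha") in the markup.
    return re.search(r'captcha', html, re.IGNORECASE) is not None
```

With the code above, this would be called as looks_blocked(res.status_code, res.text). If the same URL opened in a normal browser on the same network also shows the CAPTCHA, the block is more likely tied to the IP address or to request volume than to the headers the script sends.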
What exactly is the problem here?
Is this website blocking me, or is it an IP address issue?
Please let me know if there is any other way I can get the contact info from this website (other than manually, one page at a time).
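One common trigger for CAPTCHAs is many requests in quick succession, so a slower loop that stops as soon as a block page appears may behave better than fetching 200 pages at full speed. A minimal sketch under that assumption; crawl_pages and make_fetch are hypothetical helpers, the 2-second delay is an arbitrary choice, and the block check mirrors the heuristic of looking for 403/429/503 or the word "captcha". A requests.Session keeps the cookies the server sets, which avoids relying on a hard-coded (possibly stale) Cookie header:

```python
import time


def crawl_pages(urls, fetch, delay=2.0):
    """Fetch each URL in order, pausing `delay` seconds between requests,
    and stop at the first page that looks like a block/CAPTCHA page.

    `fetch` is any callable taking a URL and returning (status_code, html).
    Returns a dict mapping URL -> html for the pages fetched before stopping.
    """
    pages = {}
    for i, url in enumerate(urls):
        if i:
            time.sleep(delay)  # no pause before the first request
        status, html = fetch(url)
        if status in (403, 429, 503) or 'captcha' in html.lower():
            print(f'Stopped at {url}: looks like a block/CAPTCHA page (HTTP {status})')
            break
        pages[url] = html
    return pages


def make_fetch(session):
    """Wrap a requests.Session (or similar object) as a fetch callable."""
    def fetch(url):
        res = session.get(url, timeout=10)
        return res.status_code, res.text
    return fetch


# Example (not run here; it performs real network requests):
# import requests
# session = requests.Session()  # keeps cookies the server sets between requests
# urls = [f'http://badboyreport.kr/page/{n}' for n in range(1, 201)]
# pages = crawl_pages(urls, make_fetch(session))
```

Passing the fetch function in keeps the loop independent of the HTTP library, so it can be tried with a fake fetcher first. Stopping, rather than retrying, leaves it to you to decide whether to slow down further or to contact the site owner.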