Monday, 19 February 2018

Making a web scraper (or web crawler) using Python, and what to do when it is not working

I'm making a web scraper using Python 3.
I need to get the contact info from http://badboyreport.kr/, on every page from http://badboyreport.kr/page/1 to http://badboyreport.kr/page/200.
But for some reason (I don't know exactly what it is),
it is not working: all I get back is an error page with a CAPTCHA picture, like the link here.

Current code:

import requests
from bs4 import BeautifulSoup

url = 'http://badboyreport.kr/page/200'
# Request headers, including a session cookie
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:40.0) Gecko/20100101 Firefox/40.1',
    'Accept-Language': 'fr',
    'Cookie': '_ga=GA1.2.245133617.1518492307;_gid=GA1.2.501986226.1518492307;PHPSESSID=44a7kmejhl9lvo4r8cvjg196r5;_gat=1',
    'Host': 'badboyreport.kr',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8',
}

res = requests.get(url, headers=headers)
soup = BeautifulSoup(res.content, 'lxml')

a = soup.title  # not used yet
print(soup)     # dump the whole parsed page to see what the server returned

What exactly is the problem here?

Is it because the website is blocking my requests, or is it an IP address issue?

Please let me know if there is any other way to get the contact info from this website (other than manually, one page at a time).
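
To make the goal concrete, here is a minimal sketch of the crawl over pages 1 to 200. It does not get around the CAPTCHA; it only builds the page URLs, pauses between requests, and stops as soon as a response looks blocked. The names page_urls, looks_blocked and crawl are hypothetical, and the check for the word "captcha" in the HTML is an assumption about what the error page contains, not something confirmed from the site.

```python
import time

BASE_URL = 'http://badboyreport.kr/page/{}'  # URL pattern taken from the question


def page_urls(first=1, last=200):
    """Return the listing URLs for pages first..last, inclusive."""
    return [BASE_URL.format(n) for n in range(first, last + 1)]


def looks_blocked(status_code, html):
    """Heuristic guess at whether a response is a block or CAPTCHA page.

    Assumption: the error page either has a non-200 status or mentions
    'captcha' somewhere in its HTML. Adjust after inspecting the real page.
    """
    return status_code != 200 or 'captcha' in html.lower()


def crawl(fetch, urls, delay=2.0):
    """Fetch each URL in turn and stop at the first blocked-looking response.

    `fetch` is any callable that takes a URL and returns (status_code, html),
    so the HTTP library is left to the caller.
    """
    pages = {}
    for i, url in enumerate(urls):
        if i:
            time.sleep(delay)  # pause between requests to avoid hammering the site
        status, html = fetch(url)
        if looks_blocked(status, html):
            print('Blocked at', url)
            break
        pages[url] = html
    return pages
```

With requests, fetch could be a small function that calls requests.get(url, headers=headers) and returns (res.status_code, res.text); each stored page could then be passed to BeautifulSoup as in the code above.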



