Wednesday, May 12, 2021

Web scraping Google search results without getting detected

I have about 100,000 links to copy from Google search results. I wrote a Python script using Selenium to make my life easier, but once I reach around 150 results, the Google results page changes. I think Google is detecting my scraper. Here is a sample of my code:

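from time import sleep

from bs4 import BeautifulSoup
from selenium.webdriver.common.by import By

# `driver` is an already-open Selenium WebDriver showing a results page;
# randTime is a helper (not shown here) whose return value is passed to sleep().
links = []

# Extract the 10 links on the current Google results page
soup = BeautifulSoup(driver.page_source, 'html.parser')
sleep(randTime(0, 1))
search = soup.find_all('div', class_="yuRUbf")
for h in search:
    links.append(h.a.get('href'))
sleep(randTime(0, 1))

# Click the "Next" button
next_button = driver.find_element(By.XPATH, "//a[@id='pnnext']")
sleep(randTime(0, 5))
next_button.click()
sleep(randTime(2, 10))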
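
The randTime helper is not part of the sample above. As a minimal sketch, assuming it simply returns a random number of seconds between its two arguments (the uniform distribution and the float return type are assumptions, not something the sample confirms), it could look like this:

```python
import random


def randTime(low, high):
    # Assumed behaviour: return a random float number of seconds
    # drawn uniformly from the range [low, high].
    return random.uniform(low, high)
```

With this definition, sleep(randTime(2, 10)) pauses for somewhere between 2 and 10 seconds.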

I would really appreciate any help. I'm using random sleeps throughout the script, but they are not preventing the detection. Do you have any suggestions?



