Sunday, 6 December 2020

Selecting filters with Selenium for web scraping in Python

I am new to Python and am trying to do a web scraping project.

So far, I've been able to use Selenium to perform an infinite scroll and download the data displayed. My problem is that, due to the amount of data, I need to apply a filter before scrolling down. I've searched a lot but haven't been able to find out how.

How can I apply a selection to filter the webpage before scrolling down?

This is the site I want to scrape, with the Disciplines filter set to Sport Climbing:

https://www.8a.nu/ascents
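
One possible direction, sketched under assumptions: the URL loaded in the code further down already carries `grade=29,39` in its query string, which suggests (but does not prove) that the site encodes at least some filters in the URL. If the Disciplines filter behaves the same way, the filtered URL could be built up front and loaded directly. The helper name `build_filtered_url` is hypothetical, and the parameter name for Disciplines is not known here; it would have to be read from the address bar after applying the filter by hand.

```python
from urllib.parse import urlencode, urlsplit, urlunsplit


def build_filtered_url(base_url, filters, fragment="filtered"):
    """Return base_url with `filters` encoded as its query string.

    `filters` maps parameter names to values. Only `grade` is known from
    the question; any other parameter name is an assumption to verify.
    """
    parts = urlsplit(base_url)
    # Keep commas literal so multi-value filters stay in the "29,39" form
    query = urlencode(filters, safe=",")
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, fragment))


# Rebuilds the URL that the code in the question already uses
url = build_filtered_url("https://www.8a.nu/ascents", {"grade": "29,39"})
print(url)
```

If the Disciplines choice turns out not to appear in the URL, this approach does not apply and the filter control has to be clicked through Selenium instead.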

This is my code:

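from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

import pandas as pd
import time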


driver = webdriver.Chrome('/Users/marionaboschbertral/Desktop/chromedriver')

driver.get('https://www.8a.nu/ascents?grade=29,39#filtered')



# Absolute XPath of the button that is clicked before scrolling
BUTTON_XPATH = "/html/body/div[1]/div/div/div/div[3]/div/div[2]/div[2]/div[1]/div/div/div/div[4]/div/button"

# Wait for the button to be visible and scroll it into view
button = WebDriverWait(driver, 10).until(EC.visibility_of_element_located((By.XPATH, BUTTON_XPATH)))
driver.execute_script("arguments[0].scrollIntoView(true);", button)

# Wait for the button to be clickable and click it through JavaScript
button = WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, BUTTON_XPATH)))
driver.execute_script("arguments[0].click();", button)


SCROLL_PAUSE_TIME = 3.5

# Get scroll height
last_height = driver.execute_script("return document.body.scrollHeight")

while True:
    # Scroll down to bottom
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")

    # Wait to load page
    time.sleep(SCROLL_PAUSE_TIME)

    # Calculate new scroll height and compare with last scroll height
    new_height = driver.execute_script("return document.body.scrollHeight")
    if new_height == last_height:
        break
    last_height = new_height

elem = WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, '//*[@id="__layout"]/div/div/div[3]/div/div[2]/div[2]/div[1]/div/div/div/div[3]/table')))
elem.click()

time.sleep(5)

dfs = pd.read_html(driver.page_source)

for df in dfs:
    if len(df) > 10:
        print(df)
driver.close()
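
Because the amount of data is the concern, the scroll loop above could also be given an upper bound. The following is a minimal sketch, not a tested fix for this site: `scroll_until_stable` is a hypothetical helper that takes the two browser actions as plain callables, so the stopping logic can be checked without opening a browser. The default `max_rounds=200` is an arbitrary choice.

```python
import time


def scroll_until_stable(get_height, scroll_to_bottom, pause=3.5,
                        max_rounds=200, sleep=time.sleep):
    """Scroll until the page height stops growing or max_rounds is reached.

    Returns the number of scroll rounds that were performed.
    """
    last_height = get_height()
    for rounds in range(1, max_rounds + 1):
        scroll_to_bottom()
        sleep(pause)  # give the page time to load more rows
        new_height = get_height()
        if new_height == last_height:
            return rounds  # nothing new was loaded
        last_height = new_height
    return max_rounds


# Dry run with fake page heights instead of a real browser
fake_heights = iter([100, 200, 300, 300])
performed = scroll_until_stable(lambda: next(fake_heights), lambda: None,
                                pause=0, sleep=lambda seconds: None)
print(performed)
```

With the driver from the code above, the two callables would be `lambda: driver.execute_script("return document.body.scrollHeight")` and `lambda: driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")`.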

Thanks a lot!
