I am a newbie in Python trying to do a web scraping project.
So far, I've been able to use Selenium to perform an infinite scroll and download the data displayed. My problem is that, due to the amount of data, I need to apply a filter before scrolling down. I've been searching a lot but haven't been able to find a solution.
How can I apply a select in order to filter the webpage before scrolling down?
This is the website I want to scrape, applying a filter on Disciplines set to Sport Climbing.
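From what I've read, Selenium's Select helper might work if the filter is a native <select> element, so this is a minimal sketch of what I have in mind (the By.NAME locator 'disciplines' and the option text are guesses, since I haven't found the actual element on the page):

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import Select, WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome('/Users/marionaboschbertral/Desktop/chromedriver')
driver.get('https://www.8a.nu/ascents?grade=29,39#filtered')

# Wait for the filter control to appear (locator is a guess)
dropdown = WebDriverWait(driver, 10).until(
    EC.presence_of_element_located((By.NAME, 'disciplines'))
)

# This only works on a native <select>; if the page uses a custom
# dropdown, I would have to click the toggle and then the option instead
Select(dropdown).select_by_visible_text('Sport Climbing')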
This is my current code:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import pandas as pd
import time

driver = webdriver.Chrome('/Users/marionaboschbertral/Desktop/chromedriver')
driver.get('https://www.8a.nu/ascents?grade=29,39#filtered')

# Scroll the button into view, then click it via JavaScript
button_xpath = "/html/body/div[1]/div/div/div/div[3]/div/div[2]/div[2]/div[1]/div/div/div/div[4]/div/button"
driver.execute_script("arguments[0].scrollIntoView(true);", WebDriverWait(driver, 10).until(EC.visibility_of_element_located((By.XPATH, button_xpath))))
driver.execute_script("arguments[0].click();", WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, button_xpath))))

SCROLL_PAUSE_TIME = 3.5

# Get scroll height
last_height = driver.execute_script("return document.body.scrollHeight")

while True:
    # Scroll down to bottom
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    # Wait to load page
    time.sleep(SCROLL_PAUSE_TIME)
    # Calculate new scroll height and compare with last scroll height
    new_height = driver.execute_script("return document.body.scrollHeight")
    if new_height == last_height:
        break
    last_height = new_height

# Click the results table, then parse the page tables with pandas
elem = WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, '//*[@id="__layout"]/div/div/div[3]/div/div[2]/div[2]/div[1]/div/div/div/div[3]/table')))
elem.click()
time.sleep(5)

dfs = pd.read_html(driver.page_source)
for df in dfs:
    if len(df) > 10:
        print(df)

driver.close()
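Once the loop finishes and the tables are parsed, I was planning to combine them and save everything to a file, along these lines (the filename is just an example):

# Keep only the large results table(s) and write them to CSV
results = [df for df in dfs if len(df) > 10]
if results:
    pd.concat(results, ignore_index=True).to_csv('ascents.csv', index=False)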
Thanks a lot!