I am trying to get the first hundred results from a web page, but I am getting only the first 20: https://www.usnews.com/education/best-high-schools/search?national-rank-range-min=1&national-rank-range-max=100
I used the following code:
from selenium import webdriver
from bs4 import BeautifulSoup
import time
url = "https://www.usnews.com/education/best-high-schools/search?national-rank-range-min=1&national-rank-range-max=100"
driver = webdriver.Chrome()
driver.get(url)
time.sleep(10)
scroll_pause_time = 1
screen_height = driver.execute_script("return window.screen.height;")
i = 1
print(screen_height)
while True:
    driver.execute_script("window.scrollTo(0, {screen_height}*{i});".format(screen_height=screen_height, i=i))
    i += 1
    time.sleep(scroll_pause_time)
    # update scroll height each time after scrolling, as the scroll height can change after we scroll the page
    scroll_height = driver.execute_script("return document.body.scrollHeight;")
    # Break the loop when the height we need to scroll to is larger than the total scroll height
    if (screen_height) * i > scroll_height:
        break
while True:
    try:
        loadmore = driver.find_element_by_id("pager__ButtonContentContainer-sc-1i8e93j-3 zIUhv")
        loadmore.click()
    except:
        print("Reached bottom of page")
        break
html_source = driver.page_source
soup = BeautifulSoup(html_source,'html.parser')
...
I tried different approaches, but none of them loads the page fully through automation. Even View Source shows only the first 20 results. I am looking to get the first 100 results instead.
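One thing that stands out in the code above: `find_element_by_id` is passed a string that looks like a class attribute, not an id, so the lookup most likely raises on the first pass and the bare `except` prints "Reached bottom of page" without ever clicking anything. The `find_element_by_*` helpers have also been removed in recent Selenium 4 releases. Below is a minimal, untested sketch of a click loop that uses `find_elements` with a CSS selector instead. The names `click_load_more` and `fetch_page_source` are hypothetical, and the selector is an assumption built from the class name quoted above; it may not match the live page, and the page may block automated browsers regardless.

```python
import time


def click_load_more(driver, button_locator, pause=2.0, max_clicks=10):
    """Click a "Load More" control until it disappears or max_clicks is reached.

    Returns the number of clicks performed.
    """
    clicks = 0
    while clicks < max_clicks:
        # find_elements returns an empty list instead of raising,
        # so no try/except is needed to detect a missing button.
        buttons = driver.find_elements(*button_locator)
        if not buttons:
            break
        # A JavaScript click sidesteps overlays that can intercept a normal click.
        driver.execute_script(
            "arguments[0].scrollIntoView(); arguments[0].click();", buttons[0]
        )
        clicks += 1
        time.sleep(pause)  # crude wait for the next batch of results to render
    return clicks


def fetch_page_source(url):
    # Imported here so click_load_more can be exercised without a browser installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By

    driver = webdriver.Chrome()
    try:
        driver.get(url)
        time.sleep(10)  # same fixed initial wait as in the question
        # ASSUMPTION: a partial class match, because the hashed suffix
        # ("-sc-1i8e93j-3 zIUhv") is likely to change between site builds.
        locator = (By.CSS_SELECTOR, "[class*='pager__ButtonContentContainer']")
        click_load_more(driver, locator)
        return driver.page_source
    finally:
        driver.quit()
```

If this works at all, `fetch_page_source(url)` would return the HTML after the extra batches have loaded, which can then go to `BeautifulSoup` as before. The `max_clicks` cap is there so the loop cannot spin forever if the button never goes away.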