Wednesday, June 28, 2017

Selenium: webdriver.page_source not refreshing after click

I am trying to copy a web page's list of addresses for a given community service into a new document so I can geocode all of the locations on a map. I can't get a list of all the parcels at once. I can only download them one at a time, and each page lists at most 25 parcel numbers, so doing this by hand would be extremely time consuming.

I want to write a script that reads the page source (everything, including the 25 addresses contained in a table tag), clicks the next page button, copies the next page, and so on until the last page is reached. Afterwards, I can format the text so it is compatible with geocoding.
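For the last step, turning the copied tables into something a geocoder can read, here is a minimal sketch using BeautifulSoup and Python's `csv` module. It assumes each results row is a `<tr>` of `<td>` cells, for example a parcel number and an address, and that header rows use `<th>`. This layout is hypothetical because the real page's table isn't shown. The names `table_rows` and `rows_to_csv` are also illustrative.

```python
import csv
import io

from bs4 import BeautifulSoup


def table_rows(html):
    """Yield the cell texts of every data row (<tr> with <td> cells) in every <table>."""
    soup = BeautifulSoup(html, "html.parser")
    for table in soup.find_all("table"):
        for tr in table.find_all("tr"):
            cells = [td.get_text(strip=True) for td in tr.find_all("td")]
            if cells:  # header rows use <th>, so they yield no <td> cells and are skipped
                yield cells


def rows_to_csv(rows):
    """Serialize rows as CSV text; fields containing commas are quoted automatically."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerows(rows)
    return buf.getvalue()


# Hypothetical sample table: parcel number + address
sample = """
<table>
  <tr><th>Parcel</th><th>Address</th></tr>
  <tr><td>001-234</td><td>12 Main St</td></tr>
  <tr><td>001-235</td><td>14 Main St, Apt 2</td></tr>
</table>
"""
print(rows_to_csv(table_rows(sample)), end="")
# 001-234,12 Main St
# 001-235,"14 Main St, Apt 2"
```

Batch geocoding tools commonly accept CSV input, but check which columns your geocoder expects before you settle on a row layout.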

The code below does all of this, except that it copies the first page over and over again, even though I can clearly see that the browser has successfully navigated to the next page:

import time

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.by import By

# Open Chrome (navigation to the first results page is not shown)
br = webdriver.Chrome()

pg_src = br.page_source.encode("utf-8")
soup = BeautifulSoup(pg_src, "html.parser")

max_page = 122  # int(max_page)

# Open a text file to write the results to
f = open(r'C:\Geocoding\results.txt', 'w', encoding='utf-8')

# Write results page by page until the max page number is reached
pg_cnt = 1  # start on 1 as we should already have the first page
while pg_cnt < max_page:
    tble_elems = soup.find_all('table')
    soup = BeautifulSoup(str(tble_elems), "html.parser")
    f.write(str(soup))
    time.sleep(5)
    pg_cnt += 1
    # Click the "next" button
    br.find_element(By.XPATH, "//div[@class='next button']").click()
    # Give the page some time to load
    time.sleep(5)
    # Get the new page source (THIS IS THE PART THAT DOESN'T SEEM TO BE WORKING)
    page_src = br.page_source.encode("utf-8")
    soup = BeautifulSoup(pg_src, "html.parser")

f.close()
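The repeated first page comes from a one-character typo in the loop above. The fresh source is assigned to `page_src`, but the next line parses `pg_src`, which still holds the first page's source, so every iteration writes page 1 again. The loop also stops one page early. Because the condition is `while pg_cnt < max_page`, it writes pages 1 to 121, clicks through to page 122, and then exits without writing that page. Below is a minimal corrected sketch. It assumes a Selenium driver that is already showing page 1 and reuses the question's XPath for the next button. The names `scrape_tables`, `NEXT_XPATH` and `delay` are hypothetical.

```python
import time

from bs4 import BeautifulSoup

# XPath of the "next" button, taken from the question
NEXT_XPATH = "//div[@class='next button']"


def scrape_tables(driver, max_page, out_file, delay=5.0):
    """Write the <table> markup of pages 1..max_page to out_file.

    `driver` is assumed to be a Selenium WebDriver already showing page 1.
    Returns the number of pages written.
    """
    written = 0
    for pg_cnt in range(1, max_page + 1):
        # Read page_source fresh on every pass and parse that same string,
        # so each iteration sees the page the browser is currently showing.
        soup = BeautifulSoup(driver.page_source, "html.parser")
        for table in soup.find_all("table"):
            out_file.write(str(table) + "\n")
        written += 1
        if pg_cnt == max_page:
            break  # last page written; don't click "next" past it
        # "xpath" is the value of Selenium's By.XPATH constant
        driver.find_element("xpath", NEXT_XPATH).click()
        time.sleep(delay)  # crude fixed wait for the next page to render
    return written
```

To use it, open the output file in a `with` block, for example `with open(r'C:\Geocoding\results.txt', 'w', encoding='utf-8') as f:`, and call `scrape_tables(br, 122, f)`. The function passes the plain string `"xpath"` instead of `By.XPATH` only so that it doesn't need to import Selenium itself. A fixed `time.sleep` is fragile. A sturdier option is Selenium's `WebDriverWait` with `expected_conditions.staleness_of` applied to the old table element, provided the site replaces that element when it loads the next page.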



