Sunday, April 25, 2021

Web scraping with BeautifulSoup and Selenium won't detect table elements in the webpage

I am trying to retrieve the table containing tenders on the following website: https://wbgeconsult2.worldbank.org/wbgec/index.html#$h=1582042296662 (after opening the link, you need to click 'Business Opportunities' at the top right to reach the table).

I tried pandas read_html, Selenium and BeautifulSoup, all of which failed (they simply don't detect the table elements at all). I also tried to find a request to reuse in the Network tab of the dev tools, but none of them seem to work. Is this even possible? What am I doing wrong?
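
To check whether the table is even in the HTML the server sends, I fetched the URL without a browser. The response looks like a static shell, and the grid (apparently jqGrid, judging by the jqgh_ id prefix in the XPaths below) is then built client-side by JavaScript, which would explain why read_html and BeautifulSoup find nothing. A minimal check along these lines, assuming the page serves the same shell to requests as to a browser, is consistent with the empty find_all('td') result in the full code below:

import requests

URL = 'https://wbgeconsult2.worldbank.org/wbgec/index.html#$h=1582042296662'
resp = requests.get(URL)
print(len(resp.text))       # size of the static shell only
print('<td' in resp.text)   # quick check for table cells in the raw response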

Here is my code:

from selenium import webdriver
from selenium.webdriver import ActionChains
import time
from bs4 import BeautifulSoup
import pandas as pd
import requests

URL='https://wbgeconsult2.worldbank.org/wbgec/index.html#$h=1582042296662'

#Enter Gecko driver path
driver=webdriver.Firefox(executable_path ='/Users/****/geckodriver')

driver.get(URL)
# driver.minimize_window()

# Open the 'Business Opportunities' tab
opp_path = '//*[@id="menu_publicads"]/a'
list_ch = driver.find_element_by_xpath(opp_path)
ActionChains(driver).click(list_ch).perform()
time.sleep(5)

# Click the publication-date column header twice to toggle the sort order
sort_xpath = '//*[@id="jqgh_selection_notification.publication_date"]'
for _ in range(2):
    list_ch = driver.find_element_by_xpath(sort_xpath)
    ActionChains(driver).click(list_ch).perform()
    time.sleep(5)

# Re-fetch the page with requests and parse it with BeautifulSoup
resp = requests.get(URL)
soup = BeautifulSoup(resp.content, 'lxml')
rows = soup.find_all('td')
print(rows)  # prints an empty list - no table cells in the static HTML


# Try reading the rendered rows directly through Selenium
ti = driver.find_elements_by_xpath('//tr')
for t in ti:
    print(t.text)  # fixed: print each row t, not the list ti
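
For reference, here is a sketch of the direction I assumed would work: instead of re-fetching the URL with requests (which only ever sees the static shell), wait for the rows inside the Selenium session and parse the rendered DOM. The wait condition, the gridcell selector and the assumption that the grid renders ordinary <table>/<tr>/<td> markup once loaded are guesses on my part, not something confirmed by the site:

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from bs4 import BeautifulSoup
import pandas as pd

# ...continuing with the same driver session, after the clicks above...

# Wait until at least one data cell is present instead of sleeping a fixed time
WebDriverWait(driver, 30).until(
    EC.presence_of_element_located((By.XPATH, '//td[@role="gridcell"]'))  # role is a guess based on typical jqGrid markup
)

# Parse the rendered DOM, not a fresh requests.get() of the URL
soup = BeautifulSoup(driver.page_source, 'lxml')
print(len(soup.find_all('td')))

# pandas can also read the rendered tables straight from the page source
tables = pd.read_html(driver.page_source)
print(len(tables))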


