Wednesday, 26 August 2020

Web scraping different formats

I'm trying to scrape the names, email addresses (linked from each name), years, and roles/subjects of the staff listed on this page, https://www.aacps.org/Page/4014, and save the details to an Excel sheet. I'm having trouble collecting the link data (the emails) together with the plain-text information.

This is what I have so far:

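    import urllib.request
    from bs4 import BeautifulSoup

    # hdr was not shown in the original snippet; a browser-like User-Agent is assumed here
    hdr = {'User-Agent': 'Mozilla/5.0'}
    url_fac = 'https://www.aacps.org/Page/4014'
    print(url_fac)
    req_fac = urllib.request.Request(url_fac, headers=hdr)
    html_page_fac = urllib.request.urlopen(req_fac)
    soup_fac = BeautifulSoup(html_page_fac, "lxml")
    links_fac = soup_fac.find_all('a')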
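
For the link part, here is a minimal sketch of the direction this could take. It assumes the email addresses appear as `mailto:` links whose anchor text is the staff member's name. I have not confirmed that against the actual page, so the sample markup, the `extract_staff` and `to_csv` helpers, and the column names are all hypothetical. It also writes CSV (which Excel opens directly) rather than a real .xlsx file, and uses the built-in `html.parser` instead of `lxml` to keep dependencies down. Years and roles are left out because they depend on how the page lays out its plain text.

```python
import csv
import io

from bs4 import BeautifulSoup

# Hypothetical sample markup; the real page structure is an assumption.
SAMPLE_HTML = """
<table>
  <tr><td><a href="mailto:jdoe@example.org">Jane Doe</a></td><td>Grade 3</td></tr>
  <tr><td><a href="/Page/1">Home</a></td><td></td></tr>
  <tr><td><a href="mailto:rroe@example.org?subject=Hi">Richard Roe</a></td><td>Art</td></tr>
</table>
"""


def extract_staff(html):
    """Return (name, email) pairs for every mailto: anchor in the markup."""
    soup = BeautifulSoup(html, "html.parser")
    staff = []
    # href=True skips anchors that have no href attribute at all
    for a in soup.find_all("a", href=True):
        href = a["href"].strip()
        if not href.lower().startswith("mailto:"):
            continue  # ordinary navigation link, not an email
        # Drop the scheme and any ?subject=... query part
        email = href[len("mailto:"):].split("?", 1)[0]
        name = a.get_text(strip=True)
        staff.append((name, email))
    return staff


def to_csv(rows):
    """Serialize the rows as CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["name", "email"])
    writer.writerows(rows)
    return buf.getvalue()


rows = extract_staff(SAMPLE_HTML)
csv_text = to_csv(rows)
```

With the real page, `extract_staff` would be called on the fetched HTML instead of `SAMPLE_HTML`, and `csv_text` could be written to a `.csv` file for opening in Excel.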

Thanks so much in advance!



