Wednesday, July 29, 2020

General web scraper for multiple use cases

For a project, I'm trying to scrape staff data (name, position, email, and so on) from US public school websites, starting only from the school district (SD) website. That is, I land on an SD web page, find its subsidiary schools, and then scrape staff details from each school's page.
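The middle step (finding the subsidiary schools on a district page) can be sketched with only the Python standard library. This is a minimal sketch that assumes the district homepage, or its "Schools" page, lists schools as ordinary `<a>` links in the HTML the server returns. The keyword list and the `find_school_links` helper are hypothetical and will need tuning per district. Menus built by JavaScript will not appear in the raw HTML and would need a headless browser instead.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical keywords; real district sites label their school lists differently.
SCHOOL_KEYWORDS = ("elementary", "middle", "high school", "academy", "k-8")


class LinkCollector(HTMLParser):
    """Collects (href, anchor text) pairs for every <a href=...> in a page."""

    def __init__(self):
        super().__init__()
        self.links = []    # list of (href, text) tuples
        self._href = None  # href of the <a> currently open, if any
        self._text = []    # text fragments seen inside the current <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self._href = href
                self._text = []

    def handle_data(self, data):
        # Only keep text that sits inside an open <a> (including nested tags).
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            text = " ".join("".join(self._text).split())  # normalize whitespace
            self.links.append((self._href, text))
            self._href = None


def find_school_links(html, base_url, keywords=SCHOOL_KEYWORDS):
    """Return absolute URLs of links whose anchor text mentions a school keyword."""
    parser = LinkCollector()
    parser.feed(html)
    found = []
    for href, text in parser.links:
        if any(k in text.lower() for k in keywords):
            url = urljoin(base_url, href)  # resolve relative links like /schools/x
            if url not in found:           # keep first-seen order, drop duplicates
                found.append(url)
    return found
```

In practice `html` would come from something like `urllib.request.urlopen(url).read().decode("utf-8", "replace")`. Matching on anchor text rather than URL structure is deliberate: districts organize their URLs very differently, but school links almost always carry the school's name.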

Unfortunately, the navigation path to the staff page varies from school to school, but it would be nice to have one or two versions of a web scraper that work for the majority of schools. Here are some school districts that have been challenging to scrape:

  1. Beaverton School District - https://www.beaverton.k12.or.us/
  2. Camas School District - http://www.camas.wednet.edu/
  3. Shoreline School District - https://www.shorelineschools.org/
  4. Everett School District - https://www.everettsd.org/
  5. Hesperia School District - https://www.hesperiausd.org/
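Because the navigation to the staff page differs between districts like these, one general approach is to score every link on a school's homepage by how "staff-like" its text and URL look, follow the best candidates, and then pull email addresses from the page you land on. The following is a minimal sketch of that heuristic, not a tested solution for the sites listed above. The `STAFF_HINTS` weights, the `rank_staff_links` and `extract_emails` helpers, and the email regex are all hypothetical. Pages that load their directory through JavaScript or a third-party widget will not expose the data in the raw HTML.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical scoring: phrases that often label staff pages, with rough weights.
STAFF_HINTS = {"staff directory": 3, "directory": 2, "staff": 2, "faculty": 2, "contact": 1}

# Deliberately loose pattern; it can produce false positives.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


class AnchorParser(HTMLParser):
    """Collects (href, anchor text) pairs plus any mailto: addresses."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.mailto = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""
        if href.lower().startswith("mailto:"):
            # Drop the "mailto:" scheme and any ?subject=... query string.
            self.mailto.append(href[len("mailto:"):].split("?")[0])
        elif href:
            self._href, self._text = href, []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, " ".join("".join(self._text).split())))
            self._href = None


def rank_staff_links(html, base_url):
    """Return (score, url) pairs for links that look like a staff page, best first."""
    parser = AnchorParser()
    parser.feed(html)
    scored = {}
    for href, text in parser.links:
        haystack = (text + " " + href).lower()
        score = sum(w for hint, w in STAFF_HINTS.items() if hint in haystack)
        if score:
            url = urljoin(base_url, href)
            scored[url] = max(score, scored.get(url, 0))
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(((s, u) for u, s in scored.items()), key=lambda p: (-p[0], p[1]))


def extract_emails(html):
    """Collect unique email addresses from mailto: links and the raw page text."""
    parser = AnchorParser()
    parser.feed(html)
    unique = []
    for email in parser.mailto + EMAIL_RE.findall(html):
        email = email.lower()  # lowercased only for de-duplication
        if email not in unique:
            unique.append(email)
    return unique
```

The regex can match strings such as `logo@2x.png`, so results need filtering. Names and positions are laid out differently by each CMS, so a realistic design keeps this generic discovery layer and falls back to small template-specific extractors for the detail fields.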

Any guidance on how to approach this problem would be much appreciated! Thank you immensely.



