Thursday, May 10, 2018

Method/language for developing a web (ASPX) PDF data scraper?

I am an amateur, but I will try to make this question specific enough to be answerable:

I am trying to build something to scrape the calendars from this court website: http://www.imperial.courts.ca.gov/CourtCalendars/Public/MCalendars.aspx. The calendars are displayed as PDFs served through an ASPX page. Basically, I want a program to:

  1. Identify all calendars labeled X (e.g., "criminal" or "misdemeanor").
  2. Follow those links and download the PDF files.
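A minimal sketch of steps 1 and 2 using only Python's standard library, assuming each calendar appears on the page as an ordinary `<a href="...">` link whose visible text contains the keyword. That assumption is unverified, and the function names and keyword list are hypothetical. ASP.NET WebForms pages often render links as `javascript:__doPostBack(...)` calls instead; in that case the PDF is only returned after POSTing the page's hidden form fields (such as `__VIEWSTATE` and `__EVENTTARGET`), or by driving a real browser with a tool like Selenium, so the sketch simply skips those links.

```python
import urllib.request
from html.parser import HTMLParser
from urllib.parse import urljoin

PAGE_URL = "http://www.imperial.courts.ca.gov/CourtCalendars/Public/MCalendars.aspx"
# Hypothetical keyword list; matching is case-insensitive.
KEYWORDS = ("criminal", "misdemeanor")


class CalendarLinkParser(HTMLParser):
    """Collects (link text, href) pairs for every <a href> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []    # list of (text, href) tuples
        self._href = None  # href of the <a> currently open, if any
        self._text = []    # text fragments seen inside the current <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Collapse whitespace so "Criminal\n  Calendar" matches cleanly.
            text = " ".join("".join(self._text).split())
            self.links.append((text, self._href))
            self._href = None


def find_calendar_links(html, base_url, keywords=KEYWORDS):
    """Return absolute URLs of links whose text mentions any keyword."""
    parser = CalendarLinkParser()
    parser.feed(html)
    parser.close()
    matches = []
    for text, href in parser.links:
        if any(k in text.lower() for k in keywords):
            # Postback links are not real URLs; they need a form POST instead.
            if href.lower().startswith("javascript:"):
                continue
            matches.append(urljoin(base_url, href))
    return matches


def download_pdf(url, path):
    """Save the resource at url to a local file (requires network access)."""
    with urllib.request.urlopen(url) as resp, open(path, "wb") as out:
        out.write(resp.read())
```

Usage would be roughly: fetch the page with `urllib.request.urlopen(PAGE_URL).read().decode("utf-8", "replace")`, pass the result to `find_calendar_links` with `PAGE_URL` as `base_url`, then call `download_pdf` for each returned URL. If that returns nothing even though the calendars are visible in a browser, the links are probably postbacks.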

(After that, I will need to extract the PDF contents into a database or spreadsheet, then compare them against internal calendars to catch missed files, wrong dates, etc.)
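Once the PDF rows are extracted (which requires a PDF text-extraction library such as pypdf or pdfplumber, plus parsing that depends on the calendar's layout, not shown here), the comparison step can be sketched as below. It assumes both calendars reduce to a mapping from case number to hearing date, and that the internal calendar is a CSV with hypothetical `case_number` and `hearing_date` (YYYY-MM-DD) columns; the column names, function names, and sample case numbers are all illustrative.

```python
import csv
from datetime import date


def load_calendar_csv(lines):
    """Build {case_number: hearing_date} from an open CSV file or list of lines.

    Assumes hypothetical 'case_number' and 'hearing_date' (YYYY-MM-DD) columns.
    """
    return {
        row["case_number"].strip(): date.fromisoformat(row["hearing_date"].strip())
        for row in csv.DictReader(lines)
    }


def compare_calendars(scraped, internal):
    """Compare two {case_number: hearing_date} mappings.

    Returns (missed, wrong_dates):
      missed      - sorted case numbers on the court calendar but absent internally
      wrong_dates - {case_number: (court_date, internal_date)} where the dates differ
    """
    missed = sorted(set(scraped) - set(internal))
    wrong_dates = {
        case: (scraped[case], internal[case])
        for case in sorted(set(scraped) & set(internal))
        if scraped[case] != internal[case]
    }
    return missed, wrong_dates
```

Keying on case number keeps the check a simple set difference plus a date comparison; if one case can appear on several dates, the values would need to become sets of dates instead.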

FYI, I have worked through about six chapters of K&R, LPTHW (Learn Python the Hard Way), and the JS tutorials here: https://www.w3schools.com/js/default.asp.

For the web-scraping/calendar-pulling part, I am unsure whether Python/Django or JavaScript (or a related tool) would be better. I am also unsure where to find resources directly related to this project. Any advice or links are appreciated.

Thanks!
