I would like to visit UK-only web pages connected with independent testing, construction, contracting, energy conservation, etc. I wish to search these pages for 2 or 3 company names; in my case these are short and similar to, but not exactly, "Shell" or "BP". If I find one or the other or both, I want to store the result in a .csv file with the URL. I guess that the search strings will have to include surrounding spaces, " Shell " for instance.
Sample file, Urls.csv:
Shell only, BP only, URL
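To make the matching idea concrete, here is a rough Python sketch of what I have in mind. It assumes the page text has already been downloaded as a string, and it uses regular-expression word boundaries instead of padding the names with spaces, so that "Shell." or "(Shell)" would still count while "Shellfish" would not. The names `COMPANIES`, `find_companies` and `make_row` are my own placeholders, and the 1/0 flag columns are just one possible reading of the sample layout above.

```python
import re

# Placeholder names; the real company names differ but are similarly short.
COMPANIES = ["Shell", "BP"]

# \b marks a word boundary, so each name only matches as a whole word.
# Matching is case-sensitive on purpose, so "bp" in lower case is ignored.
PATTERNS = {
    name: re.compile(r"\b" + re.escape(name) + r"\b") for name in COMPANIES
}


def find_companies(text):
    """Return a dict mapping each company name to True or False for this text."""
    return {name: bool(pattern.search(text)) for name, pattern in PATTERNS.items()}


def make_row(url, text):
    """Build one CSV row (one 1/0 flag per company, then the URL).

    Returns None when no company name was found, so the page can be skipped.
    """
    hits = find_companies(text)
    if not any(hits.values()):
        return None
    return [int(hits[name]) for name in COMPANIES] + [url]
```

A row returned by `make_row` could then be passed to `csv.writer(...).writerow(...)` from the standard library.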
If something wrote to the file every 15 minutes, that would probably suffice. For the companies of interest I expect thousands or tens of thousands of hits, but not millions. I would prefer to target web pages from the most recent 18 months.
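The periodic write could be sketched roughly as follows, assuming rows are collected in memory and appended to the file once the interval has elapsed. The class name `BufferedCsvWriter` and its parameters are hypothetical, not from any library; the `clock` argument exists only so the timing can be tested without waiting.

```python
import csv
import time


class BufferedCsvWriter:
    """Hold rows in memory and append them to a CSV file at most once per interval."""

    def __init__(self, path, interval_seconds=15 * 60, clock=time.monotonic):
        self.path = path
        self.interval_seconds = interval_seconds
        self.clock = clock
        self.rows = []
        self.last_flush = clock()

    def add(self, row):
        """Queue one row; write everything out if the interval has passed."""
        self.rows.append(row)
        if self.clock() - self.last_flush >= self.interval_seconds:
            self.flush()

    def flush(self):
        """Append all queued rows to the file and restart the timer."""
        if self.rows:
            with open(self.path, "a", newline="", encoding="utf-8") as handle:
                csv.writer(handle).writerows(self.rows)
        self.rows = []
        self.last_flush = self.clock()
```

Calling `flush()` once more when the program exits would avoid losing the rows collected since the last write. With only tens of thousands of hits expected, keeping up to 15 minutes of rows in memory should not be a problem.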
I am not sure what this sort of exercise would be called: is it web scraping, URL capture, or something else?
I would like it to run in the background on a Windows PC (Chrome, IE), a Raspberry Pi, or possibly an Android phone. I can program, especially in R and to some extent in Python.
Please offer advice on what I need and ways to achieve this end.