Monday, 6 August 2018

Selective web site analysis and possible scraping

I would like to visit UK-only web pages connected with independent testing, construction, contracting, energy conservation, etc. I wish to search the pages for two or three company names; in my case these are short and similar to, but not exactly, "Shell" or "BP". If I find one or the other or both, I want to store the result in a .csv file along with the URL. I guess the names will have to be matched as whole words, e.g. " Shell " with spaces added.
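On the whole-word point: rather than padding names with spaces, regular-expression word boundaries (`\b`) are usually more robust, since they also match a name at the start or end of the text or next to punctuation. A minimal sketch in Python, with the two names taken from the question as placeholders for the real ones:

```python
import re

# Placeholder company names from the question; \b anchors the match
# to whole words, so "Shell" will not match inside "Shellfish".
patterns = {
    "Shell": re.compile(r"\bShell\b"),
    "BP": re.compile(r"\bBP\b"),
}

def name_flags(text):
    """Return a dict of 1/0 flags: did each name appear in the text?"""
    return {name: int(bool(p.search(text))) for name, p in patterns.items()}

sample = "Shell announced a new project; no other firms were named."
print(name_flags(sample))  # {'Shell': 1, 'BP': 0}
```

The same idea works in R with `grepl("\\bShell\\b", text)`.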

Sample file, Urls.csv:

Shell only, BP only, URL
1,0,https://www.shell.co.uk/

If the results were written to file every 15 minutes, that would probably suffice. For the companies of interest I expect thousands or tens of thousands of hits, but not millions. I would prefer to target web pages from the most recent 18 months.

I am not sure what this sort of exercise is called: is it web scraping, URL capture, or something else?

I would like it to run in the background on a Windows PC (Chrome, IE), a Raspberry Pi, or possibly an Android phone. I can program, especially in R and to some extent in Python.
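On a Raspberry Pi (or any Linux machine), the usual way to run such a script unattended in the background is cron. A hypothetical crontab entry, assuming the scraper lives at `/home/pi/scraper.py` (the path and log file are placeholders):

```shell
# Hypothetical crontab entry (add via `crontab -e`): run the scraper
# every 15 minutes and append its output to a log file.
*/15 * * * * /usr/bin/python3 /home/pi/scraper.py >> /home/pi/scraper.log 2>&1
```

On Windows, Task Scheduler fills the same role.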

Please offer advice on what I need and ways to achieve this end.
