Sunday, 28 March 2021

Web scraping 40+ websites in search of opportunities in Python

I have been asked to automate the search for opportunities (tenders) across 40+ websites for a company. The opportunities are usually displayed in a table. Each one has a title, a publication date, and a clickable link that leads to a detailed description of the opportunity. One example website is: http://www.eib.org/en/about/procurement/index.htm

The goal is to retrieve the new opportunities that are posted every day and that fit specific criteria. To do that, I need to look for specific keywords in the opportunities' titles. These keywords are the fields and regions in which the company has previous experience.

My question is: once I have extracted these tables, including the tenders' titles, into a DataFrame, how do I search for the right opportunities and sort them by relevance, given a list of keywords? Should I use NLP in this case and turn the words in the titles into a binary encoding (0s and 1s)? Or are there simpler methods I should be looking at?
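
To make the "simpler methods" option concrete, here is a minimal sketch of plain keyword matching, assuming relevance can be approximated by the number of distinct keywords found in a title. The function names `relevance_score` and `rank_tenders`, the sample titles, the example.com links, and the keyword list are all made up for illustration; they do not come from any of the real websites.

```python
import re


def relevance_score(title, keywords):
    """Count how many distinct keywords occur in the title.

    Matching is case-insensitive and on whole words or phrases,
    so "energy" does not match inside a longer word.
    """
    text = title.lower()
    return sum(
        1
        for kw in keywords
        if re.search(r"\b" + re.escape(kw.lower()) + r"\b", text)
    )


def rank_tenders(tenders, keywords):
    """Keep tenders with at least one keyword hit, most hits first."""
    scored = [dict(t, score=relevance_score(t["title"], keywords)) for t in tenders]
    matches = [t for t in scored if t["score"] > 0]
    return sorted(matches, key=lambda t: t["score"], reverse=True)


# Hypothetical sample data standing in for the scraped tables.
tenders = [
    {"title": "Consultancy for renewable energy projects in East Africa",
     "link": "https://example.com/1"},
    {"title": "Office furniture supply",
     "link": "https://example.com/2"},
    {"title": "Energy audit services",
     "link": "https://example.com/3"},
]
keywords = ["energy", "Africa", "renewable energy"]

ranked = rank_tenders(tenders, keywords)
for t in ranked:
    print(t["score"], t["title"])
```

If the titles are already in a DataFrame column, the same scoring function could probably be applied with something like `df["title"].apply(lambda t: relevance_score(t, keywords))` and the result sorted with `sort_values`. Counting hits treats every keyword as equally important; per-keyword weights would be a natural next step if some fields or regions matter more than others.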

Thanks in advance!



