My job is to extract information from all the festival websites in my country.
I need information such as the postal address, the city, and the main topic of the festival (cinema, music, dance, and so on).
But the websites don't all share the same HTML structure; I mean they don't use the same HTML tags.
So the data I am looking for is mainly in the text content of the page, and it is not always easy to find, because it is not clearly labeled on every website like "Address: 10 street of new york, New York". Sometimes there is no postal address on the website at all, and sometimes several cities are mentioned, so I could extract the wrong one.
I thought about using regex, or finding a way to send a general query to Google and get the data from other websites. But is there another "clean" or easy solution with Node.js?
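To make the regex idea concrete, here is a rough sketch of what I have in mind. It assumes the page has already been reduced to plain text, that an address contains a 5-digit postal code followed by a capitalized city name (this will differ from country to country), and that the topic keyword lists are only placeholders. The names `extractCityCandidates` and `guessTopic` are my own hypothetical helpers, not part of any library.

```javascript
// Assumption: a postal code is 5 digits followed by a capitalized city name.
// \p{Lu} / \p{L} (with the "u" flag) also match accented letters.
const POSTAL_RE = /\b(\d{5})\s+(\p{Lu}[\p{L}'-]*(?:[ -]\p{Lu}[\p{L}'-]*)*)/gu;

// Placeholder keyword lists; a real list would need tuning per language.
const TOPIC_KEYWORDS = {
  cinema: ["cinema", "film", "movie"],
  music: ["music", "concert", "band"],
  dance: ["dance", "ballet", "choreograph"],
};

// Returns every "postal code + city" pair found in the text,
// most frequent first, so the likeliest city can be picked when
// a page mentions several.
function extractCityCandidates(text) {
  const counts = new Map();
  for (const match of text.matchAll(POSTAL_RE)) {
    const key = `${match[1]} ${match[2]}`;
    const entry = counts.get(key) ||
      { postalCode: match[1], city: match[2], count: 0 };
    entry.count += 1;
    counts.set(key, entry);
  }
  return [...counts.values()].sort((a, b) => b.count - a.count);
}

// Returns the topic whose keywords occur most often, or null if none occur.
function guessTopic(text) {
  const lower = text.toLowerCase();
  let best = null;
  let bestScore = 0;
  for (const [topic, words] of Object.entries(TOPIC_KEYWORDS)) {
    let score = 0;
    for (const word of words) {
      score += lower.split(word).length - 1; // number of occurrences
    }
    if (score > bestScore) {
      best = topic;
      bestScore = score;
    }
  }
  return best;
}

// Example with made-up page text.
const sample =
  "Welcome to the film festival. Address: 10 Main Street, 75001 Paris. " +
  "Partner venue in 69001 Lyon. Tickets: 75001 Paris.";
console.log(extractCityCandidates(sample));
console.log(guessTopic(sample));
```

Counting how often each city appears is only a heuristic for the "several cities" problem, and a page with no postal code at all simply gives an empty list, so those sites would still need another source or a manual check.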
How much time do you think this will take?
Thanks a lot, guys!