Thursday, November 24, 2016

Get an article's publication date from its URL

I am using Java, and I want a reasonably reliable way to extract an article's publication date from its URL. I have tried:

  • Searching the document with Jsoup: not reliable, since there seems to be no predefined way (or fixed set of ways) in which the date is encoded in the HTML. Parsing the date string into a Date object adds further ambiguity, since its format is unknown.

  • Reading the Last-Modified header through an HTTP connection: unreliable, and often it does not work at all. The date returned is frequently just the time the connection was made.
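
One way to narrow the parsing ambiguity from the first bullet is to try a fixed, ordered list of candidate formats and keep the first one that succeeds. Below is a minimal sketch using java.time; the class name `DateGuesser` and the format list are illustrative assumptions, not exhaustive, and a genuinely ambiguous string such as 03/04/2016 will still resolve to whichever pattern comes first (US month/day order here). RFC 1123 is included because it is the format HTTP headers such as Last-Modified use.

```java
import java.time.LocalDate;
import java.time.OffsetDateTime;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.List;
import java.util.Locale;
import java.util.Optional;

public class DateGuesser {

    // Date-only patterns tried in order; this list is an illustrative assumption.
    private static final List<DateTimeFormatter> DATE_ONLY = List.of(
            DateTimeFormatter.ISO_LOCAL_DATE,                              // 2016-11-24
            DateTimeFormatter.ofPattern("MMMM d, uuuu", Locale.ENGLISH),  // November 24, 2016
            DateTimeFormatter.ofPattern("d MMMM uuuu", Locale.ENGLISH),   // 24 November 2016
            DateTimeFormatter.ofPattern("MM/dd/uuuu")                     // 11/24/2016 (US order assumed)
    );

    /** Returns the calendar date from the first format that parses the input, if any. */
    public static Optional<LocalDate> guess(String raw) {
        String s = raw.trim();
        // ISO 8601 timestamp with offset, as often found in metadata.
        try {
            return Optional.of(OffsetDateTime.parse(s).toLocalDate());
        } catch (DateTimeParseException ignored) { }
        // RFC 1123, e.g. the value of a Last-Modified header.
        try {
            return Optional.of(ZonedDateTime.parse(s, DateTimeFormatter.RFC_1123_DATE_TIME).toLocalDate());
        } catch (DateTimeParseException ignored) { }
        // Fall back to the date-only candidates.
        for (DateTimeFormatter f : DATE_ONLY) {
            try {
                return Optional.of(LocalDate.parse(s, f));
            } catch (DateTimeParseException ignored) { }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(guess("Thu, 24 Nov 2016 07:30:00 GMT")); // Optional[2016-11-24]
        System.out.println(guess("24 November 2016"));              // Optional[2016-11-24]
        System.out.println(guess("yesterday"));                     // Optional.empty
    }
}
```

Returning an Optional rather than a possibly wrong default keeps "no date found" distinguishable from a real result.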

Similar questions:

  • 5 years back

  • 2 years back

Has anything changed since then? Is there any way to accomplish this with satisfactory results?
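
There is still no single standard field, but many publishers expose the date in machine-readable metadata: an Open Graph style `<meta property="article:published_time">` tag, a schema.org `datePublished` value (as a `<meta itemprop>` attribute or in JSON-LD), or an HTML5 `<time datetime>` element. The sketch below checks those hints in a fixed order; the class name `PublishedDateHints` and the order of hints are assumptions. Plain regular expressions are used only to keep the example dependency-free; they are sensitive to attribute order and quoting, and with Jsoup the same lookup would be a CSS selector such as `meta[property=article:published_time]`.

```java
import java.util.List;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PublishedDateHints {

    // Hypothetical ordered list of places where a publication date is often found.
    // Each pattern captures the raw date string in group 1. These regexes assume the
    // attribute order shown; a real HTML parser (e.g. Jsoup) is more robust.
    private static final List<Pattern> HINTS = List.of(
            Pattern.compile("<meta[^>]+property=[\"']article:published_time[\"'][^>]+content=[\"']([^\"']+)[\"']",
                    Pattern.CASE_INSENSITIVE),
            Pattern.compile("<meta[^>]+itemprop=[\"']datePublished[\"'][^>]+content=[\"']([^\"']+)[\"']",
                    Pattern.CASE_INSENSITIVE),
            Pattern.compile("\"datePublished\"\\s*:\\s*\"([^\"]+)\""),   // schema.org JSON-LD
            Pattern.compile("<time[^>]+datetime=[\"']([^\"']+)[\"']", Pattern.CASE_INSENSITIVE)
    );

    /** Returns the raw date string from the first hint that matches, if any. */
    public static Optional<String> findRawDate(String html) {
        for (Pattern p : HINTS) {
            Matcher m = p.matcher(html);
            if (m.find()) {
                return Optional.of(m.group(1).trim());
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        String html = "<html><head>"
                + "<meta property=\"article:published_time\" content=\"2016-11-24T08:30:00+01:00\">"
                + "</head><body><time datetime=\"2016-11-25\">Nov 25</time></body></html>";
        // The meta tag is checked before <time>, so its value wins.
        System.out.println(findRawDate(html)); // Optional[2016-11-24T08:30:00+01:00]
    }
}
```

The returned string still has to be parsed into a date, so the format ambiguity described in the question remains; ordering the hints from most to least specific simply makes a wrong match less likely.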

