I am using Java, and I want a reasonably reliable way to extract an article's publication date from its URL. I have tried:
- Document searching with Jsoup: not reliable, since there is no standard way (or small set of ways) for the date to be encoded in the HTML. Parsing the date string into a Date object adds further ambiguity, since the format is unknown.
- Reading the Last-Modified header over an HTTP connection: also unreliable. The date returned is often just the time the connection was made.
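To make the Last-Modified problem concrete, here is a rough sketch of a sanity check, assuming the two header values have already been read (for example with `URLConnection.getHeaderField("Last-Modified")` and `getHeaderField("Date")`). The class name `LastModifiedCheck`, the method name, and the one-minute threshold are assumptions made for illustration. It only rejects values that look like the time of the request; a value that passes is still a modification time, not necessarily the publication date.

```java
import java.time.Duration;
import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.Optional;

// Hypothetical helper: discards a Last-Modified value that is (almost)
// identical to the response's Date header, which suggests the page was
// generated at request time.
class LastModifiedCheck {

    // Assumed threshold; anything closer than this to the response time is rejected.
    private static final Duration MIN_AGE = Duration.ofMinutes(1);

    static Optional<OffsetDateTime> usableLastModified(String lastModified, String date) {
        if (lastModified == null || date == null) {
            return Optional.empty();
        }
        try {
            // HTTP dates use the RFC 1123 layout, e.g. "Tue, 03 Jun 2008 11:05:30 GMT".
            OffsetDateTime modified =
                    OffsetDateTime.parse(lastModified.trim(), DateTimeFormatter.RFC_1123_DATE_TIME);
            OffsetDateTime served =
                    OffsetDateTime.parse(date.trim(), DateTimeFormatter.RFC_1123_DATE_TIME);
            Duration age = Duration.between(modified, served);
            return age.compareTo(MIN_AGE) >= 0 ? Optional.of(modified) : Optional.empty();
        } catch (DateTimeParseException e) {
            // Malformed header: treat it as unusable rather than guessing.
            return Optional.empty();
        }
    }
}
```

An empty result here means the header should be ignored and another source of the date tried.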
Similar questions:
Has anything changed? Is there any way to accomplish this with satisfactory results?
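For the unknown-format problem, the best I can think of is trying a fixed list of candidate formats in order and taking the first that parses. Below is a minimal sketch using only `java.time`. The class name `DateGuesser` and the particular list of formats are assumptions for illustration and far from exhaustive. It also does not resolve genuinely ambiguous strings such as 03/04/2015.

```java
import java.time.LocalDate;
import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.List;
import java.util.Locale;
import java.util.Optional;

// Hypothetical helper: tries a short list of known formats in order and
// returns the first successful parse.
class DateGuesser {

    // Formats that carry a time and a UTC offset.
    private static final List<DateTimeFormatter> DATE_TIME_FORMATS = List.of(
            DateTimeFormatter.ISO_OFFSET_DATE_TIME,  // 2015-03-02T10:15:30+01:00
            DateTimeFormatter.RFC_1123_DATE_TIME     // Tue, 3 Jun 2008 11:05:30 GMT
    );

    // Date-only formats (assumed English month names).
    private static final List<DateTimeFormatter> DATE_FORMATS = List.of(
            DateTimeFormatter.ISO_LOCAL_DATE,                             // 2015-03-02
            DateTimeFormatter.ofPattern("MMMM d, yyyy", Locale.ENGLISH),  // March 2, 2015
            DateTimeFormatter.ofPattern("d MMMM yyyy", Locale.ENGLISH)    // 2 March 2015
    );

    static Optional<LocalDate> guess(String raw) {
        if (raw == null) {
            return Optional.empty();
        }
        String text = raw.trim();
        for (DateTimeFormatter format : DATE_TIME_FORMATS) {
            try {
                return Optional.of(OffsetDateTime.parse(text, format).toLocalDate());
            } catch (DateTimeParseException ignored) {
                // Not this format; try the next one.
            }
        }
        for (DateTimeFormatter format : DATE_FORMATS) {
            try {
                return Optional.of(LocalDate.parse(text, format));
            } catch (DateTimeParseException ignored) {
                // Not this format; try the next one.
            }
        }
        return Optional.empty();
    }
}
```

The input string would come from whatever element the HTML search turns up, for example a meta tag's content attribute. Whether that element exists on a given page is the part that remains unreliable.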