I want the user to enter the homepage of any news site, and have my program return the link to the headline article. What I have right now is:
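import java.io.IOException;
import java.text.SimpleDateFormat;
import java.util.Date;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

Document document = getDocumentFromURL(inputURL); // connect to the input URL
if (document != null) { // getDocumentFromURL returns null when the connection fails
    // Build today's date path segment once instead of on every iteration
    String todaySegment = new SimpleDateFormat("/yyyy/MM/dd/").format(new Date());
    Elements allLinks = document.select("a");
    for (Element link : allLinks) {
        String potentialLinkToArticle = link.attr("abs:href");
        // Heuristic: the URL has the year plus a known page name, or today's date
        if ((potentialLinkToArticle.contains("2016") && potentialLinkToArticle.contains("story.html"))
                || (potentialLinkToArticle.contains("index.html") && potentialLinkToArticle.contains("2016"))
                || potentialLinkToArticle.contains(todaySegment)) {
            Document article = getDocumentFromURL(potentialLinkToArticle);
            if (article != null) { // skip candidates that could not be fetched
                System.out.println(article.title());
                break;
            }
        }
    }
}

public static Document getDocumentFromURL(String URL) {
    try {
        return Jsoup.connect(URL).get();
    } catch (IOException error) {
        cantFindURLLabel.setVisible(true); // show the "can't find URL" label
        return null;
    }
}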
This code only works on http://latimes.com and http://cnn.com; every other site gives me some random article link. I have been staring at links from a bunch of websites and can't find anything they all have in common.
BTW, I know my if statement is a shitty way of searching for it, but I couldn't think of anything else.
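For reference, here is the same URL check pulled out into a small helper that can be tested without fetching any pages. This is only a sketch of the heuristic described above, not a general solution: the class name `HeadlineLinkHeuristic` and the methods `looksLikeArticle` and `firstArticleLink` are made up for illustration, and the one deliberate change is that the year comes from the supplied date instead of being hardcoded as "2016".

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.List;
import java.util.Optional;

// Hypothetical helper: isolates the URL heuristic from the jsoup fetching code.
final class HeadlineLinkHeuristic {
    private static final DateTimeFormatter DAY_SEGMENT =
            DateTimeFormatter.ofPattern("/yyyy/MM/dd/");

    // True when the URL contains today's /yyyy/MM/dd/ segment, or the current
    // year together with "story.html" or "index.html" (same rules as the post).
    static boolean looksLikeArticle(String url, LocalDate today) {
        String year = String.valueOf(today.getYear());
        boolean datedPath = url.contains(today.format(DAY_SEGMENT));
        boolean yearAndPage = url.contains(year)
                && (url.contains("story.html") || url.contains("index.html"));
        return datedPath || yearAndPage;
    }

    // Returns the first link, in page order, that passes the heuristic.
    static Optional<String> firstArticleLink(List<String> urls, LocalDate today) {
        return urls.stream()
                .filter(url -> looksLikeArticle(url, today))
                .findFirst();
    }
}
```

The list passed to `firstArticleLink` would be the `abs:href` values collected from `document.select("a")`, in page order. Because it still returns the first match, it has the same weakness as the original loop: on sites whose article URLs do not follow these patterns, or where a non-headline link matches first, the result is effectively arbitrary.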