Thursday, April 21, 2016

How do I get the headline article link from ANY news website using jsoup?

I want the user to enter the homepage URL of any news site, and my program should return the link to the headline article. This is what I have so far:

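import java.io.IOException;
import java.text.DateFormat;
import java.text.SimpleDateFormat;
import java.util.Date;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

// Inside the method that receives inputURL:
Document document = getDocumentFromURL(inputURL); // connect to inputURL
if (document != null) { // getDocumentFromURL returns null on failure
    // Today's date as a URL path segment, e.g. /2016/04/21/
    DateFormat dayFormat = new SimpleDateFormat("/yyyy/MM/dd/");
    String today = dayFormat.format(new Date());

    Elements allLinks = document.select("a");
    for (Element link : allLinks) {
        String potentialLinkToArticle = link.attr("abs:href");

        // Guess whether the link is an article from its URL pattern
        if ((potentialLinkToArticle.contains("2016") && potentialLinkToArticle.contains("story.html"))
                || (potentialLinkToArticle.contains("2016") && potentialLinkToArticle.contains("index.html"))
                || potentialLinkToArticle.contains(today)) {

            Document article = getDocumentFromURL(potentialLinkToArticle);
            if (article != null) {
                System.out.println(article.title());
                break; // stop at the first link that loads
            }
        }
    }
}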


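// Fetches and parses the page at the given URL; returns null if the request fails.
public static Document getDocumentFromURL(String URL) {
    try {
        return Jsoup.connect(URL).get();
    } catch (IOException error) {
        cantFindURLLabel.setVisible(true); // show the "can't find URL" label in the UI
        return null;
    }
}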

This code only works on http://latimes.com and http://cnn.com; on every other site it returns a seemingly random article link. I have compared links from a bunch of news websites and can't find a pattern common to all of them.

BTW, I know my if statement is a shitty way of searching for it, but I couldn't think of anything else.
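For reference, the URL check from the if statement can be pulled out into a pure function, which makes it possible to try it against sample links without any network access. This is only a sketch of the same heuristic, not a fix for it: the class name `HeadlineLinkFilter`, the method `looksLikeArticle`, and the sample URLs are all made up for illustration, and the year is derived from the date passed in rather than hard-coded as "2016".

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

// Hypothetical helper: isolates the URL heuristic from the question.
class HeadlineLinkFilter {

    // Formats a date as a URL path segment such as /2016/04/21/
    private static final DateTimeFormatter DATE_PATH =
            DateTimeFormatter.ofPattern("/yyyy/MM/dd/");

    // Returns true if the URL matches one of the three patterns from the question:
    // year + "story.html", year + "index.html", or today's date as a path segment.
    static boolean looksLikeArticle(String url, LocalDate today) {
        String year = String.valueOf(today.getYear());
        String datePath = today.format(DATE_PATH);

        boolean yearAndStory = url.contains(year) && url.contains("story.html");
        boolean yearAndIndex = url.contains(year) && url.contains("index.html");
        boolean datedPath = url.contains(datePath);

        return yearAndStory || yearAndIndex || datedPath;
    }

    public static void main(String[] args) {
        LocalDate today = LocalDate.of(2016, 4, 21);
        // Made-up sample links, only to exercise the three branches.
        String[] samples = {
            "http://www.latimes.com/local/la-me-example-20160421-story.html",
            "http://www.cnn.com/2016/04/21/politics/example/index.html",
            "http://example.com/about"
        };
        for (String url : samples) {
            System.out.println(looksLikeArticle(url, today) + "  " + url);
        }
    }
}
```

Written this way, it is easy to see why the check is tied to the two sites it was modelled on: any site whose article URLs contain neither the year with one of those file names nor today's date as a path segment will never match.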



