Saturday, April 29, 2017

Computer crashes when I use recursion to check all HTML links and sublinks of a website

I'm tasked with iterating over all links and sublinks of a given web portal. In most cases, when the web pages are not too big or complex, I don't have any problems. The problem starts when I check the links of a really complex site such as tutorialspoint: my computer just crashes. I can't find any performance issue in the code I attached, so can someone experienced tell me which part of my code could be causing the crash?

The uniqueLinks collection is a HashSet, chosen for the best performance when calling contains.

private void recursiveLinkSearch(String webPage) {
        /** ignore pdf**/
        try {
            logger.info(webPage);
            uniqueLinks.add(webPage);
            Document doc = Jsoup.connect(webPage).get();
            doc.select("a").forEach(record->{
                String url=record.absUrl("href");
                if(!uniqueLinks.contains(url)) {
                    /** only recurse into links on the portal's own domain, never into links from other domains **/
                    if(url.contains(getWebPortalDomain())) {
                        recursiveLinkSearch(url);
                    }
                }
            });
        } catch (IOException e) {
            e.printStackTrace();
        }

    }
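
For comparison, here is a minimal sketch of the same traversal written with an explicit queue instead of recursion. It is not a confirmed diagnosis of the crash. It only addresses two things that can plausibly hurt on a large site: the recursion depth grows with the length of the link chain, and URLs that differ only in their "#fragment" are stored as separate entries. The class name IterativeLinkSearch, the helper stripFragment, the maxPages limit and the fetchLinks hook are all assumptions made for illustration and do not appear in the original code. The demo uses an in-memory map of example.com pages rather than a real site.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

final class IterativeLinkSearch {

    /** Drops the "#fragment" part so page#a and page#b count as one page. */
    static String stripFragment(String url) {
        int hash = url.indexOf('#');
        return hash >= 0 ? url.substring(0, hash) : url;
    }

    /**
     * Breadth-first traversal with an explicit queue (no recursion).
     * fetchLinks is a hypothetical hook: given a page URL, it returns
     * the absolute link targets found on that page.
     * maxPages is an assumed safety limit on how many URLs are collected.
     */
    static Set<String> crawl(String startUrl, String domain, int maxPages,
                             Function<String, List<String>> fetchLinks) {
        Set<String> visited = new LinkedHashSet<>();
        Deque<String> queue = new ArrayDeque<>();
        String start = stripFragment(startUrl);
        visited.add(start);
        queue.add(start);
        while (!queue.isEmpty()) {
            String page = queue.poll();
            for (String raw : fetchLinks.apply(page)) {
                String url = stripFragment(raw);
                // Same domain filter as the original code: skip foreign links.
                if (url.isEmpty() || !url.contains(domain)) continue;
                if (visited.contains(url)) continue;
                // Stop once the assumed page limit is reached.
                if (visited.size() >= maxPages) return visited;
                visited.add(url);
                queue.add(url);
            }
        }
        return visited;
    }

    public static void main(String[] args) {
        // In-memory stand-in for a website, so the sketch runs without network access.
        Map<String, List<String>> site = Map.of(
            "http://example.com/",
                List.of("http://example.com/a", "http://example.com/a#top", "http://other.org/x"),
            "http://example.com/a",
                List.of("http://example.com/", "http://example.com/b"),
            "http://example.com/b",
                List.of());
        Set<String> found = crawl("http://example.com/", "example.com", 100,
                page -> site.getOrDefault(page, List.of()));
        found.forEach(System.out::println);
    }
}
```

To use this against a real portal, the fetchLinks hook could wrap the Jsoup.connect(page).get() call from the method above, collect absUrl("href") for each selected anchor, and return an empty list when an IOException is thrown. Whether this actually prevents the crash depends on what the crash is, which the question does not say.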



