Thursday, 27 September 2018

How to parse a file containing HTML using Jsoup?

I have files containing HTML, and I am trying to parse each file and then tokenise the text of its body. I do this with:

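     // Pass a java.io.File, not the string "myFile", so the file contents are parsed
     Document docs = Jsoup.parse(new File("myFile"), "UTF-8", "");
     System.out.println(docs.body().text());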

The above code works, but the problem is that text in the file that sits outside the HTML tags, without any tag of its own, is also printed as part of the body. I need a way to stop this text outside the HTML tags from being read. Help, this is a time-sensitive question!
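One possible workaround, sketched here under assumptions rather than offered as a definitive fix: Jsoup's lenient parser tends to fold stray text around the markup into the body, so the raw file content could be trimmed to the span between the first `<html` and the last `</html>` before it is parsed. The class `HtmlSlice` and the method `sliceHtml` are hypothetical names, and the sketch assumes each file holds a single `<html>...</html>` document; if no such pair is found, the input is returned unchanged.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not part of Jsoup): trims text that lies outside
// the <html>...</html> span before the content is handed to a parser.
final class HtmlSlice {
    // Greedy match from the first "<html" to the last "</html>",
    // ignoring case and letting '.' span line breaks.
    private static final Pattern HTML_SPAN = Pattern.compile(
            "<html\\b.*</html\\s*>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    static String sliceHtml(String raw) {
        Matcher m = HTML_SPAN.matcher(raw);
        // Assumption: if no <html>...</html> pair exists, leave the input as-is.
        return m.find() ? m.group() : raw;
    }

    public static void main(String[] args) {
        String raw = "stray text <html><body><p>Hello</p></body></html> more stray text";
        System.out.println(sliceHtml(raw));
        // prints: <html><body><p>Hello</p></body></html>
    }
}
```

The trimmed string could then be passed to `Jsoup.parse(String html)` and read with `body().text()` as before. Note that this only drops text before and after the document; untagged text placed directly between `<body>` and `</body>` would still be returned.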



