I have files containing HTML. I am trying to parse each file and then tokenise the text of its body. I do this with:
Document docs = Jsoup.parse(new File("myFile"), "UTF-8", "");
System.out.println(docs.body().text());
The code above works, but text that sits outside the HTML tags, with no enclosing tag, is also printed as part of the body text. I need a way to stop this untagged text from being read. This is time-sensitive, so any help is appreciated!
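To make the problem concrete, here is a minimal sketch of one possible workaround, not a confirmed fix. It assumes that the unwanted text always ends up as a direct text child of the body once jsoup has parsed the file, and that no wanted text sits directly inside the body without an enclosing element; if that does not hold, this approach would drop wanted text as well. The class name TaggedTextOnly and the method name textInsideTags are hypothetical names used only for illustration.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.TextNode;

class TaggedTextOnly {

    // Hypothetical helper: returns the body text after dropping text nodes
    // that are direct children of <body>. jsoup moves text found outside
    // the <html> tags into <body>, so such text shows up there.
    static String textInsideTags(String html) {
        Document doc = Jsoup.parse(html);
        Element body = doc.body();
        // textNodes() returns a copy of the direct child text nodes,
        // so removing nodes while iterating over it is safe.
        for (TextNode node : body.textNodes()) {
            node.remove();
        }
        return body.text();
    }

    public static void main(String[] args) {
        // Assumed sample input; for a file, Jsoup.parse(new File(path), "UTF-8", "")
        // could be used to build the Document instead.
        String html = "stray before<html><body><p>Hello</p> <div>world</div></body></html>stray after";
        System.out.println(textInsideTags(html));
    }
}
```

Text nested inside elements such as p or div is kept, because only the body's own direct text children are removed.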