Monday, 10 August 2015

Parsing web pages for internal links recursively

I want to parse a webpage to find the links it contains, and then recursively parse each of those links to build a list or tree of all the links. Is this possible? If so, any advice on methods, tools, or approaches would be very helpful. Thank you.

[Short description of the problem: Suppose there is a website, say "www.xyz.com", which contains several links, such as "/about", "home", "contact us", etc. Clicking any of these links opens a new webpage, which in turn can contain various other links.]
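
One possible approach, sketched here in Python using only the standard library: extract the `href` of every `<a>` tag, resolve it against the current page's URL, keep only links on the same host, and recurse while remembering which URLs have already been visited. The names `LinkParser`, `fetch_html`, `internal_links`, and `crawl`, as well as the `max_pages` limit, are illustrative assumptions and not part of the question. The sketch also assumes pages decode as UTF-8 and ignores robots.txt, rate limiting, and links generated by JavaScript.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag, urlparse
from urllib.request import urlopen


class LinkParser(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def fetch_html(url):
    """Download a page and return its text (assumption: UTF-8 content)."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def internal_links(page_url, html):
    """Return absolute, de-duplicated URLs of links that stay on the same host."""
    parser = LinkParser()
    parser.feed(html)
    host = urlparse(page_url).netloc
    found = []
    for href in parser.hrefs:
        # Resolve relative links such as "/about" or "home", drop "#fragment".
        absolute, _fragment = urldefrag(urljoin(page_url, href))
        if urlparse(absolute).netloc == host and absolute not in found:
            found.append(absolute)
    return found


def crawl(url, fetch=fetch_html, visited=None, max_pages=50):
    """Recursively build a nested {url: subtree} dict of internal links."""
    if visited is None:
        visited = set()
    visited.add(url)
    tree = {}
    try:
        html = fetch(url)
    except (OSError, ValueError):
        return tree  # page could not be fetched: treat it as a leaf
    for link in internal_links(url, html):
        # Skip pages already seen (avoids cycles) and stop at the page limit.
        if link in visited or len(visited) >= max_pages:
            continue
        tree[link] = crawl(link, fetch, visited, max_pages)
    return tree
```

Calling `crawl("http://www.xyz.com/")` would return a nested dictionary in which each key is an internal URL and each value is the subtree of links first discovered from that page. The `fetch` parameter can be swapped for a function that reads from local files or a cache, which makes the crawler easy to try without network access. For messy real-world HTML, a third-party parser such as BeautifulSoup, or a crawling framework such as Scrapy, may be more robust than this sketch.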
