I'm looking for an open-source Java web crawler that can crawl a web page (and its child pages) looking for a specific HTML tag and CSS class (e.g. an h2 tag with class "test"). For each match, I want to obtain the text inside the h2 tag and the URL of the page it was found on.
Any ideas on existing tools that can be used for this purpose?
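To make the requirement concrete, here is a rough sketch of the behaviour I'm after, using only the Java standard library. It is not a recommendation of how to implement it. The names `HeadingCrawler`, `Match`, `fetcher` and `maxPages` are made up for illustration, and the regular expressions are a crude stand-in for a real HTML parser (they assume double-quoted attributes and will miss plenty of valid markup).

```java
import java.net.URI;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HeadingCrawler {

    // Simplification: matches <h2 ... class="..."> ... </h2> with a double-quoted class attribute.
    private static final Pattern H2 = Pattern.compile(
            "<h2\\b[^>]*\\bclass\\s*=\\s*\"([^\"]*)\"[^>]*>(.*?)</h2>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Simplification: matches <a ... href="..."> and ignores fragment-only links.
    private static final Pattern LINK = Pattern.compile(
            "<a\\b[^>]*\\bhref\\s*=\\s*\"([^\"#]+)\"",
            Pattern.CASE_INSENSITIVE);

    /** One result: the page URL and the text of a matching heading (hypothetical type). */
    public static final class Match {
        public final String url;
        public final String text;

        Match(String url, String text) {
            this.url = url;
            this.text = text;
        }
    }

    /**
     * Breadth-first crawl starting at startUrl. The fetcher maps a URL to its HTML
     * (or null if the page cannot be loaded), so the download strategy is left open.
     */
    public static List<Match> crawl(String startUrl, String cssClass,
                                    Function<String, String> fetcher, int maxPages) {
        List<Match> results = new ArrayList<>();
        Set<String> seen = new HashSet<>();
        Deque<String> queue = new ArrayDeque<>();
        queue.add(startUrl);
        seen.add(startUrl);
        int visited = 0;

        while (!queue.isEmpty() && visited < maxPages) {
            String url = queue.poll();
            String html = fetcher.apply(url);
            visited++;
            if (html == null) {
                continue;
            }

            // Collect the text of every h2 whose class list contains cssClass.
            Matcher h = H2.matcher(html);
            while (h.find()) {
                List<String> classes = Arrays.asList(h.group(1).trim().split("\\s+"));
                if (classes.contains(cssClass)) {
                    String text = h.group(2).replaceAll("<[^>]+>", "").trim();
                    results.add(new Match(url, text));
                }
            }

            // Queue child pages that have not been seen yet.
            Matcher a = LINK.matcher(html);
            while (a.find()) {
                try {
                    String child = URI.create(url).resolve(a.group(1)).toString();
                    if (seen.add(child)) {
                        queue.add(child);
                    }
                } catch (IllegalArgumentException e) {
                    // Skip hrefs that are not valid URIs.
                }
            }
        }
        return results;
    }
}
```

Calling `HeadingCrawler.crawl("http://example.com/", "test", fetcher, 100)` with a `fetcher` that downloads each page would give me the (URL, heading text) pairs I want. An existing tool would ideally also handle the parts this sketch ignores, such as staying on one domain, politeness delays, robots.txt and malformed HTML.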