Tuesday, May 26, 2015

Create an efficient web crawler (Java) with three parts — ServerCrawler, IndexServer, and ClientCrawler — as follows:

  1. The ServerCrawler will start with a list of URLs (seed list)

  2. When a ClientCrawler connects to the ServerCrawler, the ServerCrawler will send the 10 oldest URLs in the list to the client.

  3. The ClientCrawler will download the pages at these URLs and send them to the IndexServer.

  4. The ClientCrawler will find all URLs in the downloaded pages and send them to the ServerCrawler, which will add them to the list.

  5. The client will send "more" to receive more URLs to be crawled, or "close" to close the connection.

  6. Both the ServerCrawler and the IndexServer must be multithreaded.

  7. ServerCrawler-to-ClientCrawler communication must use a TCP connection.

  8. ClientCrawler-to-IndexServer communication must use a UDP connection.
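The URL list in steps 1, 2, and 4 is essentially a shared, thread-safe queue (a "frontier"). One minimal sketch of that list, plus the crude link extraction of step 4, might look like this — class names (`UrlFrontier`, `LinkExtractor`) and the duplicate-filtering behavior are illustrative assumptions, not part of the assignment text:

```java
import java.util.*;
import java.util.regex.*;

/** Thread-safe URL frontier: the shared list the ServerCrawler keeps (steps 1, 2, 4). */
class UrlFrontier {
    private final Set<String> seen = new HashSet<>();      // every URL ever enqueued
    private final Deque<String> queue = new ArrayDeque<>(); // oldest URLs at the head

    UrlFrontier(Collection<String> seeds) { addAll(seeds); } // step 1: seed list

    /** Add discovered URLs, skipping duplicates so pages are not re-crawled. */
    synchronized void addAll(Collection<String> urls) {
        for (String u : urls) if (seen.add(u)) queue.add(u);
    }

    /** Hand out the oldest n URLs (step 2 uses n = 10 per client request). */
    synchronized List<String> nextBatch(int n) {
        List<String> batch = new ArrayList<>();
        while (batch.size() < n && !queue.isEmpty()) batch.add(queue.poll());
        return batch;
    }
}

/** Very rough link extraction for step 4; a real crawler would use an HTML parser. */
class LinkExtractor {
    private static final Pattern HREF = Pattern.compile("href=\"(http[^\"]+)\"");

    static List<String> extract(String html) {
        List<String> out = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) out.add(m.group(1));
        return out;
    }
}
```

Making both methods `synchronized` is the simplest way to satisfy rule 6 on the server side: many client-handler threads can call `nextBatch` and `addAll` concurrently without corrupting the list.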
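Rules 2, 5, 6, and 7 together describe a small TCP request/reply protocol. A sketch of the server side, assuming a thread-per-client design and a made-up wire format (one command per line in; one space-separated batch of up to 10 URLs per line out):

```java
import java.io.*;
import java.net.*;
import java.util.*;

/** Sketch of the ServerCrawler's TCP side (rules 2, 5, 6, 7). */
class ServerCrawlerSketch {
    /** Accept loop: one handler thread per connected ClientCrawler (rule 6). */
    static void serve(ServerSocket server, Queue<String> frontier) throws IOException {
        while (!server.isClosed()) {
            Socket client = server.accept();
            new Thread(() -> handle(client, frontier)).start();
        }
    }

    /** Answer "more" with up to 10 of the oldest URLs; stop on "close" (rule 5). */
    static void handle(Socket client, Queue<String> frontier) {
        try (BufferedReader in = new BufferedReader(
                 new InputStreamReader(client.getInputStream()));
             PrintWriter out = new PrintWriter(client.getOutputStream(), true)) {
            String cmd;
            while ((cmd = in.readLine()) != null && !cmd.equals("close")) {
                if (cmd.equals("more")) {
                    StringBuilder batch = new StringBuilder();
                    synchronized (frontier) {      // shared across handler threads
                        for (int i = 0; i < 10 && !frontier.isEmpty(); i++)
                            batch.append(frontier.poll()).append(' ');
                    }
                    out.println(batch.toString().trim());
                }
            }
        } catch (IOException ignored) {
            // client disconnected; thread ends
        } finally {
            try { client.close(); } catch (IOException ignored) { }
        }
    }
}
```

The line-based format is only one possible choice; the assignment fixes the commands ("more", "close") and the batch size (10), not how URLs are framed on the wire.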
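For step 3 under rule 8, the ClientCrawler ships each downloaded page to the IndexServer as a UDP datagram. A minimal sketch of both ends — `PageSender` and `PageReceiver` are hypothetical names, and sending one page per datagram is an assumption that caps pages at the UDP payload limit (about 64 KB); a real solution would need to fragment larger pages:

```java
import java.net.*;
import java.nio.charset.StandardCharsets;

/** ClientCrawler side: send one downloaded page per UDP datagram (rule 8). */
class PageSender {
    static void send(String page, InetAddress host, int port) throws Exception {
        byte[] data = page.getBytes(StandardCharsets.UTF_8);
        try (DatagramSocket sock = new DatagramSocket()) {
            sock.send(new DatagramPacket(data, data.length, host, port));
        }
    }
}

/** IndexServer side: receive one page per datagram (each call would typically
 *  run in, or hand off to, a worker thread to satisfy rule 6). */
class PageReceiver {
    static String receiveOne(DatagramSocket sock) throws Exception {
        byte[] buf = new byte[65507];                       // max UDP payload
        DatagramPacket p = new DatagramPacket(buf, buf.length);
        sock.receive(p);                                    // blocks until a page arrives
        return new String(p.getData(), 0, p.getLength(), StandardCharsets.UTF_8);
    }
}
```

Unlike the TCP side, UDP gives no delivery guarantee, so a lost datagram simply means a page never reaches the index — acceptable for this exercise, but worth noting in a write-up.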



