- The ServerCrawler starts with a list of URLs (the seed list).
- When a ClientCrawler connects, the ServerCrawler sends it the 10 oldest URLs in the list.
- The ClientCrawler downloads the pages at those URLs and sends them to the IndexServer.
- The ClientCrawler extracts all URLs from the downloaded pages and sends them to the ServerCrawler, which adds them to the list.
- The client sends "more" to receive more URLs to be crawled, or "close" to close the connection.
- The ServerCrawler and the IndexServer must be multithreaded.
- The ServerCrawler-to-ClientCrawler link must use a TCP connection.
- The ClientCrawler-to-IndexServer link must use a UDP connection.
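The ServerCrawler side of the requirements above can be sketched as a small multithreaded TCP server. The port number, the wire framing (one URL per line, a blank line ending a batch, any other line treated as a discovered URL), and the class layout are all assumptions for illustration, not part of the original spec:

```java
import java.io.*;
import java.net.*;
import java.util.*;
import java.util.concurrent.*;

// Sketch of the ServerCrawler hand-out protocol. The port and the
// line-based framing are assumptions; only the "10 oldest URLs",
// "more"/"close", and one-thread-per-client rules come from the spec.
public class ServerCrawler {
    // Oldest-first URL frontier shared by all client-handler threads.
    static final BlockingQueue<String> frontier = new LinkedBlockingQueue<>();

    public static void main(String[] args) throws IOException {
        frontier.addAll(Arrays.asList(args.length > 0 ? args : seed()));
        try (ServerSocket server = new ServerSocket(9090)) {
            while (true) {
                Socket client = server.accept();
                new Thread(() -> handle(client)).start(); // one thread per ClientCrawler
            }
        }
    }

    static String[] seed() { return new String[] { "http://example.com/" }; }

    static void handle(Socket client) {
        try (BufferedReader in = new BufferedReader(new InputStreamReader(client.getInputStream()));
             PrintWriter out = new PrintWriter(client.getOutputStream(), true)) {
            sendBatch(out);                          // first batch on connect
            String line;
            while ((line = in.readLine()) != null) {
                if (line.equals("close")) break;     // client ends the session
                if (line.equals("more")) sendBatch(out);
                else frontier.offer(line);           // a discovered URL to enqueue
            }
        } catch (IOException ignored) {
        } finally {
            try { client.close(); } catch (IOException ignored) {}
        }
    }

    // Send up to the 10 oldest URLs, terminated by a blank line.
    static void sendBatch(PrintWriter out) {
        List<String> batch = new ArrayList<>();
        frontier.drainTo(batch, 10);
        batch.forEach(out::println);
        out.println();
    }
}
```

A `BlockingQueue` keeps the frontier safe to share across handler threads without explicit locking; `drainTo(batch, 10)` takes the oldest entries in FIFO order.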
Tuesday, 26 May 2015
Create an efficient web crawler in Java with three parts: ServerCrawler, IndexServer, and ClientCrawler, as described above.
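The ClientCrawler's per-page work described above (extract URLs to send back, forward the page over UDP) can be sketched as follows. The href regex and the "url newline body" datagram layout are assumptions for illustration; note also that a single UDP datagram tops out near 64 KB, so large pages would need chunking in a real client:

```java
import java.net.*;
import java.nio.charset.StandardCharsets;
import java.util.*;
import java.util.regex.*;

// Sketch of the ClientCrawler's per-page steps: extract links from a
// downloaded page and forward the page to the IndexServer over UDP.
// The regex and the datagram layout are assumptions, not the spec.
public class ClientCrawlerSteps {
    private static final Pattern HREF = Pattern.compile("href=[\"'](http[^\"']+)[\"']");

    // Find absolute http(s) links in an HTML page; these are sent
    // back to the ServerCrawler to grow the URL list.
    static List<String> extractLinks(String html) {
        List<String> links = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) links.add(m.group(1));
        return links;
    }

    // Forward one page to the IndexServer as a single datagram.
    // Pages larger than the ~64 KB UDP limit would need splitting.
    static void sendToIndexServer(String url, String page, InetAddress host, int port) throws Exception {
        byte[] data = (url + "\n" + page).getBytes(StandardCharsets.UTF_8);
        try (DatagramSocket socket = new DatagramSocket()) {
            socket.send(new DatagramPacket(data, data.length, host, port));
        }
    }
}
```

UDP fits the fire-and-forget page hand-off required here, but it gives no delivery guarantee, so the IndexServer should tolerate lost or reordered pages.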