Saturday, March 28, 2015

How can I prevent my web crawler from slowing down over time?

I wrote a web crawler in C#. It starts from one URL, extracts every link on that page, then visits each of those links in turn, and so on.


I add the URLs to a string array with a pre-defined size and also to a Dictionary, so I can check whether a URL has already been crawled (I use the Dictionary's ContainsKey() method because it's faster than a linear search through the array).
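A minimal sketch of this kind of bookkeeping, for illustration only. It swaps in a HashSet<string> for the Dictionary and a Queue<string> for the fixed-size array, which is a substitution and not the setup described above. The names CrawlFrontier, Crawl, getLinks and maxPages are hypothetical, and link extraction is passed in as a function so the sketch needs no network access.

```csharp
using System;
using System.Collections.Generic;

public static class CrawlFrontier
{
    // Hypothetical helper: returns URLs in the order they are visited.
    // getLinks stands in for "download the page and extract its links".
    public static List<string> Crawl(
        string startUrl,
        Func<string, IEnumerable<string>> getLinks,
        int maxPages)
    {
        var seen = new HashSet<string>(StringComparer.Ordinal); // URLs already queued or visited
        var pending = new Queue<string>();                       // URLs waiting to be visited
        var visited = new List<string>();

        seen.Add(startUrl);
        pending.Enqueue(startUrl);

        while (pending.Count > 0 && visited.Count < maxPages)
        {
            string url = pending.Dequeue();
            visited.Add(url);

            foreach (string link in getLinks(url))
            {
                // HashSet<T>.Add returns false if the item is already present,
                // so the membership check and the insert are a single call.
                if (seen.Add(link))
                {
                    pending.Enqueue(link);
                }
            }
        }

        return visited;
    }
}
```

With an in-memory link graph in place of real pages, each URL is visited once even when pages link back to each other.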


It is very fast when it starts, but over time it gets painfully slow. As far as I can tell, the reason is that the Dictionary's ContainsKey() method takes a long time once the Dictionary is very large (100K+ URLs, for example), so the crawler slows down more and more the longer it runs.
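One way to check whether the lookup itself is the bottleneck is to time it in isolation. The sketch below is only a rough measurement under stated assumptions: the URLs are synthetic (under example.com), and the names LookupTiming, BuildUrls and TimeLookups are hypothetical. The .NET documentation describes ContainsKey() as approaching an O(1) operation, so the time per lookup would be expected to stay roughly flat as the Dictionary grows.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

public static class LookupTiming
{
    private const string Prefix = "http://example.com/page/"; // synthetic URLs, assumption

    // Fills a Dictionary with `count` distinct synthetic URLs (count must be > 0).
    public static Dictionary<string, bool> BuildUrls(int count)
    {
        var urls = new Dictionary<string, bool>(count);
        for (int i = 0; i < count; i++)
        {
            urls[Prefix + i] = true;
        }
        return urls;
    }

    // Runs `lookups` ContainsKey() calls, about half of them for keys that are
    // not in the Dictionary, and returns the elapsed time. `hits` reports how
    // many lookups found their key.
    public static TimeSpan TimeLookups(Dictionary<string, bool> urls, int lookups, out int hits)
    {
        hits = 0;
        int range = urls.Count * 2; // the upper half of the range is never inserted
        var stopwatch = Stopwatch.StartNew();
        for (int i = 0; i < lookups; i++)
        {
            if (urls.ContainsKey(Prefix + (i % range)))
            {
                hits++;
            }
        }
        stopwatch.Stop();
        return stopwatch.Elapsed;
    }
}
```

Calling TimeLookups with Dictionaries of, say, 1,000 and 1,000,000 entries and the same number of lookups gives two timings to compare. If they are close, the slowdown is more likely to come from somewhere else in the crawl loop.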


What can I do about this? I have to check whether a URL has already been added, and a Dictionary lookup is the fastest way I know of, but even that seems to get slow once the Dictionary is large enough.




