I am new to web crawling. I borrowed the command below from this SO question: Downloadable HTML Test Corpus. It works perfectly on stackoverflow.com, but when I try it on yelp.com or http://ift.tt/gEh0M8, it returns only a few results.
wget -t 7 -w 5 --waitretry=14 --random-wait -l 2 -m -k -K -e robots=off http://ift.tt/gbk8l4 -o ./myLog.log
What should I change so that it returns more results while staying within each domain?
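For reference, here is my reading of the same command spelled out with GNU wget's long option names, which makes each setting easier to see. This is only a sketch: it assumes GNU wget, `START_URL` is a variable name introduced here for illustration, and the script prints the command instead of running it.

```shell
#!/bin/sh
# Long-option spelling of the short flags in the command above (GNU wget assumed).
# START_URL is a hypothetical variable name, not part of the original command.
START_URL="http://ift.tt/gbk8l4"

# Build the argument list once so it can be inspected before any crawl starts.
set -- \
  --tries=7 \
  --wait=5 \
  --waitretry=14 \
  --random-wait \
  --level=2 \
  --mirror \
  --convert-links \
  --backup-converted \
  --execute robots=off \
  --output-file=./myLog.log \
  "$START_URL"
# Note: the wget manual describes --mirror as equivalent to
# -r -N -l inf --no-remove-listing, so it may interact with --level=2.

# Print the command rather than running it; remove "echo" to start the crawl.
echo wget "$@"
```

Written this way, it is easier to see which options control retries and delays (`--tries`, `--wait`, `--waitretry`, `--random-wait`) and which control how far the crawl goes (`--level`, `--mirror`).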