Sunday, March 1, 2015

wget Script Issues

I am new to web crawling. I borrowed the command below from this SO question: Downloadable HTML Test Corpus. It works perfectly on stackoverflow.com. However, when I try it on yelp.com or http://ift.tt/gEh0M8, it downloads only a few pages.



wget -t 7 -w 5 --waitretry=14 --random-wait -l 2 -m -k -K -e robots=off http://ift.tt/gbk8l4 -o ./myLog.log


What should I change so that it retrieves more pages while staying within each domain?
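
For reference, here is a sketch of one variation that might be worth trying. It is not a confirmed fix. It assumes the crawl stalls because links point to other hosts under the same domain (for example a www. or city subdomain), which wget does not follow unless --span-hosts is given. It also spells out -r -l 2 in place of -m, because -m is shorthand for -r -N -l inf --no-remove-listing. The domain example.com, the function name crawl, the WGET override, and the user-agent string are placeholders that are not in the original command. If the site builds its links with JavaScript, none of this helps, since wget does not execute scripts.

```shell
#!/usr/bin/env bash
# Sketch only: example.com, crawl, and WGET are illustrative placeholders.
crawl() {
  local domain="$1"
  # Set WGET=echo to preview the command line without crawling anything.
  "${WGET:-wget}" \
    --tries=7 --wait=5 --waitretry=14 --random-wait \
    --recursive --level=2 \
    --convert-links --backup-converted \
    --span-hosts --domains="$domain" \
    --user-agent="Mozilla/5.0" \
    -e robots=off \
    -o ./myLog.log \
    "http://www.$domain/"
}
```

Running `WGET=echo crawl example.com` prints the assembled arguments; `crawl example.com` performs the actual crawl. The --domains list keeps the crawl inside the named domain and its subdomains even though --span-hosts is on.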




