web: wget Script Issues

Sunday, March 1, 2015

wget Script Issues

I am new to web crawling. I borrowed the command below from this SO question: Downloadable HTML Test Corpus. It works perfectly on stackoverflow.com. However, when I try it on yelp.com or http://ift.tt/gEh0M8, it returns only a few results.


wget -t 7 -w 5 --waitretry=14 --random-wait -l 2 -m -k -K -e robots=off http://ift.tt/gbk8l4 -o ./myLog.log

What should I change so that it returns more results while staying within each site's domain?
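For reference, here is a minimal sketch of the same command with each flag annotated, plus one possible adjustment. It rests on an unconfirmed assumption: that the missing pages live on sibling hosts (for example another subdomain), which wget does not follow unless host spanning is enabled. The wget manual documents -m as shorthand for -r -N -l inf --no-remove-listing, so the sketch spells out -r -N with an explicit depth instead. START_URL, ALLOWED_DOMAINS, DRY_RUN and CMD_LINE are hypothetical names introduced here, and example.com is a placeholder. By default the script only prints the command; set DRY_RUN=0 to actually run it.

```shell
#!/bin/sh
# Hypothetical placeholders: replace with the site you want to crawl.
START_URL="${START_URL:-http://example.com/}"
ALLOWED_DOMAINS="${ALLOWED_DOMAINS:-example.com}"

# Build the argument list one group at a time so each flag can be commented.
set --
set -- "$@" -t 7                   # retry each URL up to 7 times
set -- "$@" -w 5 --random-wait     # wait about 5 s between requests, randomized
set -- "$@" --waitretry=14         # back off up to 14 s between retries
set -- "$@" -r -N -l 2             # recurse with timestamping, at most 2 levels deep
set -- "$@" -k -K                  # convert links for local viewing, keep .orig backups
set -- "$@" -e robots=off          # ignore robots.txt, as in the original command
set -- "$@" -H -D "$ALLOWED_DOMAINS"  # assumption: span hosts, but only within these domains
set -- "$@" -o ./myLog.log         # write wget's log to a file
set -- "$@" "$START_URL"

CMD_LINE="wget $*"

if [ "${DRY_RUN:-1}" = "1" ]; then
  # Dry run: show the command without touching the network.
  echo "$CMD_LINE"
else
  wget "$@"
fi
```

If host spanning turns out not to be the cause, myLog.log is the place to look next, since it records which URLs wget fetched and how the server responded.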
