I am currently struggling to mirror a website using wget. After browsing the web, I came up with the following command to mirror a complete website:
wget --mirror --convert-links --adjust-extension --backup-converted --page-requisites -e robots=off http://www.example.com
As expected, after running the command there is a folder called www.example.com containing all downloaded files. However, some background images are missing. Digging through the files and logs, I found that wget seems to have a problem with quoted image URLs.
The website uses the following CSS to include a background image:
background-image: url("/path/to/image");
While collecting the page requisites, wget parses the URL and tries to download the file
http://www.example.com/"/path/to/image"
which obviously fails with a 404 error.
I have already searched the web for a solution, but could not find the right keywords, so as a last resort I am asking you for help.
Is there any way to tell wget to ignore quotes inside URLs?
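If no such option exists, one possible workaround is to post-process the mirror: pull the url(...) targets out of the downloaded CSS files, strip the quotes, and hand the resulting list back to wget. The following is only a rough sketch under stated assumptions. The function names extract_css_urls and to_absolute_urls and the file name style.css are made up for illustration, and only root-relative paths (those starting with a single /) are handled; paths relative to the CSS file would need extra resolution.

```shell
#!/bin/sh
# Hypothetical helper: print every url(...) target found in the given CSS
# files, with surrounding whitespace and single or double quotes stripped.
extract_css_urls() {
    grep -ohE 'url\([^)]*\)' "$@" |
        sed -E -e 's/^url\([[:space:]]*//' \
               -e 's/[[:space:]]*\)$//' \
               -e "s/^[\"']//" \
               -e "s/[\"']\$//"
}

# Hypothetical helper: keep only root-relative paths read from stdin and
# prefix them with the site root given as $1 (no trailing slash assumed).
to_absolute_urls() {
    grep '^/' | grep -v '^//' | sed "s|^|$1|" | sort -u
}

# Example (not run here; style.css is an assumed file name):
#   extract_css_urls www.example.com/style.css \
#       | to_absolute_urls http://www.example.com > missing.txt
#   wget -x -nc -i missing.txt
```

In the commented example, -i reads the URL list from a file, -x recreates the directory hierarchy so the files land inside the existing www.example.com folder, and -nc skips anything that was already downloaded.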
Thank you very much in advance!