Monday, 8 January 2018

Wget and quoted URLs

I am currently struggling to mirror a website using wget. Browsing the web, I came up with the following command to mirror a complete website:

wget --mirror --convert-links --adjust-extension --backup-converted --page-requisites -e robots=off http://www.example.com

As expected, after running the command there is a folder called www.example.com containing all downloaded files. However, some background images are missing. Digging through the files and logs, I found that wget seems to have a problem with quoted image URLs.

The website uses the following CSS to include a background image:

background-image: url("/path/to/image");

While collecting the page requisites, wget parses the URL and tries to download the file

http://www.example.com/"/path/to/image"

which obviously fails with a 404 error.
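To make clear what I would expect instead, here is a minimal sketch of the extraction with the quotes stripped. It is not how wget parses CSS; `extract_css_urls` is a hypothetical helper name of my own, the `base` value is the example origin from above, and it assumes a grep that supports `-o` and a sed that supports `-E` (GNU and BSD both do). It only handles root-relative paths like the one in my case.

```shell
#!/bin/sh
# Hypothetical helper (not part of wget): read CSS on stdin and print each
# url(...) target with surrounding whitespace and quotes removed.
extract_css_urls() {
  grep -oE 'url\([^)]*\)' |
    sed -E -e 's/^url\([[:space:]]*//' \
           -e 's/[[:space:]]*\)$//' \
           -e "s/^[\"']//" \
           -e "s/[\"']\$//"
}

# Assumed site origin, taken from the example above.
base="http://www.example.com"

# Resolve each root-relative path against the origin.
printf '%s\n' 'background-image: url("/path/to/image");' |
  extract_css_urls |
  while IFS= read -r path; do
    printf '%s%s\n' "$base" "$path"
  done
# prints: http://www.example.com/path/to/image
```

The printed URL is the one I would expect wget to request, rather than the variant with the quotes left inside the path.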

I already tried to find a solution on the web, but did not manage to find the right keywords to search for, so as a last resort I must ask you for help.

Is there any way to tell wget to ignore quotes inside URLs?

Thank you very much in advance!
