Tuesday, August 25, 2020

wget messes up the HTML

I am currently hosting a blog with Ghost CMS on my local machine, and I am generating a static site from it (served at localhost:2368) using wget. This works well, except that the "srcset" attribute gets mangled:

<img class="post-card-image" srcset="content/images/size/w300/2020/08/logo-1--1.svg 300w,
                   content/images/size/w600/2020/08/logo-1--1.svgg 600w,
                  content/images/size/w1000/2020/08/logo-1--1.svgvg 1000w,
                 content/images/size/w2000/2020/08/logo-1--1.svgsvg 2000w" sizes="(max-width: 1000px) 400px, 700px" loading="lazy" src="content/images/size/w600/2020/08/logo-1--1.svg" alt="Test">

Notice how the extensions of the 600w, 1000w and 2000w entries are mangled into svgg, svgvg and svgsvg. This prevents the images from loading, so I have to fix the extensions in the HTML manually.
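Until the cause is found, that manual fix can be scripted. A minimal sketch, assuming a POSIX shell with sed and find, and assuming only the three mangled forms seen above (.svgg, .svgvg, .svgsvg) occur; the function names fix_svg_srcset and fix_svg_srcset_dir are hypothetical. The pattern only touches an extension that is directly followed by a srcset width descriptor such as " 600w", so the src attribute and ordinary text are left alone.

```shell
#!/bin/sh
# Hypothetical helper: repair srcset candidates whose ".svg" extension was
# mangled into ".svgg", ".svgvg" or ".svgsvg" (the three forms observed above).
# Reads HTML on stdin, writes the repaired HTML to stdout.
fix_svg_srcset() {
  # Only rewrite an extension that is directly followed by whitespace and a
  # width descriptor such as "600w"; everything else passes through unchanged.
  sed -E 's/\.svg(g|vg|svg)([[:space:]]+[0-9]+w)/.svg\2/g'
}

# Hypothetical helper: apply fix_svg_srcset in place to every .html file
# below the given directory (e.g. the wget export directory).
# It writes to a temporary file and renames it, which sidesteps the
# differences between GNU and BSD "sed -i".
fix_svg_srcset_dir() {
  find "$1" -type f -name '*.html' | while IFS= read -r f; do
    fix_svg_srcset < "$f" > "$f.tmp" && mv "$f.tmp" "$f"
  done
}
```

For example, fix_svg_srcset_dir dist would run it over the export directory after wget finishes. Running it twice is harmless, because a repaired entry such as ".svg 600w" no longer matches the pattern. Other image formats would need their own pattern if they turn out to be affected the same way.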

Saving the page from the browser at localhost:2368 does not have this problem. This is the same element when the HTML is saved from the browser:

<img class="post-card-image" srcset="/content/images/size/w300/2020/08/logo-1--1.svg 300w,
                    /content/images/size/w600/2020/08/logo-1--1.svg 600w,
                    /content/images/size/w1000/2020/08/logo-1--1.svg 1000w,
                    /content/images/size/w2000/2020/08/logo-1--1.svg 2000w" sizes="(max-width: 1000px) 400px, 700px" loading="lazy" src="/content/images/size/w600/2020/08/logo-1--1.svg" alt="Test">

But this is not an option, since I would have to save every page manually instead of recursively.

The wget command I am using is:

from_url=localhost:2368
to_url=example.com
to_https=true
export_directory=dist

# Copy blog content
wget --recursive --page-requisites --no-host-directories \
     --remote-encoding=utf-8 --directory-prefix="${export_directory}" \
     --adjust-extension --restrict-file-names=windows --timeout=30 \
     --no-parent --convert-links "${from_url}/"

I am using wget 1.20.3, and I have already tried it without the --remote-encoding flag.
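When trying flag variations like this, a quick scan of the export directory shows whether any mangled entries remain. A minimal sketch, assuming GNU or BSD grep (both support -r, -l and -E); find_mangled_srcset is a hypothetical name, and the pattern again covers only the three .svg forms observed above.

```shell
#!/bin/sh
# Hypothetical helper: list every file below the given directory that still
# contains a srcset candidate with a mangled ".svg" extension.
# Exit status follows grep: 0 if at least one file matched, 1 if none did,
# 2 on an error such as a missing directory.
find_mangled_srcset() {
  grep -rlE '\.svg(g|vg|svg)[[:space:]]+[0-9]+w' "$1"
}
```

For example, after rerunning wget: if find_mangled_srcset dist; then echo "still mangled"; else echo "no mangled srcset found"; fi. Note that an error status (2) would also fall into the else branch, so a mistyped directory name reads as "clean".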