bug-wget

URL translation %20 ->  


From: Joachim Lindenberg
Subject: URL translation %20 ->  
Date: Sat, 2 Jul 2022 13:17:59 +0200

Hello,
I am running my web pages through the https://validator.w3.org/nu validation 
service and aiming for clean (green) results.
My web pages are generated internally by IIS and then replicated with wget to 
a public host that serves them via nginx. I noticed that the IIS version 
validates properly, whereas the nginx version fails validation as follows: 

Error: Bad value documents/BsiAuskunft/202203220858 Lindenberg-BSI Auskunft 
nach Artikel 15 DSGVO.eml for attribute href on element 
https://html.spec.whatwg.org/multipage/#the-a-element: Illegal character in 
path segment: space is not allowed.

In the mirrored page, the reference reads 
<a 
href="documents/BsiAuskunft/202203220858&#32;Lindenberg-BSI&#32;Auskunft&#32;nach&#32;Artikel&#32;15&#32;DSGVO.eml">.
I checked my sources, and there a reference is actually written with %20, for 
example 
<a 
href="https://webdav.lindenberg.one/WebSites/blog.lindenberg.one/documents/BsiAuskunft/202203231228%20BSI-Lindenberg%20Ihre%20Anfrage%20vom%2022.03.2022.eml">
The translation from %20 to &#32; is evidently done by wget. In case it 
matters, I am using 
"wget -N -r -np -nH -k -l 10 --ignore-case --reject "*.inc,*.master,*.config" 
--user=** --password=** --cut-dirs=1 https://***".
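
For illustration, here is a minimal Python sketch of the effect and of one 
possible post-processing workaround for the mirrored files. It is only a sketch 
under assumptions: it handles double-quoted href attributes only, it re-encodes 
nothing but spaces, and the function name fix_href_spaces is hypothetical (it 
is not part of wget).

```python
import html
import re
from urllib.parse import quote

# Assumption of this sketch: only double-quoted href values are handled.
HREF_RE = re.compile(r'(href\s*=\s*")([^"]*)(")', re.IGNORECASE)


def fix_href_spaces(html_text):
    """Replace literal spaces and &#32; references inside href values with %20."""
    def _fix(match):
        value = match.group(2).replace("&#32;", "%20").replace(" ", "%20")
        return match.group(1) + value + match.group(3)
    return HREF_RE.sub(_fix, html_text)


# An HTML parser decodes &#32; to a literal space, which is what the
# validator then sees in the path segment.
decoded = html.unescape("202203220858&#32;Lindenberg-BSI")

# Percent-encoding the decoded text restores the %20 form used in my sources.
reencoded = quote(decoded)

if __name__ == "__main__":
    sample = '<a href="documents/a&#32;b.eml">mail</a>'
    print(fix_href_spaces(sample))
```

Such a function could be applied to each mirrored .html file after wget has 
finished. Other attributes that wget rewrites (for example src) would need the 
same treatment and are not covered here.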

I found 
stackoverflow.com/questions/13300017/wget-download-relative-link-conversion-misses-whitespace-encoding-for-css-url
which describes the same problem, but it looks like it was never treated as a 
bug or feature request in wget.

Is this a known issue? Is a fix considered to be of the “might break sites” 
type?
Best Regards,
Joachim




