bug-wget
From: Barry Allard
Subject: [Bug-wget] -m --iri unnecessarily modifies double-escapes incorrectly, whereas -m --no-iri works
Date: Sun, 27 Sep 2015 14:29:24 -0700

# skips all double-encoded [ui]ris because it reinterprets them, outside 
url.c:reencode_escapes(), probably in iri.c.
wget --iri -mr http://www.liteirc.net/mirrors/siyobik.info/reference.html

# works
wget --no-iri -mr http://www.liteirc.net/mirrors/siyobik.info/reference.html

Correct [ui]ri: 
http://www.liteirc.net/mirrors/siyobik.info/instruction/XLAT%252FXLATB.html 
(200)
Incorrect [ui]ri: 
http://www.liteirc.net/mirrors/siyobik.info/instruction/XLAT%2FXLATB.html (404)
# pcnt_decode(pcnt_decode("%252F") -> "%2F") -> "/"
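
The double decode noted above can be reproduced outside wget. This is only an illustration using Python's standard urllib.parse.unquote, not wget's own decoding routine; the variable names are made up for the example.

```python
from urllib.parse import unquote

# Path segment exactly as it appears in the mirrored page (double-encoded).
double_encoded = "XLAT%252FXLATB.html"

# One decode turns "%25" into "%", leaving a single-encoded slash.
once = unquote(double_encoded)

# A second decode turns "%2F" into "/", which changes the path structure.
twice = unquote(once)

print(once)   # XLAT%2FXLATB.html
print(twice)  # XLAT/XLATB.html
```

Requesting the once-decoded form is what produces the 404, since the server then treats the remaining %2F as an encoded slash.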

Simple-but-incomplete hackaround: use --no-iri

To improve compatibility when mirroring international sites, the IRI code path 
could approximate the behavior of url.c/url_parse() by avoiding unnecessary 
modification of the [ui]ris extracted under --mirror, possibly around the time 
it adds them to or dequeues them from the queue.
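
A rough sketch of what "avoiding unnecessary modification" could look like, assuming the goal is to escape only unsafe characters and leave existing %XX sequences untouched. The helper name reencode_preserving_escapes and the safe-character set are hypothetical and chosen for this example; this is not wget's code.

```python
import re
from urllib.parse import quote

# Matches either an existing valid escape (%XX) or any single character.
_TOKEN = re.compile(r"%[0-9A-Fa-f]{2}|.", re.DOTALL)

# Characters left unescaped in this sketch (reserved and unreserved sets).
_SAFE = ":/?#[]@!$&'()*+,;=-._~"

def reencode_preserving_escapes(url: str) -> str:
    """Hypothetical helper: escape unsafe characters, keep existing %XX as-is."""
    out = []
    for tok in _TOKEN.findall(url):
        if len(tok) == 3:
            # Already a valid escape: pass it through without decoding.
            out.append(tok)
        else:
            # Single character: percent-encode it only if it is unsafe.
            out.append(quote(tok, safe=_SAFE))
    return "".join(out)

url = "http://www.liteirc.net/mirrors/siyobik.info/instruction/XLAT%252FXLATB.html"
result = reencode_preserving_escapes(url)
print(result == url)
```

Because valid escapes are never decoded, applying the helper repeatedly gives the same result, so a URI that passes through it when queued and again when dequeued would not change.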

Best,
Barry Allard
