Re: [Bug-wget] URL encoding issues (Was: GNU wget 1.17.1 released)
From: Eli Zaretskii
Subject: Re: [Bug-wget] URL encoding issues (Was: GNU wget 1.17.1 released)
Date: Mon, 14 Dec 2015 21:58:59 +0200
> From: Tim Rühsen <address@hidden>
> Date: Mon, 14 Dec 2015 20:22:41 +0100
>
> > 1. The functions that call 'iconv' (in iri.c) don't make a point of
> > flushing the last portion of the converted URL after 'iconv'
> > returns successfully having converted the input string in its
> > entirety. IME, you need then to call 'iconv' one last time with
> > either the 2nd or the 3rd argument set to NULL, otherwise
> > sometimes the last converted character doesn't get output. In my
> > case, some URLs converted from CP1255 to UTF-8 lost their last
> > character. It sounds like no one has actually used this
> > conversion in iri.c, except for trivially converting UTF-8 to
> > itself. Is that possible/reasonable?
>
> Possibly.
> Could you please give an example string? I would like to test it on
> GNU/Linux, BSD and Solaris to see whether the output is always the same.
This is what gave me trouble:
https://he.wikipedia.org/wiki/%F9._%F9%F4%F8%E4
This is the URL https://he.wikipedia.org/wiki/ש._שפרה that Andries was
using in his tests, but encoded in CP1255 (and then percent-encoded).
Try converting it into UTF-8, and you will get the last character
chopped off after 'iconv' returns. Or at least that's what happens
for me.
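A minimal C sketch of the flush described in point 1, assuming a POSIX `iconv` that accepts the charset name "CP1255" (glibc does; other implementations may spell it differently). The helpers `percent_decode`, `convert_and_flush` and `cp1255_url_to_utf8` are hypothetical and are not the functions in wget's iri.c. The essential part is the second `iconv` call with a NULL input buffer, which writes out anything the converter is still holding back. One plausible reason this matters for CP1255 in particular is that the encoding has combining Hebrew points, so a converter may buffer a base letter until it knows whether a combining mark follows; that is an assumption, not something verified in this thread.

```c
#include <assert.h>
#include <ctype.h>
#include <iconv.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Decode %XX escapes in SRC into DST as raw bytes; other characters are
   copied unchanged.  DST must have room for strlen (SRC) bytes.
   Returns the number of bytes written (DST is not NUL-terminated).  */
static size_t
percent_decode (const char *src, char *dst)
{
  size_t n = 0;

  while (*src)
    {
      if (src[0] == '%'
          && isxdigit ((unsigned char) src[1])
          && isxdigit ((unsigned char) src[2]))
        {
          char hex[3] = { src[1], src[2], '\0' };
          dst[n++] = (char) strtol (hex, NULL, 16);
          src += 3;
        }
      else
        dst[n++] = *src++;
    }
  return n;
}

/* Convert LEN bytes at IN from charset FROM to charset TO into OUT,
   which has room for OUT_SIZE bytes; the result is NUL-terminated.
   Returns 0 on success, -1 on any failure.  */
static int
convert_and_flush (const char *from, const char *to,
                   const char *in, size_t len, char *out, size_t out_size)
{
  assert (out_size > 0);

  iconv_t cd = iconv_open (to, from);
  if (cd == (iconv_t) -1)
    return -1;

  char *inp = (char *) in;        /* POSIX iconv takes a non-const char ** */
  size_t inleft = len;
  char *outp = out;
  size_t outleft = out_size - 1;  /* keep one byte for the terminating NUL */
  int rc = 0;

  if (iconv (cd, &inp, &inleft, &outp, &outleft) == (size_t) -1)
    rc = -1;
  /* All input consumed: call iconv once more with a NULL input buffer so
     that any character the converter is still holding back is written.  */
  else if (iconv (cd, NULL, NULL, &outp, &outleft) == (size_t) -1)
    rc = -1;

  *outp = '\0';
  iconv_close (cd);
  return rc;
}

/* Hypothetical helper: percent-decode a CP1255 URL and convert it to UTF-8.  */
static int
cp1255_url_to_utf8 (const char *url, char *out, size_t out_size)
{
  char raw[256];

  if (strlen (url) >= sizeof raw)
    return -1;
  size_t n = percent_decode (url, raw);
  return convert_and_flush ("CP1255", "UTF-8", raw, n, out, out_size);
}
```

Called on the path `%F9._%F9%F4%F8%E4`, this should produce ש._שפרה in UTF-8; the final letter ה is the one reported above as getting chopped off when the flushing call is omitted.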
> > 2. Wget assumes that the URL given on its command line is encoded in
> > the locale's encoding. This is a good assumption when the user
> > herself types the URL at the shell prompt, but not when the URL is
> > copy-pasted from a browser's address bar. In the latter case, the
> > URL tends to be in UTF-8 (sometimes hex-encoded). At least that's
> > what I get from Firefox. We don't seem to have in wget any
> > facilities to specify a separate (3rd) encoding for the URLs on
> > the command line, do we?
>
> I stumbled upon this a while ago when thinking about the design of wget2. And
> wget2 already has a working --input-encoding option for such cases.
> AFAIK, nobody has asked for such an option in recent years, so I assume
> this to be a somewhat 'expert' or 'fancy' option, or at least a low-priority one.
> It is an optional goodie.
IMO, it's a sorely missing feature, since copy/pasting URLs from a
browser is something people do very often. I do it all the time,
because wget is often much better at downloading large files than
a browser.
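One way the command-line case from point 2 could be handled, sketched in C. This is a hypothetical policy, not what wget or wget2 actually does: a charset the user requested explicitly (the kind of setting wget2's --input-encoding provides) wins; otherwise, percent-decoded bytes that form valid UTF-8 are assumed to be UTF-8, as URLs copied from a browser's address bar usually are; anything else falls back to the locale's charset. The names `is_valid_utf8` and `pick_url_charset` are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Return true if the LEN bytes at S are well-formed UTF-8 (no overlong
   forms, no surrogates, nothing above U+10FFFF).  */
static bool
is_valid_utf8 (const unsigned char *s, size_t len)
{
  assert (s != NULL || len == 0);

  size_t i = 0;
  while (i < len)
    {
      unsigned char c = s[i];
      size_t extra;
      unsigned long cp;

      if (c < 0x80)
        {
          i++;
          continue;
        }
      else if (c >= 0xC2 && c <= 0xDF)
        {
          extra = 1;
          cp = c & 0x1F;
        }
      else if (c >= 0xE0 && c <= 0xEF)
        {
          extra = 2;
          cp = c & 0x0F;
        }
      else if (c >= 0xF0 && c <= 0xF4)
        {
          extra = 3;
          cp = c & 0x07;
        }
      else
        return false;           /* invalid lead byte, e.g. CP1255 letters */

      if (len - i - 1 < extra)
        return false;           /* truncated sequence */
      for (size_t k = 1; k <= extra; k++)
        {
          if ((s[i + k] & 0xC0) != 0x80)
            return false;       /* not a continuation byte */
          cp = (cp << 6) | (s[i + k] & 0x3F);
        }
      if ((extra == 2 && cp < 0x800) || (extra == 3 && cp < 0x10000)
          || (cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
        return false;
      i += extra + 1;
    }
  return true;
}

/* Hypothetical policy for a URL given on the command line:
   1. a charset the user asked for explicitly wins;
   2. otherwise, bytes that are valid UTF-8 are taken as UTF-8;
   3. otherwise, fall back to the locale's charset.  */
static const char *
pick_url_charset (const char *explicit_charset,
                  const unsigned char *raw, size_t len,
                  const char *locale_charset)
{
  if (explicit_charset != NULL)
    return explicit_charset;
  if (is_valid_utf8 (raw, len))
    return "UTF-8";
  return locale_charset;
}
```

In a real program, `locale_charset` would typically come from `nl_langinfo (CODESET)` after `setlocale (LC_ALL, "")`. The UTF-8 test is only a heuristic: a short string in a legacy 8-bit encoding can occasionally also be valid UTF-8, which is one argument for an explicit option.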