Re: [Bug-wget] URL encoding issues (Was: GNU wget 1.17.1 released)
From: Tim Ruehsen
Subject: Re: [Bug-wget] URL encoding issues (Was: GNU wget 1.17.1 released)
Date: Thu, 17 Dec 2015 15:30:02 +0100
User-agent: KMail/4.14.10 (Linux/4.3.0-1-amd64; KDE/4.14.14; x86_64; ; )

Thanks, I pushed your changes to master.
Tim
On Tuesday 15 December 2015 18:52:01 Eli Zaretskii wrote:
> > From: Tim Ruehsen <address@hidden>
> > Cc: Eli Zaretskii <address@hidden>
> > Date: Tue, 15 Dec 2015 11:02:21 +0100
> >
> > I pushed a conversion fix to master.
>
> Thanks!
>
> > There is another bug in wget that comes out with
> > wget -d --local-encoding=cp1255 'http://he.wikipedia.org/wiki/%F9._%F9%F4%F8%E4'
> >
> > Wget double escapes/converts to UTF-8... Maybe you can address this when
> > you are working on the code !?
>
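To make the example URL concrete, here is a minimal sketch of the single conversion that is presumably intended: each percent-escaped cp1255 byte is re-expressed as the escaped UTF-8 bytes of the same character, exactly once. The function names are hypothetical and this is not wget's code. It only handles the cp1255 Hebrew letters (0xE0..0xFA, which map to U+05D0..U+05EA) and copies everything else unchanged.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Value of one hex digit; the caller has already checked isxdigit().  */
static int
hexval (int c)
{
  if (c >= '0' && c <= '9')
    return c - '0';
  return tolower (c) - 'a' + 10;
}

/* Append the escape %XX for BYTE to OUT at offset *POS.  */
static void
put_escaped (char *out, size_t *pos, unsigned byte)
{
  sprintf (out + *pos, "%%%02X", byte);
  *pos += 3;
}

/* Hypothetical helper: rewrite %XX escapes that hold cp1255 Hebrew
   letters as escapes of the equivalent two UTF-8 bytes.  OUT must
   have room for at least 2 * strlen (in) + 1 bytes.  */
static void
cp1255_escapes_to_utf8 (const char *in, char *out)
{
  size_t pos = 0;

  while (*in)
    {
      if (in[0] == '%'
          && isxdigit ((unsigned char) in[1])
          && isxdigit ((unsigned char) in[2]))
        {
          unsigned byte = hexval (in[1]) * 16 + hexval (in[2]);

          if (byte >= 0xE0 && byte <= 0xFA)
            {
              /* cp1255 0xE0..0xFA are the letters U+05D0..U+05EA.  */
              unsigned cp = 0x05D0 + (byte - 0xE0);

              /* Two-byte UTF-8 form: 110xxxxx 10xxxxxx.  */
              put_escaped (out, &pos, 0xC0 | (cp >> 6));
              put_escaped (out, &pos, 0x80 | (cp & 0x3F));
              in += 3;
              continue;
            }
        }
      out[pos++] = *in++;
    }
  out[pos] = '\0';
}
```

Applied to the URL above, the path component becomes %D7%A9._%D7%A9%D7%A4%D7%A8%D7%94. Running a conversion step a second time over already-converted bytes is the kind of double escaping being reported.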
> You mean, because http redirects to https? Yes, I've seen that
> already. The simple patch below fixes that. The problem seems to be
> that wget assumes the redirected URL to be encoded in the same
> encoding as the original one (which, as described earlier, starts with
> the local encoding), whereas it is much more reasonable to use the
> value provided by --remote-encoding.
>
> And if the 'if' in the patch looks strange to you, it's rightfully
> so. Look at this strange logic in set_uri_encoding:
>
> /* Set uri_encoding of struct iri i. If a remote encoding was specified,
>    use it unless force is true. */
> void
> set_uri_encoding (struct iri *i, const char *charset, bool force)
> {
>   DEBUGP (("URI encoding = %s\n", charset ? quote (charset) : "None"));
>   if (!force && opt.encoding_remote)
>     return;
>
> I understand the reason to prefer opt.encoding_remote when the 'force'
> flag is false -- the user-provided remote encoding should take
> preference. But why return without making sure the URI's encoding is
> in fact set to that?? I guess there's some assumption that
> iri->uri_encoding is already set to opt.encoding_remote, but this
> assumption is certainly false in this case. So I think this function
> should be changed to actually use opt.encoding_remote, if non-NULL,
> and otherwise use 'charset' even if 'force' is false. Then the patch
> below could be simplified to avoid the test. WDYT?
>
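The behaviour proposed above for set_uri_encoding could look roughly like the following sketch. The struct definitions, the `opt` global, the `dup_or_null` helper, and the name `set_uri_encoding_sketch` are simplified stand-ins invented for illustration, not wget's actual code. Letting `force` still select `charset` is one reading of the proposal, not something the message states.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for wget's types; only the fields used here.  */
struct iri
{
  char *uri_encoding;           /* encoding assumed for the URI, or NULL */
};

struct options
{
  const char *encoding_remote;  /* value of --remote-encoding, or NULL */
};

static struct options opt;

/* Duplicate S on the heap; a NULL input (or failed malloc) yields NULL.  */
static char *
dup_or_null (const char *s)
{
  size_t n;
  char *p;

  if (!s)
    return NULL;
  n = strlen (s) + 1;
  p = malloc (n);
  if (p)
    memcpy (p, s, n);
  return p;
}

/* Sketch: unless FORCE is set, a user-supplied remote encoding wins
   and is actually stored in I; in every other case CHARSET is stored.
   There is no early return that leaves I->uri_encoding untouched.  */
static void
set_uri_encoding_sketch (struct iri *i, const char *charset, bool force)
{
  const char *chosen = charset;
  char *copy;

  assert (i != NULL);

  if (!force && opt.encoding_remote)
    chosen = opt.encoding_remote;

  /* Copy first, in case CHOSEN aliases the value about to be freed.  */
  copy = dup_or_null (chosen);
  free (i->uri_encoding);
  i->uri_encoding = copy;
}
```

With a function shaped like this, the redirect path could call it unconditionally with `force` set to false, and the `if (opt.encoding_remote)` test in the patch would no longer be needed.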
> Here's the patch I promised. With it, wget survives redirection from
> http to https and successfully retrieves that page.
>
>
> diff --git a/src/retr.c b/src/retr.c
> index a6a9bd7..6af26a0 100644
> --- a/src/retr.c
> +++ b/src/retr.c
> @@ -872,9 +872,11 @@ retrieve_url (struct url * orig_parsed, const char *origurl, char **file,
> xfree (mynewloc);
> mynewloc = construced_newloc;
>
> - /* Reset UTF-8 encoding state, keep the URI encoding and reset
> + /* Reset UTF-8 encoding state, set the URI encoding and reset
> the content encoding. */
> iri->utf8_encode = opt.enable_iri;
> + if (opt.encoding_remote)
> + set_uri_encoding (iri, opt.encoding_remote, true);
> set_content_encoding (iri, NULL);
> xfree (iri->orig_url);