From: Tim Rühsen
Subject: Re: [Bug-wget] Redirect containing %2B behaves differently depending on locale
Date: Fri, 03 Apr 2015 14:16:30 +0200
User-agent: KMail/4.14.2 (Linux/3.16.0-4-amd64; KDE/4.14.2; x86_64; ; )

Hi Ander,
On Friday, 3 April 2015, 12:26:09, Ander Juaristi wrote:
> On 03/13/2015 11:48 PM, Adam Sampson wrote:
> > Hi,
> >
> > I've just found a case where wget 1.16.3 responds to a 302 redirect
> > differently depending on whether it's in an ASCII or UTF-8 locale.
> >
> > This works:
> >   LC_ALL=en_GB.UTF-8 wget https://bitbucket.org/pypy/pypy/downloads/pypy-2.5.0-src.tar.bz2
> >
> > This doesn't work:
> >   LC_ALL=C wget https://bitbucket.org/pypy/pypy/downloads/pypy-2.5.0-src.tar.bz2
> >
> > I've attached logs with -d showing what's actually going on. The
> > initial request gives a 302 response with a Location: that contains:
> > ....tar.bz2?Signature=up6%2BtTpSF...
> >
> > In the UTF-8 locale, wget correctly redirects to that location.
> >
> > In the ASCII locale, wget -d prints a "converted: '...' -> '...'" line
> > (from iri.c's do_conversion), then redirects to:
> > ....tar.bz2?Signature=up6+tTpSF...
> >
> > (If you try it yourself you'll get a slightly different URL, but at
> > least for me it usually contains %2B somewhere.)
> >
> > This appears to be because do_conversion calls url_unescape on the
> > input string it's given -- even though that input string is a _const_
> > char * in the code that calls it (main -> retrieve_url -> url_parse ->
> > remote_to_utf8 -> do_conversion). It's not immediately obvious to me
> > whether that's intentional or not; at the very least, it's a surprising
> > bit of behaviour.
>
> That call to url_unescape() is necessary because iconv() needs the multibyte
> characters in their raw, unescaped form. My first approach, by the way, was
> to remove that call, but that caused Test-iri-percent.px to fail, which makes
> the need for it pretty clear.
>
> The issue seems to be at the call to reencode_escapes(), just after
> remote_to_utf8() returns. The problem here is that %2B decodes to a literal
> "+". That character is identical to the reserved character "+", so
> reencode_escapes() treats it as a reserved character and leaves it as-is.
> The same happens with other characters, such as "=" (%3D).
>
> What I propose is to tag the characters that have been decoded, in
> url_unescape(), and then in reencode_escapes(), verify if they coincide
> with reserved characters as well.
>
> What do you guys think?
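
The round trip described in the quoted analysis can be modelled in a few
lines. This is a simplified Python sketch, not wget's C code: unescape() and
reencode() are hypothetical stand-ins for url_unescape() and
reencode_escapes(), the reserved and unreserved sets are assumptions, the
iconv() step that sits between the two calls is left out, and the file name
in the sample is made up (only the Signature fragment comes from the report).

```python
import re
import string

# Assumed character classes for illustration; wget's own tables may differ.
RESERVED = set(";/?:@&=+$,")
UNRESERVED = set(string.ascii_letters + string.digits + "-_.~")


def unescape(url):
    """Decode every %XX escape to a raw byte (ASCII input assumed)."""
    raw = url.encode("ascii")
    return re.sub(rb"%([0-9A-Fa-f]{2})",
                  lambda m: bytes([int(m.group(1), 16)]), raw)


def reencode(raw):
    """Escape only bytes that are neither reserved nor unreserved."""
    out = []
    for byte in raw:
        ch = chr(byte)
        if byte < 128 and (ch in RESERVED or ch in UNRESERVED):
            out.append(ch)            # reserved chars pass through as-is
        else:
            out.append("%%%02X" % byte)
    return "".join(out)


# Hypothetical sample; "example" is a placeholder file name.
location = "example.tar.bz2?Signature=up6%2BtTpSF"
round_tripped = reencode(unescape(location))
print(round_tripped)  # example.tar.bz2?Signature=up6+tTpSF
```

Once %2B has been decoded, the second pass cannot tell the resulting "+"
apart from a "+" that was literal in the original URL, so the escape is lost.
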
Without looking at the code right now and from what you describe above, your
proposal sounds like a good idea. This problem pops up again and again. If you
solve the issue, some people will love you :-)
Regards, Tim
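
One possible reading of the tagging proposal, again as a hedged Python sketch
rather than a patch for wget: unescape_tagged() and reencode_tagged() are
hypothetical names, the character sets are assumptions, and the sketch ignores
how the tags would survive the iconv() conversion that happens between the two
steps in wget.

```python
import string

# Assumed character classes for illustration; wget's own tables may differ.
RESERVED = set(";/?:@&=+$,")
UNRESERVED = set(string.ascii_letters + string.digits + "-_.~")
HEX = b"0123456789abcdefABCDEF"


def unescape_tagged(url):
    """Decode %XX escapes, returning (byte, was_escaped) pairs."""
    raw = url.encode("ascii")  # ASCII input assumed
    out = []
    i = 0
    while i < len(raw):
        pair = raw[i + 1:i + 3]
        if raw[i] == 0x25 and len(pair) == 2 and all(c in HEX for c in pair):
            out.append((int(pair, 16), True))   # byte came from an escape
            i += 3
        else:
            out.append((raw[i], False))         # byte was literal
            i += 1
    return out


def reencode_tagged(tagged):
    """Re-escape bytes, keeping escapes that hid a reserved character."""
    out = []
    for byte, was_escaped in tagged:
        ch = chr(byte)
        if byte < 128 and ch in UNRESERVED:
            out.append(ch)
        elif byte < 128 and ch in RESERVED and not was_escaped:
            out.append(ch)                      # literal delimiter, keep it
        else:
            out.append("%%%02X" % byte)         # restore or add the escape
    return "".join(out)


# Hypothetical sample; "example" is a placeholder file name.
location = "example.tar.bz2?Signature=up6%2BtTpSF"
restored = reencode_tagged(unescape_tagged(location))
print(restored)  # example.tar.bz2?Signature=up6%2BtTpSF
```

With the tag in place, the escaped "+" is written back as %2B while the
literal "?" and "=" delimiters stay untouched.
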