Re: [Gnu-arch-users] URL Encoding

From: Jan Hudec
Subject: Re: [Gnu-arch-users] URL Encoding
Date: Sun, 21 Mar 2004 12:56:34 +0100
On Sat, Mar 20, 2004 at 17:14:36 -0500, Yannick Gingras wrote:
> On March 20, 2004 04:30 pm, Jan Hudec wrote:
> > Unfortunately it's a problem, because you can't tell, algorithmically,
> > whether a certain URL is encoded (there are cases where you know it isn't
> > and cases where you don't know). So one must be chosen. Tla chooses the
> > unencoded form. It should be documented, but tla shouldn't guess what it
> > just got.
> I'm never really good at parsing RFCs but I read in 2396 that
>   "Because the percent "%" character always has the reserved purpose of
>    being the escape indicator, it must be escaped as "%25" in order to
>    be used as data within a URI."
> Isn't it possible to guess by looking at what follows the "%" ?

No, it's not. The problem is that foo%25bar might be the encoded form of
foo%bar, but it might also be an unencoded form that needs to be encoded to
foo%2525bar.
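
To make the ambiguity concrete, here is a minimal sketch using Python's
standard urllib.parse module (not anything tla itself does). The variable
names are only illustrative. The same input string yields two different,
equally valid results depending on which reading is assumed:

```python
from urllib.parse import quote, unquote

s = "foo%25bar"

# Reading 1: s is already encoded, so decoding recovers the real name.
as_decoded = unquote(s)

# Reading 2: s is a raw name, so encoding must escape its percent sign.
as_encoded = quote(s, safe="")

print(as_decoded)  # foo%bar
print(as_encoded)  # foo%2525bar
```

Nothing in the string itself says which of the two readings was intended,
so the caller has to state it.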

> I may be totally wrong (which is usually what happens when I read
> RFCs) but it seems to be possible.
> If I'm totally wrong, just a "--encoded" option for "tla register-archive"
> would have given me an instant hint about the encoded representation.

Such an option might be reasonable.

Of course, I have never seen a URL containing a literal %, so a heuristic
that treats a URL as encoded if it contains % and as unencoded otherwise
would probably work for most people. But Tom does not like heuristics like
this, and he is right: it would be a particularly ugly interface (though
all web browsers do exactly this!).
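
For illustration only, the "contains %" heuristic could be sketched like
this in Python. The function names looks_encoded and to_encoded are
hypothetical, the example URLs are made up, and this is not how tla
behaves:

```python
from urllib.parse import quote

def looks_encoded(url):
    """Heuristic: assume the URL is already encoded if it contains '%'."""
    return "%" in url

def to_encoded(url):
    """Return an encoded form, guessing whether encoding is still needed."""
    if looks_encoded(url):
        return url  # trusted as-is, rightly or wrongly
    # Keep the usual URL delimiters and escape everything else unsafe.
    return quote(url, safe=":/@")

print(to_encoded("sftp://host/my archive"))    # sftp://host/my%20archive
print(to_encoded("sftp://host/my%20archive"))  # sftp://host/my%20archive
print(to_encoded("sftp://host/100%"))          # sftp://host/100%
```

The last call shows where the guess goes wrong: a raw name that happens to
contain % is passed through without being escaped.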

                                                 Jan 'Bulb' Hudec 

