bug#6252: Emacs does not implement URL (aka "percent") decoding correctly.
From: YAMAMOTO Mitsuharu
Subject: bug#6252: Emacs does not implement URL (aka "percent") decoding correctly.
Date: Mon, 24 May 2010 12:33:46 +0900
User-agent: Wanderlust/2.14.0 (Africa) SEMI/1.14.6 (Maruoka) FLIM/1.14.8 (Shijō) APEL/10.6 Emacs/22.3 (sparc-sun-solaris2.8) MULE/5.0 (SAKAKI)
>>>>> On Sun, 23 May 2010 01:46:54 +0200, José A. Romero L.
>>>>> <escherdragon@gmail.com> said:
> Seems that RFC 3986 has not been implemented correctly in
> Emacs. IMHO that is an important hole you have found there. The
> standard requires that all unreserved characters be encoded/decoded
> as UTF8 bytes.
If you are referring to the following part of RFC 3986, note that it
says nothing about existing URI schemes (as opposed to "a new URI
scheme"), about schemes defining a component that does NOT represent
textual data, or, even for textual data, about data NOT consisting of
characters from the Universal Character Set.
   When a new URI scheme defines a component that represents textual
   data consisting of characters from the Universal Character Set
   [UCS], the data should first be encoded as octets according to the
   UTF-8 character encoding [STD63]; then only those octets that do not
   correspond to characters in the unreserved set should be
   percent-encoded.
(See also http://lists.gnu.org/archive/html/emacs-devel/2006-08/msg00065.html)
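The procedure in the quoted paragraph can be sketched as follows. This
is only an illustration in Python, not Emacs code; the function name
percent_encode_text is hypothetical, and the unreserved set is taken
from RFC 3986 section 2.3 (ALPHA / DIGIT / "-" / "." / "_" / "~").

```python
import string

# Unreserved characters per RFC 3986 section 2.3, stored as byte values.
UNRESERVED = frozenset(
    (string.ascii_letters + string.digits + "-._~").encode("ascii")
)


def percent_encode_text(text: str) -> str:
    """Hypothetical helper: encode text as UTF-8 octets first, then
    percent-encode every octet that is not an unreserved character."""
    octets = text.encode("utf-8")
    return "".join(
        chr(b) if b in UNRESERVED else f"%{b:02X}" for b in octets
    )


print(percent_encode_text("José"))
```

Note that this only models what the RFC recommends for a new scheme
carrying UCS text; it makes no claim about how octets in other
components should be interpreted.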
Though returning a multibyte string decoded as UTF-8 would be useful
in many cases, I think an "unhex"ing function should also provide a
way to return a unibyte string.
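To illustrate why both return types matter, here is a minimal sketch
using Python's urllib.parse as a stand-in (it is not the Emacs API, and
the sample strings are made up for the example). The raw-octet result
plays the role of a unibyte string, and the UTF-8 decoded result plays
the role of a multibyte string.

```python
from urllib.parse import unquote, unquote_to_bytes

# Made-up sample inputs: the same name escaped from UTF-8 and from Latin-1.
utf8_escaped = "Jos%C3%A9"
latin1_escaped = "Jos%E9"

# Raw octets: nothing is assumed about the character encoding.
raw = unquote_to_bytes(latin1_escaped)

# Decoded text: the octets are assumed to be UTF-8.
text = unquote(utf8_escaped)

# Assuming UTF-8 for octets that are not UTF-8 loses information;
# by default the invalid byte becomes U+FFFD.
lossy = unquote(latin1_escaped)

# With the raw octets, the caller can still apply the right encoding.
recovered = raw.decode("latin-1")
```

Only the octet-returning variant lets the caller handle data from a
scheme that does not use UTF-8, or that is not text at all.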
YAMAMOTO Mitsuharu
mituharu@math.s.chiba-u.ac.jp