bug-gnu-emacs
bug#6252: Emacs does not implement URL (aka "percent") decoding correctly.


From: YAMAMOTO Mitsuharu
Subject: bug#6252: Emacs does not implement URL (aka "percent") decoding correctly.
Date: Mon, 24 May 2010 12:33:46 +0900
User-agent: Wanderlust/2.14.0 (Africa) SEMI/1.14.6 (Maruoka) FLIM/1.14.8 (Shijō) APEL/10.6 Emacs/22.3 (sparc-sun-solaris2.8) MULE/5.0 (SAKAKI)

>>>>> On Sun, 23 May 2010 01:46:54 +0200, José A. Romero L. 
>>>>> <escherdragon@gmail.com> said:

> Seems that RFC 3986 has not been implemented correctly in
> Emacs. IMHO that is an important hole you have found there. The
> standard requires that all unreserved characters be encoded/decoded
> as UTF8 bytes.

If you are referring to the following part of RFC 3986, note that it
applies only to "a new URI scheme". It says nothing about existing URI
schemes, about components that do NOT represent textual data, or about
textual data that does NOT consist of characters from the Universal
Character Set.

  When a new URI scheme defines a component that represents textual
  data consisting of characters from the Universal Character Set
  [UCS], the data should first be encoded as octets according to the
  UTF-8 character encoding [STD63]; then only those octets that do not
  correspond to characters in the unreserved set should be percent-
  encoded.

(See also http://lists.gnu.org/archive/html/emacs-devel/2006-08/msg00065.html)
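
To make the quoted procedure concrete, here is a rough sketch in Python
(used only for illustration; it is not Emacs code). The function name
percent_encode_utf8 is hypothetical, and the sketch assumes the
unreserved set is ALPHA / DIGIT / "-" / "." / "_" / "~" as defined in
RFC 3986.

```python
import string

# Unreserved characters per RFC 3986: ALPHA / DIGIT / "-" / "." / "_" / "~"
UNRESERVED = (string.ascii_letters + string.digits + "-._~").encode("ascii")

def percent_encode_utf8(text):
    """Hypothetical helper: UTF-8 encode first, then percent-encode
    every octet that is not an unreserved character."""
    octets = text.encode("utf-8")
    return "".join(
        chr(b) if b in UNRESERVED else "%{:02X}".format(b)
        for b in octets
    )

print(percent_encode_utf8("caf\u00e9"))  # the e-acute becomes two octets
```

Note that this only covers the "new scheme, textual data, UCS
characters" case that the RFC text addresses; it makes no claim about
other components.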

Though returning a multibyte string decoded as UTF-8 would be useful
in many cases, I think some "unhex"ing function should also be able to
return a unibyte string.
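
As an analogy only (Python, not Emacs Lisp), the distinction could look
like the sketch below. The names unhex_bytes and unhex_utf8 are
hypothetical; unquote_to_bytes is the standard urllib.parse function.
The raw-octet variant plays the role of the unibyte result, and the
UTF-8 variant plays the role of the multibyte one.

```python
from urllib.parse import unquote_to_bytes

def unhex_bytes(s):
    """Hypothetical helper: return the raw octets, assuming no charset."""
    return unquote_to_bytes(s)

def unhex_utf8(s):
    """Hypothetical helper: decode the octets as UTF-8.
    Raises UnicodeDecodeError if they are not valid UTF-8."""
    return unquote_to_bytes(s).decode("utf-8")

# "%E9" is e-acute in Latin-1 but is not valid UTF-8 on its own.
print(unhex_bytes("%E9"))
print(unhex_utf8("%C3%A9"))
```

With only the UTF-8 variant available, input such as "%E9" from a
scheme or component that does not use UTF-8 could not be recovered
faithfully, which is why the raw-octet variant seems worth having.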

                                     YAMAMOTO Mitsuharu
                                mituharu@math.s.chiba-u.ac.jp
