From: José A. Romero L.
Subject: bug#6252: Emacs does not implement URL (aka "percent") decoding correctly.
Date: Sun, 23 May 2010 01:46:54 +0200
On May 18, 20:14, Xah Lee <xah...@gmail.com> wrote:
> is there an emacs lisp function that decodes url percent encoding?
> e.g. http://en.wikipedia.org/wiki/Sylvester%E2%80%93Gallai_theorem
> should become
> http://en.wikipedia.org/wiki/Sylvester–Gallai_theorem
> that's an EN DASH (unicode 8211, #o20023, #x2013).
> I know there's a
> (require 'gnus-util)
> gnus-url-unhex-string
> but that just unhexes, and generates gibberish if the url contains unicode
> chars.
(...)
It seems that RFC 3986 has not been implemented correctly in Emacs. IMHO
that is an important hole you have found there. The standard (section
2.5) recommends that text be converted to UTF-8 bytes first, and that
the bytes outside the unreserved set then be percent-encoded, so decoding
has to undo both steps. Although the encoding part looks OK (in
url-util.el), the decoding does not go that last mile of interpreting
the decoded bytes as UTF-8.
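To make that last mile concrete: %E2%80%93 unhexes to the three bytes #xE2 #x80 #x93, and only when they are read together as one UTF-8 sequence (bit pattern 1110xxxx 10yyyyyy 10zzzzzz) do they yield EN DASH, code point 8211. Treating each byte as a character on its own, roughly what a byte-by-byte unhex does, gives three separate characters instead of one, hence the gibberish. The function below is only an illustration: my-utf8-3byte-to-char is a made-up name (not an Emacs function) and it handles just the three-byte case; real code should simply use decode-coding-string.

```elisp
(defun my-utf8-3byte-to-char (b1 b2 b3)
  "Combine the UTF-8 bytes B1 B2 B3 of a three-byte sequence into a code point.
Hypothetical helper for illustration only; it does no validation."
  (logior (ash (logand b1 #x0F) 12)   ; low 4 bits of the lead byte
          (ash (logand b2 #x3F) 6)    ; low 6 bits of the 1st continuation byte
          (logand b3 #x3F)))          ; low 6 bits of the 2nd continuation byte

(my-utf8-3byte-to-char #xE2 #x80 #x93)  ; => 8211, i.e. #x2013 (EN DASH)
```

The arithmetic is 2 shifted left by 12 (#x2000), plus 0 shifted left by 6, plus #x13, which gives #x2013.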
Until a proper implementation is done, I guess you could work around
the problem with something like this:
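(require 'url-util)                     ; defines url-unhex-string
(decode-coding-string
 ;; Turn the unhexed characters back into raw bytes...
 (apply 'unibyte-string
        (string-to-list
         (url-unhex-string
          "http://en.wikipedia.org/wiki/Sylvester%E2%80%93Gallai_theorem")))
 ;; ...and interpret those bytes as UTF-8.
 'utf-8)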
(yes, it's ugly as hell but hey, it's free ;])
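If you need this more than once, the same workaround can be wrapped in a small function. This is only a sketch: my-url-unhex-utf8 is a made-up name (it is not part of Emacs), and it assumes the URL passed in is plain ASCII, as a percent-encoded URL normally is.

```elisp
(require 'url-util)                     ; defines url-unhex-string

(defun my-url-unhex-utf8 (url)
  "Percent-decode URL and interpret the decoded bytes as UTF-8.
Hypothetical helper, not part of Emacs.  Assumes URL itself is
plain ASCII, as a percent-encoded URL normally is."
  (decode-coding-string
   ;; url-unhex-string gives one character per %XX byte; repack those
   ;; characters (all below 256 for ASCII input) as a unibyte string.
   (apply #'unibyte-string (string-to-list (url-unhex-string url)))
   ;; Then decode the whole byte sequence as UTF-8.
   'utf-8))

(my-url-unhex-utf8
 "http://en.wikipedia.org/wiki/Sylvester%E2%80%93Gallai_theorem")
;; => "http://en.wikipedia.org/wiki/Sylvester–Gallai_theorem"
```

With a URL that already contains non-ASCII characters, the repacking step can signal an error or mangle those characters, hence the ASCII assumption.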
I've just sent this very message as a bug report to the Emacs team.
Cheers,
--
José A. Romero L.
escherdragon@gmail.com
"We who cut mere stones must always be envisioning cathedrals."
(Quarry worker's creed)