Re: master d57bb0c: Treat passed strings as raw-text when percent-escapi

From: Robert Pluim
Subject: Re: master d57bb0c: Treat passed strings as raw-text when percent-escaping in epg
Date: Thu, 12 Dec 2019 16:19:46 +0100

>>>>> On Thu, 12 Dec 2019 08:58:33 -0500, Stefan Monnier <address@hidden> said:

    Stefan> Hi Robert,
    >> The strings contained in gpg keys can contain UTF-8 data, but can also
    >> use percent-escapes to encode non-ASCII chars.  When converting those
    >> escapes, use 'raw-text' coding system rather than 'string-to-unibyte',
    >> since the latter signals an error for non-ASCII characters.

    Stefan> I don't quite understand: "can contain UTF-8 data" seems odd here since
    Stefan> you're calling `encode-coding-string` whose input argument is a sequence
    Stefan> of characters whereas "UTF-8 data" can only be found in sequences
    Stefan> of bytes.

    Stefan> Did you mean "can contain non-ASCII characters"?

"can contain non-ASCII characters encoded using UTF-8", which means
they end up in a multibyte string in Emacs.

    Stefan> The other problem with the above description is the "raw-text" since
    Stefan> it's far from clear what it means (personally I really have no idea
    Stefan> what is "raw text" and the way Emacs understands "raw text" is more or
    Stefan> less "EOL-separated lines of bytes" which does not seem to match your
    Stefan> description since string-to-unibyte doesn't signal errors when
    Stefan> encountering bytes).

Itʼs replacing the use of string-to-unibyte on a multibyte string
containing non-ASCII characters, which signals an error, with
encode-coding-string using 'raw-text, which produces a bunch of
bytes. My other choices were 'binary or 'no-conversion, which do the
same, but have even less meaningful names.
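
A minimal sketch of that difference, for illustration only. The helper
name `my-compare-conversions` and the sample string are assumptions made
for this example and do not appear in epg:

```elisp
(defun my-compare-conversions (s)
  "Return how S fares under two conversions to unibyte.
Hypothetical helper, for illustration only."
  (list
   ;; `string-to-unibyte' signals an error if S holds a non-ASCII character.
   (condition-case nil
       (string-to-unibyte s)
     (error 'signalled-error))
   ;; `encode-coding-string' with `raw-text' returns a unibyte string,
   ;; so `multibyte-string-p' reports nil for its result.
   (multibyte-string-p (encode-coding-string s 'raw-text))))

;; "\u00e9" is é, which makes the sample string multibyte.
(my-compare-conversions "caf\u00e9")
;; => (signalled-error nil)
```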

    Stefan> Looking at the code, I see that the only caller of
    Stefan> `epg--decode-percent-escape` seems to be
    Stefan> `epg--decode-percent-escape-utf-8` which decodes the bytes returned by
    Stefan> `epg--decode-percent-escape` using `utf-8` so I think it makes more
    Stefan> sense to encode using `utf-8` than `raw-text`, WDYT?

No. The string that is passed to epg--decode-percent-escape can
contain non-ASCII characters encoded as UTF-8, plus percent-escaped
representations of non-ASCII characters. In order to convert those
percent-escaped characters correctly, the string has to be treated as
a unibyte array of bytes, then re-converted to multibyte by decoding
with utf-8 afterwards.
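
Roughly, the sequence described above can be sketched as follows. This is
a simplified illustration and not the actual epg code: the function names
`my-percent-unescape-bytes` and `my-percent-unescape-utf-8` are made up
for the example, and it only handles %XX escapes.

```elisp
(defun my-percent-unescape-bytes (string)
  "Return STRING as a unibyte string with each %XX replaced by that byte.
Hypothetical helper, for illustration only."
  ;; First turn the (possibly multibyte) string into a sequence of bytes.
  (let ((bytes (encode-coding-string string 'raw-text))
        (index 0))
    (while (string-match "%\\([[:xdigit:]][[:xdigit:]]\\)" bytes index)
      (let ((byte (string-to-number (match-string 1 bytes) 16)))
        ;; Resume just past the single byte that replaces the 3-char escape.
        (setq index (1+ (match-beginning 0)))
        (setq bytes (replace-match (unibyte-string byte) t t bytes))))
    bytes))

(defun my-percent-unescape-utf-8 (string)
  "Unescape STRING, then decode the resulting bytes as UTF-8."
  (decode-coding-string (my-percent-unescape-bytes string) 'utf-8))

;; The escapes %C3%A9 are the two UTF-8 bytes of é.
(my-percent-unescape-utf-8 "caf%C3%A9")
;; => "café"
```

The intermediate value is deliberately unibyte: each escape stands for one
byte, and several escapes together may form a single UTF-8 sequence, so
the decoding to characters has to happen once, at the end.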


