bug#40407: [PATCH] slow ENCODE_FILE and DECODE_FILE


From: Mattias Engdegård
Subject: bug#40407: [PATCH] slow ENCODE_FILE and DECODE_FILE
Date: Mon, 6 Apr 2020 18:55:30 +0200

On 6 Apr 2020, at 18:33, Eli Zaretskii <eliz@gnu.org> wrote:

> I think it might be just some convenience thing: utf-7 and utf-8 have
> something in common that made it convenient to treat them the same in
> the internal routines.  Or maybe it's just an accident.

utf-7 and utf-8 have nothing in common at all (apart from a subset of
ASCII being encoded in the same way, and the fact that both encode the
Unicode repertoire).

> Why do you think the ASCII encoding contradicts the utf-16
> coding-type?

Because :coding-type is the first stage of decoding, or the last stage of
encoding. It reflects the low-level structure of the encoded data: using
utf-16 as :coding-type implies that utf-7 is encoded into 16-bit parcels,
but it's not -- the result of utf-7-imap encoding is a sequence of ASCII
bytes. (UTF-16 plays a part in an intermediate step for some values before
they are base64-encoded, but that's not visible in the final byte stream.)
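
To make the layering concrete, here is a minimal Python sketch of
modified UTF-7 as described in RFC 3501, section 5.1.3. It is not the
Emacs implementation; the function name encode_utf7_imap and the sample
string are made up for illustration. UTF-16 appears only inside the
helper that base64-encodes non-ASCII runs; the final output is plain
ASCII bytes.

```python
import base64

def encode_utf7_imap(text: str) -> bytes:
    """Illustrative encoder for modified UTF-7 (RFC 3501, section 5.1.3)."""
    out = bytearray()
    pending = []  # characters waiting to be base64-encoded

    def flush():
        if pending:
            # UTF-16BE is an intermediate representation only.
            utf16 = "".join(pending).encode("utf-16-be")
            # Modified base64: no '=' padding, ',' instead of '/'.
            b64 = base64.b64encode(utf16).rstrip(b"=").replace(b"/", b",")
            out.extend(b"&" + b64 + b"-")
            pending.clear()

    for ch in text:
        if 0x20 <= ord(ch) <= 0x7E:
            flush()
            # Printable ASCII stands for itself, except '&' which becomes '&-'.
            out.extend(b"&-" if ch == "&" else ch.encode("ascii"))
        else:
            pending.append(ch)
    flush()
    return bytes(out)

encoded = encode_utf7_imap("Entwürfe & Ähnliches")
print(encoded)
# Every output byte is printable ASCII; no 16-bit units are visible.
print(all(0x20 <= b <= 0x7E for b in encoded))
```

A decoder that treated this stream as 16-bit units in its first stage
would pair up unrelated ASCII bytes, which is why utf-16 looks wrong as
the :coding-type here.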

> I don't think 'charset' is the right type for this encoding (any
> reason why you've chosen it?), but I will let Handa-san comment.

We could use 'raw-text' as well, but that implies that any byte value could
be part of a utf-7[-imap] text, which is incorrect.
In fact, utf-7-imap only uses codes 0x20-0x7e (utf-7 is allowed to use a few
C0 controls too, as mentioned).

Arguably the heuristics of define-coding-system-internal are somewhat
inscrutable. There seem to be leaks between layers -- ascii-compatible-p is
an end-to-end property and cannot really be set the way it is by that
function. But since it is set there, correcting it afterwards should be the
right approach.





