Re: Encoding of etc/HELLO

From: Michael Welsh Duggan
Subject: Re: Encoding of etc/HELLO
Date: Sat, 21 Apr 2018 10:58:53 -0400
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/27.0.50 (gnu/linux)

Eli Zaretskii <address@hidden> writes:

>> From: Stefan Monnier <address@hidden>
>> Cc: address@hidden
>> Date: Fri, 20 Apr 2018 16:42:02 -0400
>> > The whole point of ISO-2022 is that the same Unicode codepoints can
>> > come from different ISO-2022 charsets, and the ISO-2022 encoding keeps
>> > that information in the bytestream.
>> My question was meant to see if there's a way to encode a similar kind
>> of charset info into the bytestream.  From what you say above, there is
>> such a thing but its use is discouraged.
> If you mean a Unicode-compatible bytestream, then yes, that's the
> feature I know of.  But if we want to use it in Emacs, we should
> modify the UTF-x decoders to put the charset properties on the decoded
> text, or invent a new property (since charset is currently 'unicode'),
> and then augment the font selection code to consider that new
> property.
>> Clearly this problem is not specific to Emacs, so what do people do?
>> Hold on to iso-2022 for as long as they can (like we do in Emacs)?
>> Give up on these "details" of rendering for files using a mix of C, J, and K?
>> Rely on higher-level info (XML tags and friends) to carry the charset info?
> I don't know.  Several years ago, I think each vendor used a private
> extension of ISO-2022 to support the emoji, not sure if that is still
> the case, especially since the number of standardized emoji continues
> to grow all the time.  We could perhaps follow one such extension in
> our support of ISO-2022.  Or we could decide that the Han unification
> has conquered the world, and therefore the CJK charset distinction for
> font selection is no longer important enough for us, in which case we
> could recode HELLO in UTF-8.

I would suppose that the usual way to do this (encode glyph variants in
a Unicode-compatible bytestream) would be to use some form of document
markup.  In Emacs's case, enriched-mode would seem an ideal candidate
for this.  RFC 1896 (text/enriched) specifically allows private extension
commands whose names begin with "X-", and enriched.el is small and should be
simple to modify for this purpose.
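
To make the idea concrete, here is a rough sketch (in Python rather than Emacs Lisp, purely for illustration) of how charset hints might ride along in text/enriched markup.  The command name "x-charset" is an assumption made up for this example; it is not defined by RFC 1896 or by enriched.el.  The sketch only splits the text into runs tagged with the charset named in the command's <param>; a real implementation would turn those tags into text properties for font selection.

```python
import re

# "x-charset" is a hypothetical private command; RFC 1896 only reserves the
# "X-" prefix for such extensions and defines no charset command itself.
COMMAND = re.compile(r"<<|<(/?)([A-Za-z0-9-]+)>")
PARAM = re.compile(r"<param>(.*?)</param>", re.IGNORECASE | re.DOTALL)


def charset_runs(enriched):
    """Split text/enriched-style markup into (charset or None, text) runs."""
    runs = []
    stack = []  # active x-charset values, innermost last
    pos = 0

    def emit(chunk):
        if not chunk:
            return
        charset = stack[-1] if stack else None
        if runs and runs[-1][0] == charset:
            # Merge with the previous run when the charset is unchanged.
            runs[-1] = (charset, runs[-1][1] + chunk)
        else:
            runs.append((charset, chunk))

    while pos < len(enriched):
        m = COMMAND.search(enriched, pos)
        if m is None:
            emit(enriched[pos:])
            break
        emit(enriched[pos:m.start()])
        pos = m.end()
        if m.group(0) == "<<":
            emit("<")  # "<<" is the text/enriched escape for a literal "<"
            continue
        closing = m.group(1) == "/"
        name = m.group(2).lower()  # command names are case-insensitive
        if name != "x-charset":
            continue  # every other command is ignored in this sketch
        if closing:
            if stack:
                stack.pop()
        else:
            p = PARAM.match(enriched, pos)
            stack.append(p.group(1).strip() if p else None)
            if p:
                pos = p.end()
    return runs


sample = (
    "Hello <x-charset><param>japanese-jisx0208</param>こんにちは</x-charset>"
    " / <x-charset><param>chinese-gb2312</param>你好</x-charset>"
)
for charset, text in charset_runs(sample):
    print(charset, repr(text))
```

The file itself stays plain UTF-8; the charset distinction lives only in the markup, so a reader that ignores unknown "X-" commands still sees the text.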

Michael Welsh Duggan
