Re: Encoding of etc/HELLO

From: Eli Zaretskii
Subject: Re: Encoding of etc/HELLO
Date: Sat, 21 Apr 2018 10:07:38 +0300

> From: Stefan Monnier <address@hidden>
> Cc: address@hidden
> Date: Fri, 20 Apr 2018 16:42:02 -0400
> > The whole point of ISO-2022 is that the same Unicode codepoints can
> > come from different ISO-2022 charsets, and the ISO-2022 encoding keeps
> > that information in the bytestream.
> My question was meant to see if there's a way to encode a similar kind
> of charset info into the bytestream.  From what you say above, there is
> such a thing but its use is discouraged.

If you mean a Unicode-compatible bytestream, then yes, that's the
feature I know of.  But if we want to use it in Emacs, we should
modify the UTF-x decoders to put the charset properties on the decoded
text, or invent a new property (since charset is currently 'unicode'),
and then augment the font selection code to consider that new

> Clearly this problem is not specific to Emacs, so what do people do?
> Hold on to iso-2022 for as long as they can (like we do in Emacs)?
> Give up on these "details" of rendering for files using a mix of C, J, and K?
> Rely on higher-level info (XML tags and friends) to carry the charset info?

I don't know.  Several years ago, I think each vendor used a private
extension of ISO-2022 to support the emoji, not sure if that is still
the case, especially since the number of standardized emoji continues
to grow all the time.  We could perhaps follow one such extension in
our support of ISO-2022.  Or we could decide that the Han unification
has conquered the world, and therefore the CJK charset distinction for
font selection is no longer important enough for us, in which case we
could recode HELLO in UTF-8.

I've added Handa-san to this discussion in the hope that he could
comment on what would be the best way forward.
