Re: Encoding of etc/HELLO

From: Stefan Monnier
Subject: Re: Encoding of etc/HELLO
Date: Fri, 20 Apr 2018 16:42:02 -0400
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/27.0.50 (gnu/linux)

> Unicode has language tag characters, but they are deprecated and their
> use is discouraged.
> In any case, I don't think Unicode features are relevant here, because
> we already have char-script-table, which is all you can do with a
> unified codepoint space.

Yes, I understand this part of the situation.

> The whole point of ISO-2022 is that the same Unicode codepoints can
> come from different ISO-2022 charsets, and the ISO-2022 encoding keeps
> that information in the bytestream.

My question was meant to see if there's a way to encode a similar kind
of charset info into the bytestream.  From what you say above, there is
such a thing, but its use is discouraged.
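
To make the point about the bytestream concrete, here is a minimal
sketch in Python rather than Emacs Lisp.  It assumes the standard
iso2022_jp and iso2022_kr codecs, and the variable names are only for
illustration.  The same character is encoded via two ISO-2022 variants:
the bytes differ because each stream carries its own charset
designation, yet both decode to the same Unicode codepoint, at which
point the origin is lost.

```python
# One Han character that exists in both JIS X 0208 and KS X 1001.
ch = "\u4e2d"

# Each ISO-2022 variant announces its charset with an escape sequence,
# so the origin of the character is recorded in the bytes themselves.
jp = ch.encode("iso2022_jp")   # designates JIS X 0208 with ESC $ B
kr = ch.encode("iso2022_kr")   # designates KS X 1001 with ESC $ ) C

print(jp, kr)

# The byte streams differ, but after decoding both are plain U+4E2D,
# with nothing left to say which charset the character came from.
print(jp != kr)
print(jp.decode("iso2022_jp") == kr.decode("iso2022_kr") == ch)
```

Once the text is held as Unicode only, something outside the codepoints
(a text property, a tag, markup) has to carry that distinction.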

Clearly this problem is not specific to Emacs, so what do people do?
Hold on to ISO-2022 for as long as they can (like we do in Emacs)?
Give up on these "details" of rendering for files using a mix of
Chinese, Japanese, and Korean?
Rely on higher-level info (XML tags and friends) to carry the charset info?


