character sets as they relate to “Raw” string literals for elisp

From: Daniel Brooks
Subject: character sets as they relate to “Raw” string literals for elisp
Date: Mon, 04 Oct 2021 08:36:40 -0700
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/27.2 (gnu/linux)

Eli Zaretskii <eliz@gnu.org> writes:

> We can only do this much.  We don't develop any terminal emulators
> here, except the two built into Emacs.

I was referring broadly to the whole GNU project, not trying to assign
the work specifically to the Emacs project. :)

I was even pondering what it would take to do the work myself, now that
Rust is allowed in kernel modules…

> Given that even the Linux console turns out to have staggering gaps in
> its support for Unicode, I see no reason for us to pretend Unicode is
> supported well enough on the terminals to ignore this issue.

The Linux console is not representative of most terminal emulators. It
is neglected and rarely used, since it is intended only as a fall–back
in case X Windows (or sshd) fails to start. Ideally we should fix it
(again speaking broadly), but we (emacs) shouldn’t limit ourselves to
only what it can support.

>> For example, if someone contributes a mode it will normally be accepted
>> as–is. But if they write that mode using Japanese characters, would we
>> turn them away? I think that we should not.
> Why is Japanese different from any other script in this context?

It isn’t; I simply picked one at random.

> I think unnecessary use of non-ASCII characters, any non-ASCII
> characters, should be avoided, for the reasons mentioned above.  See
> bug#50865 for a recent example that left me astonished.

I think that your suggestion to set the terminal-coding-system to
latin-1 or us-ascii on the Linux console is the right one. Perhaps that
ought to be the default behavior when Emacs detects that it is running
in the Linux console, even if the LANG variable indicates that we should
be using utf-8. Or perhaps Emacs should instead issue a warning in that
case, since for all we know the Linux console could be fixed next week.
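
A minimal sketch of what such a default could look like in a user's init
file today. It assumes the Linux console reports its terminal type as
"linux"; the my/ function names are hypothetical, and this is not
something Emacs does by default.

```elisp
;; Sketch only: assumes the Linux console identifies itself with the
;; terminal type "linux".  The my/ names are made up for illustration.
(defun my/linux-console-p ()
  "Return non-nil if the selected frame appears to be on the Linux console."
  ;; `tty-type' returns nil on graphical displays, so this is also
  ;; false under X.
  (equal (tty-type) "linux"))

(defun my/linux-console-coding-fallback ()
  "Encode terminal output as latin-1 when running on the Linux console."
  (when (my/linux-console-p)
    (set-terminal-coding-system 'latin-1)))

;; `tty-setup-hook' runs after Emacs initializes a text terminal.
(add-hook 'tty-setup-hook #'my/linux-console-coding-fallback)
```

The warning variant would replace the call to set-terminal-coding-system
with a call to display-warning, leaving the coding system untouched.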

But in any case, back to my question:

Suppose our hypothetical contributor wanted to contribute a new mode
with this type of code in it:

    (defun 日本 () (message "日本"))

That is, all of the identifiers in the source code for this mode are
named in some horrible foreign script that you cannot read. Is it so
much more unreadable if it sometimes has to be displayed like this?

    (defun \u65E5\u672C () (message "\u65E5\u672C"))

More to the point, do we turn away this contributor or ask them to
rewrite their code? My preference is that we simply accept the
contribution as–is.

If we could see our way to accepting such code, then I don’t see why we
couldn’t accept code that uses Unicode in much smaller ways, such as

    (defvar variable-containing-html #r「<a href="foo.html">click here</a>」)


PS: it occurs to me to wonder if my use of Unicode in the prose of this
message, outside of the examples, detracted from its readability in any
