emacs-devel

Re: character sets as they relate to “Raw” string literals for elisp


From: Daniel Brooks
Subject: Re: character sets as they relate to “Raw” string literals for elisp
Date: Mon, 04 Oct 2021 13:49:53 -0700
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/27.2 (gnu/linux)

Eli Zaretskii <eliz@gnu.org> writes:

>> From: Daniel Brooks <db48x@db48x.net>
>> Cc: emacs-devel@gnu.org,  rms@gnu.org,  anna@crossproduct.net
>> Date: Mon, 04 Oct 2021 08:36:40 -0700
>> 
>> Eli Zaretskii <eliz@gnu.org> writes:
>> 
>> > We can only do this much.  We don't develop any terminal emulators
>> > here, except the two built into Emacs.
>> 
>> I was referring broadly to the whole GNU project, not trying to assign
>> the work specifically to the Emacs project. :)
>
> Then this is not necessarily the best place to raise these issues.

I was replying directly to RMS concerning his statement about non-ASCII
characters. RMS is known to have opinions with a wider scope than will
fit in any single mailing list, and I was responding in kind. I
apologize for using “we” so broadly without thinking; that kind of usage
is confusing, and I should have been much more explicit.

>> Suppose our hypothetical contributor wanted to contribute a new mode
>> with this type of code in it:
>> 
>>     (defun 日本 () (message "日本"))
>
> It would be very inconvenient to have such code.

Absolutely! Possibly almost as inconvenient as having to learn some
English in order to develop the thing. But it doesn’t answer my
question.

I see that prolog-mode only gets a few commits per year (9 last year and
5 so far this year; the high water mark is 10 in a single year). It
imposes a pretty minimal support burden and if it has bugs you can
simply ignore them until a Prolog user brings you a patch, because those
bugs can only affect Prolog users. There is a lot of code in Emacs which
fits this description.

Suppose this hypothetical contribution were a language mode for a
Japanese programming language, and thus had the same support profile?
Suppose also that all messages to the user have already been localized
into English, and that there is an English alias for the mode name (that
is, `日本-mode' toggles the mode, but there’s an alias like `ja-mode' or
something), while the rest of the identifiers are in Japanese.

Would there be any reason to turn away that contribution, or to make the
contributor rewrite it?
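
For concreteness, the arrangement described above (a Japanese primary
name plus an ASCII alias) might look like the following minimal sketch.
The names `日本-mode' and `ja-mode' are the hypothetical ones from this
message, and deriving from `prog-mode' is my assumption, not part of
any real contribution.

```elisp
;; Hypothetical sketch: a major mode whose primary name is Japanese.
;; Nothing here corresponds to an existing package.
(define-derived-mode 日本-mode prog-mode "日本"
  "Hypothetical major mode for a Japanese programming language.")

;; ASCII alias, so the mode can be invoked without a Japanese input
;; method or a font that covers these characters.
(defalias 'ja-mode #'日本-mode)
```

With this in place, `M-x ja-mode' and `M-x 日本-mode' run the same
command.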

Stefan Monnier <monnier@iro.umontreal.ca> writes:

> FWIW, I consider this case quite different from your raw-string case,
> because here the main issue for me is whether the code is maintainable
> and reviewable by someone else.  So, in the context of Emacs, GNU
> ELPA, and NonGNU ELPA, I find such uses problematic.  If I could count
> on having someone else I trust do the reviewing, I might reconsider.

Reading between the lines, I think you are saying that the Emacs
project _could_ grow to become multilingual at all levels, given a
sufficient number of invested contributors who could each review and
maintain different parts of the code. I also take it that, like Eli, you
would find it inconvenient or problematic in the short term. Is that a
fair reading?

> We have that where it's inevitable (like in some packages that define
> features specific to some languages), but even there we prefer to use
> the likes of \u672c instead of the literal characters.  At the very
> least, that avoids the problem with not having a suitable font to
> display them.

As an aside, I think that this is a sensible enough choice, though I
would prefer a more automatic solution. That is, I would rather rely on
individual viewers of the source code tweaking their Emacs settings to
present the source differently than rely on contributors to type the
codepoint numbers directly. As you suggested in bug#50865, changing
the encoding will automatically render those characters as their
codepoint numbers, which is nicer than forcing a human to type them in
before committing. This has the advantage of working on identifiers as
well as string literals.
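
To illustrate why a human should not have to type these escapes, here
is a minimal sketch of a helper that produces them mechanically. The
function name `my-escape-non-ascii' is hypothetical, and this is not
the encoding-based mechanism suggested in bug#50865; it only shows that
the literal and escaped spellings are interchangeable.

```elisp
;; Hypothetical helper: rewrite every non-ASCII character in STRING as
;; the \uXXXX (or \U00XXXXXX) escape that the Lisp reader accepts
;; inside string literals.
(defun my-escape-non-ascii (string)
  "Return STRING with each non-ASCII character replaced by an escape."
  (mapconcat (lambda (ch)
               (cond ((< ch 128) (string ch))                  ; ASCII stays as-is
                     ((<= ch #xFFFF) (format "\\u%04x" ch))    ; BMP character
                     (t (format "\\U%08x" ch))))               ; beyond the BMP
             string ""))

;; The reader treats both spellings as the same string.
(string= "\u65e5\u672c" "日本")
```

Calling `(my-escape-non-ascii "日本")' yields the text \u65e5\u672c,
which can be pasted into a string literal in place of the characters.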

>> If we could see our way to accepting such code, then I don’t see why we
>> couldn’t accept code that uses Unicode in much smaller ways, such as
>> this:
>> 
>>     (defvar variable-containing-html #r「<a href="foo.html">click here</a>」)
>
> If we avoid non-ASCII characters, we avoid some problems, so all else
> being equal, it's better.

Hmm. If we (speaking as broadly as possible!) avoid a problem forever,
how will the problem ever get fixed?

Personally, I think that the problems are now mostly fixed. Emacs has
very complete support for character sets, better than virtually all
other applications. Outside of Emacs, support for Unicode is practically
omnipresent as well. There are still notable gaps, like the Linux
console, but they are the exception rather than the rule. I don’t think
that there is much of a problem left to avoid!

>> PS: it occurs to me to wonder if my use of Unicode in the prose of this
>> message, outside of the examples, detracted from its readability in any
>> way?
>
> If someone is reading this on a text-mode terminal, it could.

I am asking whether anyone reading my messages, either this one or any
of the last dozen I have sent to the list, has noticed any specific
problems. I have used non-ASCII characters in all of them. I’m wondering
if anyone even noticed. If nobody noticed, or if they didn’t detract
from readability, then it is unlikely that Unicode is a problem in
general.

Yuri Khan <yuri.v.khan@gmail.com> writes:

> On Tue, 5 Oct 2021 at 01:58, Eli Zaretskii <eliz@gnu.org> wrote:
>
>> If someone is reading this on a text-mode terminal, it could.
>
> We should probably invent a term more accurate than “text-mode
> terminal” for things that fail to display text.

True! :D

I prefer to say “Linux console” in reference to the one terminal
emulator that we know has severe problems with Unicode. There are many
terminal emulators out there, and I’m sure a few of them have problems,
but for the most part I think all of them can handle Unicode pretty
well, mainly because they rely on OS libraries to do the heavy
lifting. The Linux console is handicapped in this area because it lives
inside the kernel, and thus cannot dynamically load libharfbuzz
and libfreetype. (But I can imagine a hypothetical future kernel module
which statically links against them in order to provide a full-featured
terminal in the console.)

db48x


