Re: Emacs Lisp's future

From: Stephen J. Turnbull
Subject: Re: Emacs Lisp's future
Date: Tue, 14 Oct 2014 16:03:42 +0900

Eli Zaretskii writes:

 > That's not true: we try using UTF-8 wherever possible.  The few files
 > that don't use that simply cannot.

That doesn't seem to be true.  In fact many of the encodings
discovered by "grep -r -e '-\*- coding:'" are ISO 2022 conformant, and
a few indeed appear to be EUC encodings under an alias (e.g.,
chinese-iso-8bit-unix).  AFAICS, the only encodings listed whose files
can't be converted to UTF-8 are the Big 5 family -- and that's only if
you demand bug-compatibility.[1]

So "simply cannot" evidently is your way of saying "inconvenient".[2]
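For anyone who wants to repeat the survey without relying on grep's
quoting rules, here is a minimal sketch in Python.  The names COOKIE,
cookie_in and tally_cookies are made up for illustration, and it
assumes the cookie sits on one of the first two lines of the file; it
does not look at "Local Variables" blocks at the end of a file, which
Emacs also honors.

```python
import re
from collections import Counter
from pathlib import Path

# Matches an Emacs-style "-*- ... coding: NAME ... -*-" cookie in raw bytes.
# (Hypothetical helper pattern; Emacs's own detection is more thorough.)
COOKIE = re.compile(rb"-\*-.*?coding:\s*([A-Za-z0-9_.-]+).*?-\*-")


def cookie_in(data: bytes):
    """Return the coding system declared in the first two lines, or None."""
    for line in data.splitlines()[:2]:
        m = COOKIE.search(line)
        if m:
            return m.group(1).decode("ascii")
    return None


def tally_cookies(root: str) -> Counter:
    """Count declared coding systems under root, reading only file heads."""
    counts = Counter()
    for path in Path(root).rglob("*"):
        if path.is_file():
            with path.open("rb") as fh:
                name = cookie_in(fh.read(1024))
            if name:
                counts[name] += 1
    return counts
```

Calling tally_cookies on a checkout and printing the result gives a
rough count per declared coding system, which is enough to see how many
files are not UTF-8.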

Note that because of multiple encodings, using "grep -r" in the Emacs
tree is probably just a bug.  It's not that you can't read the foreign
languages in "wrong" encodings.  Rather, if your search key is in one
of those languages, you'll *miss occurrences* in the "wrong" encodings.

With your preferred default, most users will live their whole lives
without recognizing the bug.  With a strict default, they have a
fighting chance of learning about it.
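The missed-occurrence problem can be shown in a few lines of Python.
This is only an illustration under assumptions: the sample text, the
choice of EUC-JP as the "wrong" encoding, and the helper name
byte_search are all mine, not taken from the Emacs tree.

```python
def byte_search(key: str, line: str, file_encoding: str,
                search_encoding: str = "utf-8") -> bool:
    """Mimic a byte-level search: the key is encoded the way the user's
    terminal encodes it, the line the way the file stores it."""
    needle = key.encode(search_encoding)
    haystack = line.encode(file_encoding)
    return needle in haystack


key = "日本語"
line = ";; " + key + " input method"

# Same text, same key; only the file's encoding differs.
hit_in_utf8_file = byte_search(key, line, "utf-8")
hit_in_eucjp_file = byte_search(key, line, "euc_jp")

# Decoding the file with its declared encoding first restores the match.
hit_after_decoding = key in line.encode("euc_jp").decode("euc_jp")
```

Here hit_in_utf8_file is true and hit_in_eucjp_file is false, even
though both "files" contain the key as text; the search reports nothing
wrong in either case, which is why the bug goes unnoticed.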

[1]  Big 5 contains a few duplicated characters (at different code
points), so *as text* those files can be represented in Unicode (no
text information is lost since the characters in question are
identical in all ways except Big 5 code point), although *as binary
files* they may not be roundtrippable to UTF-8 (it depends on which
code point is chosen for the duplicated character).

[2]  The inconvenience is pretty significant, here: you'd lose
diff'ability across the conversion boundary.  Thus only new files are
*required* to use UTF-8 (no diff discontinuity), and conversions of
existing files are presumably done only with great care, if at all.

Still, I would think the benefits of having these files be greppable
(and etags-able!) would outweigh that inconvenience in a very short
period of time (maybe a year?).  Except for documentation files, the
files that need these characters probably don't change much.
