

Re: [Emacs-diffs] master db828f6: Don't rely on defaults in decoding UTF

From: stephen
Subject: Re: [Emacs-diffs] master db828f6: Don't rely on defaults in decoding UTF-8 encoded Lisp files
Date: Sun, 27 Sep 2015 09:12:51 +0900

>>>>> Paul Eggert writes:
 > Eli Zaretskii wrote:

 >> So you are, in effect, saying that it is incorrect to derive the
 >> default encodings from the locale's codeset?

 > Yes, for Emacs developers.

I think this makes sense.  IIUC Emacs already uses characters outside
of the Unicode repertoire, so it shouldn't be too hard to replicate
any Emacs capabilities that require non-Unicode characters or charsets
*inside* Emacs by using such characters.  Assuming there are any; I
suspect even HELLO doesn't actually need them.  There's no "gaiji"
problem of how to tell Emacs what to do with those characters; the
developer who introduces them into Emacs is responsible for adding
them to Emacs's non-Unicode repertoire.

 > And come to think of it, for most Emacs users.

I hope not, because that would imply that Emacs users in China, Japan,
probably Korea, and Taiwan are becoming a decreasing rather than
increasing fraction of Emacs users.

 > Nowadays in my experience most non-ASCII text files use UTF-8,
 > regardless of locale.

Toto, I don't think we're in Kansas any more.

 > The old days of having to guess encoding from the locale are
 > passing away.  This is partly due to UTF-8 being the encoding of
 > choice for HTML and XML, where UTF-8 overtook the older 8-bit
 > encodings in 2008 and now is by far the dominant encoding.

On the commercial internet, yes, but not for government and academic
sites in Japan and China.

 > One way to accommodate the new reality would be to

Recognize that it's probably due to insufficient experience?

 > change Emacs so that by default the system locale does not affect
 > Emacs's guess of a file's encoding if the file's initial sample is
 > valid UTF-8.

"Not affect" is probably a bad idea.  Giving UTF-8 too strong a
preference on Windows is a bad idea, because there are a lot of
Windows coding systems that use UTF-8 trailing bytes to represent
characters; it's occasionally possible to run into UTF-8-conforming
files that are intended to be something else.  This isn't true for
ISO-8859 coding systems.
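The "prefer UTF-8 if the initial sample is valid UTF-8" idea, and the Windows false positive just described, can be sketched roughly as follows.  This is a minimal Python illustration under stated assumptions, not how Emacs actually detects encodings (Emacs works through coding-system priorities); the function `guess_encoding` and its `locale_encoding` fallback argument are hypothetical.

```python
import codecs


def guess_encoding(sample: bytes, locale_encoding: str) -> str:
    """Hypothetical "UTF-8 first" guess: if the initial sample of a file
    is valid UTF-8, ignore the locale; otherwise use the locale's encoding."""
    # final=False lets a multibyte sequence that was cut off at the end of
    # the sample pass, while genuinely invalid byte sequences still raise.
    decoder = codecs.getincrementaldecoder("utf-8")()
    try:
        decoder.decode(sample, final=False)
    except UnicodeDecodeError:
        return locale_encoding
    return "utf-8"


# "CAFÉ…" typed on Windows: É is 0xC9 and the ellipsis is 0x85 in cp1252.
sample = "CAFÉ…".encode("cp1252")
print(sample)                            # b'CAF\xc9\x85'
print(guess_encoding(sample, "cp1252"))  # utf-8  (a wrong guess)
print(sample.decode("utf-8"))            # CAFɅ  (0xC9 0x85 is U+0245)

# A curly opening quote (0x93) can never start a UTF-8 sequence,
# so this sample is correctly left to the locale encoding.
print(guess_encoding("“CAFÉ”".encode("cp1252"), "cp1252"))  # cp1252
```

The collision here depends on cp1252 assigning printable characters (such as the ellipsis) to 0x80-0x9F; the ISO-8859 encodings leave that range to C1 control codes, which rarely occur in real text.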

 > Users could set a variable to re-enable the old behavior.  If we
 > did this, we wouldn't have the error-prone process of sprinkling
 > 'coding: utf-8' cookies all over the place.
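For readers unfamiliar with these cookies: Emacs recognizes a file-local `coding:` entry in a `-*- ... -*-` line at the top of a file (the first line, or the second when the first is a `#!` line), and also in a `Local Variables:` list at the end.  The Python sketch below is a simplified, hypothetical detector for the `-*-` form only; the helper `coding_cookie`, its regexes, and the sample file name `foo.el` are illustrative assumptions, and the many other cases Emacs handles are ignored.

```python
import re
from typing import Optional

# Simplified: an Emacs-style "-*- ... -*-" line and a "coding:" entry in it.
MODELINE_RE = re.compile(r"-\*-(.*?)-\*-")
CODING_RE = re.compile(r"(?:^|;)\s*coding:\s*([-\w.]+)")


def coding_cookie(head: bytes) -> Optional[str]:
    """Hypothetical helper: return the coding named in a file's -*- line."""
    # Latin-1 maps every byte to some character, so the ASCII cookie
    # survives regardless of the file's real (ASCII-compatible) encoding.
    lines = head.decode("latin-1").split("\n")[:2]
    # Look at the second line only when the first is a "#!" line.
    if not lines[0].startswith("#!"):
        lines = lines[:1]
    for line in lines:
        modeline = MODELINE_RE.search(line)
        if modeline:
            coding = CODING_RE.search(modeline.group(1))
            if coding:
                return coding.group(1)
    return None


print(coding_cookie(b";;; foo.el --- demo  -*- lexical-binding: t; coding: utf-8 -*-\n"))  # utf-8
print(coding_cookie(b"#!/bin/sh\n# -*- coding: latin-1 -*-\n"))  # latin-1
print(coding_cookie(b"plain text\n-*- coding: utf-8 -*-\n"))     # None
```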

