Re: [Emacs-diffs] master db828f6: Don't rely on defaults in decoding UTF

emacs-devel

From:	Paul Eggert
Subject:	Re: [Emacs-diffs] master db828f6: Don't rely on defaults in decoding UTF-8 encoded Lisp files
Date:	Sat, 26 Sep 2015 11:53:09 -0700
User-agent:	Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.2.0

David Kastrup wrote:

> How frequent are you reading Hebrew, Arabic, Chinese, Japanese, and
> Korean texts?  How relevant is your experience?

Hebrew, not so much -- Eli has far more experience with that. Arabic I was just reading last week (not natively; I use a translator). This week I was reading a lot of Turkish. In all cases I was looking at text prepared by others. In all cases my sources used UTF-8 -- not due to my influence, but because that's what's typical these days.

In my previous job I routinely had to deal with CJK text, and did so with lots of different encodings, including monstrosities such as DBCS-Host that Emacs doesn't even support. So my experience is reasonably good in this area -- better than the average random hacker anyway. If you go back 20 years, non-UTF-8 encodings such as Shift-JIS and EUC were by far the most popular in Japan. Nowadays? Sure, Shift-JIS and EUC are still used, but they're going downhill. Of the top 20 web sites in Japan (according to Alexa), 18 use UTF-8, one uses Shift-JIS, and one uses EUC on their home pages. In the w3techs survey of world web sites, 85% use UTF-8; the second most-popular encoding, ISO-8859-1, is at only 7.5%, and it's that high only because the old HTML standard made ISO-8859-1 the default.

So in practice, defaulting to UTF-8 is quite a good choice nowadays. Of course if we can get the proper encoding from the document or its envelope we should prefer that, and that should let us deal with web documents and email.
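
To make that preference order concrete, here is a minimal sketch in Python (not Emacs Lisp, and not how Emacs actually does it). The names choose_encoding, decode_text and DEFAULT_ENCODING are hypothetical, made up for illustration; the "declared" argument stands in for whatever label the document or its envelope supplies, e.g. a MIME charset parameter.

```python
import codecs
from typing import Optional

# Assumption: UTF-8 is the fallback when nothing usable is declared.
DEFAULT_ENCODING = "utf-8"


def choose_encoding(declared: Optional[str]) -> str:
    """Return the declared encoding if Python knows it, else the default."""
    label = declared.strip() if declared else ""
    if label:
        try:
            # codecs.lookup normalizes aliases such as "Shift_JIS".
            return codecs.lookup(label).name
        except LookupError:
            pass  # unknown label: fall through to the default
    return DEFAULT_ENCODING


def decode_text(data: bytes, declared: Optional[str] = None) -> str:
    """Decode bytes, preferring the declared encoding over the default."""
    # errors="replace" keeps going on malformed input instead of raising.
    return data.decode(choose_encoding(declared), errors="replace")
```

With this, decode_text(body, "Shift_JIS") honors the declared label, while decode_text(body) with no label, or with one Python does not recognize, is treated as UTF-8.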
