Re: [Emacs-diffs] master db828f6: Don't rely on defaults in decoding UTF

From: Paul Eggert
Subject: Re: [Emacs-diffs] master db828f6: Don't rely on defaults in decoding UTF-8 encoded Lisp files
Date: Sun, 27 Sep 2015 01:34:39 -0700
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.2.0

address@hidden wrote:
> Perhaps most recently authored pages are UTF-8.  But the data sets
> themselves are typically flat files, either CSV or plaintext.  The
> explanatory pages, even if in HTML, often haven't been revised in decades.

Yes, that's pretty much my experience. In Japan older stuff is mostly Shift-JIS, EUC, or maybe ISO-2022-JP. New stuff is mostly UTF-8. People using old email software send old encodings because that's what they've been doing for decades. Normally it works, because the email envelope tells you the encoding. But sometimes people screw up and you get mojibake.

But this situation is not an argument for having the locale determine encoding when visiting random imported files that lack envelopes. For such files, it often doesn't work to set LC_ALL=ja_JP.ujis and expect Emacs to get things right. (This is one of the things that Eli has noted multiple times, and he's right.)
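
To make the point concrete, here is a rough Python sketch (not Emacs code) of what happens when one locale-derived codec is applied to files in mixed encodings, compared with decoding each file by an explicit label such as an email envelope would supply. The sample text, the helper name decode_outcome, and the choice of EUC-JP as the locale codec are all assumptions made for illustration.

```python
# Illustrative only: the sample text and the choice of EUC-JP as the
# "locale" encoding are assumptions, not taken from the thread.
SAMPLE = "日本語"
ENCODINGS = ["shift_jis", "euc_jp", "iso2022_jp", "utf_8"]


def decode_outcome(data, codec, expected):
    """Classify a decode attempt as 'ok', 'garbled' (mojibake), or 'error'."""
    try:
        text = data.decode(codec)
    except UnicodeDecodeError:
        return "error"
    return "ok" if text == expected else "garbled"


# Each "file" holds the same text stored in a different encoding.
files = {enc: SAMPLE.encode(enc) for enc in ENCODINGS}

# Case 1: no envelope, so one locale-derived codec is applied to every file.
locale_codec = "euc_jp"
by_locale = {
    enc: decode_outcome(data, locale_codec, SAMPLE) for enc, data in files.items()
}

# Case 2: an envelope labels each file with its actual encoding.
by_label = {enc: decode_outcome(data, enc, SAMPLE) for enc, data in files.items()}

for enc in ENCODINGS:
    print(f"{enc:12} locale={by_locale[enc]:8} label={by_label[enc]}")
```

In this sketch only the file that happens to match the locale decodes correctly in the first case, while every file decodes correctly once its encoding is labeled.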

Of course if one is working in a conservative Japanese government ministry that standardized on Shift-JIS back in 1992 and hasn't changed since then, then things are different, and Emacs should support such users. But typical Emacs users are not in this situation, and the Emacs default should cater to the more-typical case today.

To narrow things down a bit I briefly looked for .jp websites that talk about Emacs. Google reported the following first page's worth of hits (I list year of composition, encoding, and URL). Again, the new stuff is mostly UTF-8, and the old stuff is a mishmash, so it's another data point suggesting that defaulting to UTF-8 would not be such a bad thing for editing today's text.

2002 Shift-JIS   http://www.rsch.tuis.ac.jp/~ohmi/literacy/emacs/quick.html
2008 ISO-2022-JP http://www.wakayama-u.ac.jp/~takehiko/webprg/03.html
2015 EUC-JP      http://d.hatena.ne.jp/tarao/20150221/1424518030
2015 UTF-8       http://uguisu.skr.jp/Windows/emacs.html
2015 UTF-8       http://www.amazon.co.jp/Emacs%E5%AE%9F%E8%B7%B5%E5%85%A5%E9%96%80-%EF%BD%9E%E6%80%9D%E8%80%83%E3%82%92%E7%9B%B4%E6%84%9F%E7%9A%84%E3%81%AB%E3%82%B3%E3%83%BC%E3%83%89%E5%8C%96%E3%81%97%E3%80%81%E9%96%8B%E7%99%BA%E3%82%92%E5%8A%A0%E9%80%9F%E3%81%99%E3%82%8B-WEB-DB-PRESS-plus/dp/4774150029
2015 UTF-8       http://www.sigasi.jp/better-emacs-vhdl-mode
2006 Shift-JIS   http://www.math.kobe-u.ac.jp/icms2006/icms2006-video/slides/grayson/share/doc/Macaulay2/Macaulay2/html/_teaching_spemacs_sphow_spto_spfind_sp__M2.html
2015 UTF-8       https://osdn.jp/projects/gnupack/
