bug-gnu-emacs

bug#33796: 27.0.50; Use utf-8 is all our Elisp files


From: Paul Eggert
Subject: bug#33796: 27.0.50; Use utf-8 is all our Elisp files
Date: Wed, 19 Dec 2018 09:54:40 -0800
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.3.1

> I'm not really sure who to ask about this.

You can ask me (:-). Although I can't read East Asian languages, I do have significant experience with CJK text: my previous (15-year) job was at a company whose customers were almost all CJK, where CJK internationalization was essential, and where I regularly dealt with weird encodings and displays. And this one is an easy call: for maintaining these particular files, UTF-8 is an improvement and this patch should go in.

To take just one example, titdic-cnv.el: people who are seriously maintaining it and who need to read the Chinese text will almost surely have their environment set up to display UTF-8 Chinese text well already. Furthermore, if you take a look at all the changes made to this file in the last decade, here are the statistics:

  edits contributor
     15 Author: Paul Eggert <eggert@cs.ucla.edu>
     10 Author: Glenn Morris <rgm@gnu.org>
      2 Author: Stefan Monnier <monnier@iro.umontreal.ca>
      2 Author: Juanma Barranquero <lekktu@gmail.com>
      1 Author: Phillip Lord <phillip.lord@russet.org.uk>
      1 Author: Kenichi Handa <handa@m17n.org>
      1 Author: Andreas Schwab <schwab@linux-m68k.org>

Only one edit was made by a CJK user, and Handa's edit involved only ASCII characters. Switching this file to UTF-8 would not have made any of our maintenance over the last decade more difficult.
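A tally like the one above could be reproduced roughly as follows. This is only a sketch, not necessarily how the numbers in this message were produced: the function names `tally_authors` and `git_log` are made up for illustration, the file path is an assumption about where titdic-cnv.el lives in the tree, and the script has to be run from inside a Git checkout.

```python
import collections
import subprocess


def tally_authors(log_text: str) -> list[tuple[str, int]]:
    """Count the 'Author:' lines in `git log` output, most frequent first."""
    counts = collections.Counter(
        line.strip()
        for line in log_text.splitlines()
        # Commit messages are indented by git, so only header lines match.
        if line.startswith("Author:")
    )
    return counts.most_common()


def git_log(path: str, since: str = "10 years ago") -> str:
    """Return the `git log` text for one file (hypothetical helper)."""
    result = subprocess.run(
        ["git", "log", f"--since={since}", "--", path],
        check=True,
        capture_output=True,
        encoding="utf-8",
        errors="replace",  # tolerate log entries that are not valid UTF-8
    )
    return result.stdout


if __name__ == "__main__":
    # Assumed location of the file in the Emacs source tree.
    log_text = git_log("lisp/international/titdic-cnv.el")
    for author, edits in tally_authors(log_text):
        print(f"{edits:7d} {author}")
```

Counting is kept separate from the `git` call so that the tally can be checked against saved log text without a repository at hand.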

Conversely, I commonly use tools like 'git grep' to look for issues in the code. These tools mishandle non-UTF-8 files, so I see mojibake on my screen. It will be a significant win for me (and, I suspect, others) when we switch these files to UTF-8.
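To find which files would trip up such tools, a heuristic check along the following lines might help. It is a sketch under stated assumptions, not a tool from the Emacs tree: the names `looks_like_plain_utf8` and `suspect_files` are hypothetical, and treating any ESC byte as suspicious is a guess intended to catch 7-bit ISO-2022 encodings, whose bytes are all ASCII and therefore pass a plain UTF-8 validity test.

```python
from pathlib import Path


def looks_like_plain_utf8(data: bytes) -> bool:
    """Heuristic: True if data decodes as UTF-8 and contains no ESC byte."""
    if b"\x1b" in data:
        # ISO-2022 style encodings switch character sets with ESC sequences.
        return False
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def suspect_files(root: str, pattern: str = "*.el") -> list[Path]:
    """List files under root that fail the heuristic (hypothetical helper)."""
    return sorted(
        path
        for path in Path(root).rglob(pattern)
        if path.is_file() and not looks_like_plain_utf8(path.read_bytes())
    )


if __name__ == "__main__":
    # "lisp" is an assumed directory name; adjust it to the checkout at hand.
    for path in suspect_files("lisp"):
        print(path)
```

A file that legitimately contains a raw ESC character would be flagged too, so the output is a list of candidates to inspect rather than a verdict.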

To try to answer Stefan's questions:

> - Do those people who edit those files really care about the difference?

No, almost always: see above.

>   utf-8 is becoming standard even in the CJK world so
>   maybe the change is not that terrible (or at least, users have gotten
>   used to lowering their expectations in this respect).

Yes, that’s happened. I looked for recent reports about this, and it appears that the controversy is mostly over. For example, <https://gihyo.jp/lifestyle/serial/01/ganshiki-soushi/0069> (dated 2015) lamented the demise of Japanese Knoppix and said that Plamo Linux had problems with EUC-JP and suggested users switch to UTF-8. More recently <https://qiita.com/tenforward/items/5e353f290f0b401139cb> (dated this year) says that the choice of EUC-JP or UTF-8 is user-specific for Plamo Linux, and that applications like Firefox have problems with EUC-JP so discretion is advised if you choose EUC-JP. If even hardcore holdouts like Plamo are folding....

> - If the change is indeed problematic, can we adjust it by using
>   a file-global language tag?

I hope that’s not necessary, but it’d be OK if we have to do it.

> - If that's not sufficient, can we use a scheme like that
>   of etc/HELLO but to keep the files directly usable as Elisp (so as to
>   have our cake and eat it too)?

etc/HELLO is pretty much a disaster for me now, as I can’t use any tool other than Emacs to look at it, and even Emacs screws up if I do something like 'M-x grep RET hello etc/HELLO RET'. I’d rather not extend this disaster to other files.

PS. One minor suggestion for your patch: please also update the list of files in admin/notes/unicode to remove mention of the files in question.

PPS. How about also converting etc/tutorials/TUTORIAL.ja, lisp/leim/quail/hanja-jis.el, lisp/leim/quail/japanese.el, lisp/leim/quail/py-punct.el, and lisp/leim/quail/pypunct-b5.el?
