
From: Stefan Monnier
Subject: Broken `if big5-p` code in titdic-cnv.el (was: Scan of broken conditional forms)
Date: Tue, 26 Jan 2021 22:02:35 -0500
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/28.0.50 (gnu/linux)

[ This dates back to Jan 2020.

  To recap, we have code like this in titdic-cnv.el:

    (defun tsang-quick-converter (dicbuf tsang-p big5-p)
      (let ((fulltitle (if tsang-p (if big5-p "倉頡" "倉頡")
                         (if big5-p "簡易" "簡易")))

  where the `if big5-p` tests appear to do nothing.  It turns out that
  the two strings in each pair have the same Unicode chars, but because
  the file is encoded using iso-2022-jp, they have different `charset`
  properties applied to them, which Emacs can use to render them
  differently.  When we bumped into this code, tho, the file had already
  been converted to `utf-8` (by yours truly), so the nuance had been lost.
  Paul reverted this part of my change to recover the subtle rendering.  ]

Paul Eggert [2020-01-05 12:48:29] wrote:
> On 1/5/20 7:45 AM, Eli Zaretskii wrote:
>> let's install this only on master, please.
> OK, I did that.
>> Btw, the change in titdic-cnv.el, by itself, makes no sense ...
>> When Stefan recoded this file, he left that code intact, which now
>> makes no sense at all.  We should probably propertize the strings with
>> the 2 corresponding charset properties, something that before the
>> recoding happened automagically (because ISO-2022 records the charset
>> in the encoding), and which was the whole purpose of this function.
> I worked around the problem by converting titdic-cnv.el back to
> iso-2022-7bit on master, as this conversion was simple.  Stefan (or
> anybody) can look into this later if they want to do it in
> a better way.

I just looked into it, and I still can't see what's wrong with using
`utf-8` here.  AFAICT those `if big5-p` tests have been doing nothing
ever since Emacs's internal encoding was switched to one based on
Unicode (i.e. since Emacs 23).

While it's true that using the iso-2022-jp encoding for the file does
allow Emacs to render the two strings differently, this only applies to
the source file.  The .elc files all use the `utf-8-emacs` encoding
anyway, so that info is lost there.  In fact, the difference is lost
even before the .elc file is written: when Emacs byte-compiles that
code, the byte-compiler considers the two strings `equal` and emits only
one string in the byte-code (so the two branches return `eq` strings).
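For illustration, here is a minimal sketch (assuming a reasonably recent Emacs; the variable names are hypothetical, and the charset symbols are the ones used in the proposal below) of why two strings that differ only in their `charset` property are interchangeable as far as `equal` is concerned.  `equal` ignores text properties, so only `equal-including-properties` can tell them apart.  It does not reproduce the byte-compiler's internal constant handling; it only shows the comparison involved.

```elisp
;; Two strings with the same characters, differing only in the
;; `charset' text property (hypothetical variable names).
(defvar my-big5-title (propertize "倉頡" 'charset 'chinese-big5-1))
(defvar my-cns-title  (propertize "倉頡" 'charset 'chinese-cns11643-1))

;; `equal' ignores text properties, so this returns t.
(equal my-big5-title my-cns-title)

;; `equal-including-properties' also compares the text properties,
;; so this returns nil.
(equal-including-properties my-big5-title my-cns-title)
```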

So, I think using `iso-2022-jp` is a bad idea here: it gives the
illusion that the two branches are different when they really aren't.
If we do want to recover the difference (the one we presumably lost in
Emacs 23), we need to make those two branches return properly
propertized strings, with something like:

    (defun tsang-quick-converter (dicbuf tsang-p big5-p)
      ;; Pick the charset once, then attach it to the title string.
      (let* ((charset (if big5-p 'chinese-big5-1 'chinese-cns11643-1))
             (fulltitle (propertize (if tsang-p "倉頡" "簡易")
                                    'charset charset)))

Tho I'm not sure even that would be sufficient: that function generates
a file, so if it just prints those strings into an Elisp file, the info
would again be lost, at the latest when that Elisp file gets compiled.

Given that we lived blissfully unaware of the problem for the last 10
years (plus another year with some vague awareness of it, but still
without doing anything about it), I suggest we get rid of the
`if big5-p` tests and switch the file to `utf-8`.

