emacs-pretest-bug

Re: [PATCH] Unicode Lisp reader escapes.


From: Aidan Kehoe
Subject: Re: [PATCH] Unicode Lisp reader escapes.
Date: Thu, 15 Jun 2006 20:38:06 +0200

 >      if (EQ(Qnil, lisp_char))
 >        {
 >          /* This is ugly and horrible and trashes the user's data. */
 >          XSETFASTINT (i, MAKE_CHAR (charset_katakana_jisx0201, 
 >                                     34 + 128, 46 + 128));
 >             return i;
 >        }
 > 
 > What is this special Katakana character, and why are we producing it?

Firstly, thank you for posing the question; the character intended was not a
member of JISX0201 at all, but of JISX0208. I yanked the wrong charset
identifier from charset.h when porting the code from XEmacs. The patch below
corrects this.

(make-char 'japanese-jisx0208 34 46) gives U+3013 GETA MARK, a character in
JISX 0208 that is used to represent unknown or corrupted data. The
Unicode-specific equivalent is U+FFFD REPLACEMENT CHARACTER. I used the GETA
MARK because I was certain it would be available in Mule and it serves the
same purpose. It turns out that (make-char 'mule-unicode-e000-ffff 117 61)
gives U+FFFD, so it might be worthwhile to use that instead of the GETA MARK.
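
As a rough illustration of the position-code arithmetic behind those two
make-char calls, here is a small C sketch. The helper names are hypothetical
and do not come from lread.c or charset.h. The 96x96 layout assumed for
mule-unicode-e000-ffff (position codes starting at 32, first cell at U+E000)
is an assumption; it does reproduce the U+FFFD figure quoted above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not from lread.c: pack two 7-bit position codes
   of a 94x94 charset such as JIS X 0208 into one 16-bit code.  */
static uint16_t
jisx0208_code (unsigned c1, unsigned c2)
{
  assert (c1 >= 0x21 && c1 <= 0x7E);
  assert (c2 >= 0x21 && c2 <= 0x7E);
  return (uint16_t) ((c1 << 8) | c2);
}

/* The same pair with the high bit of each byte set, which matches the
   (34 + 128, 46 + 128) values that appear in the C code.  */
static uint16_t
jisx0208_high_bit (unsigned c1, unsigned c2)
{
  return (uint16_t) (jisx0208_code (c1, c2) | 0x8080);
}

/* Assumption: mule-unicode-e000-ffff is a 96x96 grid whose position
   codes start at 32 and whose first cell is U+E000.  */
static uint32_t
mule_unicode_e000_ffff (unsigned c1, unsigned c2)
{
  uint32_t ucs;

  assert (c1 >= 32 && c1 <= 127);
  assert (c2 >= 32 && c2 <= 127);
  ucs = 0xE000u + (c1 - 32) * 96 + (c2 - 32);
  assert (ucs <= 0xFFFFu);      /* stay inside the charset's range */
  return ucs;
}
```

With these definitions, jisx0208_code (34, 46) is 0x222E, the JIS X 0208 code
of the GETA MARK; jisx0208_high_bit (34, 46) is 0xA2AE; and
mule_unicode_e000_ffff (117, 61) is 0xFFFD.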

 > Is it to trigger an "Invalid character" message, or is something else
 > going on here?

It doesn’t actually trigger a message; it displays a character meant to be
read as “the character couldn’t be interpreted.”

My feeling is that the syntax should be close in its behaviour to what the
coding systems do, and when the coding systems see a code point that is
valid but that they can’t interpret, they trash the user’s data. (Or do
something totally mad like transform invalid UTF-16 to invalid UTF-8!?)
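
The substitute-on-failure behaviour described above can be sketched in C as
follows. This is only an illustration under stated assumptions: the toy table,
the struct, and both function names are hypothetical, and the real
read_escape works on Lisp objects and MAKE_CHAR rather than on a flat table.

```c
#include <stddef.h>
#include <stdint.h>

/* JIS X 0208 code of U+3013 GETA MARK, used here as the substitute.  */
#define GETA_MARK_JIS 0x222E

/* Hypothetical toy mapping from Unicode code points to JIS X 0208.  */
struct mapping
{
  uint32_t ucs;
  uint16_t jis;
};

static const struct mapping toy_table[] = {
  { 0x3042, 0x2422 },           /* HIRAGANA LETTER A */
  { 0x30A2, 0x2522 },           /* KATAKANA LETTER A */
  { 0x3013, 0x222E },           /* GETA MARK */
};

/* Return the JIS code for UCS, or -1 if the toy table has no entry.  */
static int
lookup_jis (uint32_t ucs)
{
  size_t i;

  for (i = 0; i < sizeof toy_table / sizeof toy_table[0]; i++)
    if (toy_table[i].ucs == ucs)
      return toy_table[i].jis;
  return -1;
}

/* Translate UCS; when no translation exists, hand back the GETA MARK
   instead of signalling an error, losing the original code point.  */
static uint16_t
translate_or_substitute (uint32_t ucs)
{
  int jis = lookup_jis (ucs);

  return jis < 0 ? GETA_MARK_JIS : (uint16_t) jis;
}
```

Here translate_or_substitute (0x3042) yields 0x2422, while a code point absent
from the table, such as 0x1F600, yields 0x222E: the caller gets a displayable
character but can no longer tell which code point was requested.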

src/ChangeLog addition:

2006-06-14  Aidan Kehoe  <address@hidden>

        * lread.c (read_escape):
        Change charset_katakana_jisx0201 to charset_jisx0208 as it should
        have been in the first place, since we intended U+3013 GETA MARK. 
        

GNU Emacs Trunk source patch:
Diff command:   cvs -q diff -u
Files affected: src/lread.c

Index: src/lread.c
===================================================================
RCS file: /sources/emacs/emacs/src/lread.c,v
retrieving revision 1.353
diff -u -u -r1.353 lread.c
--- src/lread.c 9 Jun 2006 18:22:30 -0000       1.353
+++ src/lread.c 14 Jun 2006 06:57:49 -0000
@@ -1967,7 +1967,7 @@
        if (EQ(Qnil, lisp_char))
          {
            /* This is ugly and horrible and trashes the user's data.  */
-           XSETFASTINT (i, MAKE_CHAR (charset_katakana_jisx0201,
+           XSETFASTINT (i, MAKE_CHAR (charset_jisx0208,
                                       34 + 128, 46 + 128));
             return i;
          }


-- 
Aidan Kehoe, http://www.parhasard.net/



