emacs-pretest-bug

Re: can't paste non-Latin-1 text to Emacs 21.2


From: Kenichi Handa
Subject: Re: can't paste non-Latin-1 text to Emacs 21.2
Date: Tue, 6 Apr 2004 21:56:19 +0900 (JST)
User-agent: SEMI/1.14.3 (Ushinoya) FLIM/1.14.2 (Yagi-Nishiguchi) APEL/10.2 Emacs/21.3 (sparc-sun-solaris2.6) MULE/5.0 (SAKAKI)

In article <address@hidden>, Dave Love <address@hidden> writes:
>>  This behaviour is controlled by the
>>  function ctext-non-standard-encodings-table.

> This doesn't seem to be in NEWS.

I didn't mean that this behaviour is customizable.  That function
is just a helper function for ctext-pre-write-conversion;
I mentioned it so that you can see what's going on by
reading the code.

> Shouldn't it be controlled by a variable (user option)?

That would be better, but for the moment I don't have time to
design it.  If someone gives me a precise design, I'll
implement it.

> The correct thing to do with text you can't
> encode with standard ISO2022 charsets appears to be to use an extended
> segment labelled as, say, utf-8.  That's correct and unambiguous as
> long as you use IANA names.  I did once at least start to implement
> that, but I don't remember if I finished.

That's correct, but it seems that no other client can decode
it.  Do you know of any program that implements it?

>>  Another idea is to encode such characters to some of legacy
>>  charsets that are listed as "Approved Standard Encoding".

> I don't think you should restrict it to the explicit list, if that's
> what you mean.  It seems fairly clear that ISO standard charsets
> should get normal ISO2022 encoding in CTEXT.  (I couldn't find a
> current address for Scheifler to check the unsupported assertion
> that's wrong.)  I'm sure Emacs should try to translate characters from
> private charsets to standard ones for ctext unless it can tell that
> the selection is for another Emacs client.

I've just found this code in xc/lib/X11/lcCT.c in the
distributions from X.org and XFree86.

(1) X.org version

static CTDataRec default_ct_data[] =
{
    { "ISO8859-1:GL", "\033(B" },
    { "ISO8859-1:GR", "\033-A" },
    { "ISO8859-2:GR", "\033-B" },
    { "ISO8859-3:GR", "\033-C" },
    { "ISO8859-4:GR", "\033-D" },
    { "ISO8859-7:GR", "\033-F" },
    { "ISO8859-6:GR", "\033-G" },
    { "ISO8859-8:GR", "\033-H" },
    { "ISO8859-5:GR", "\033-L" },
    { "ISO8859-9:GR", "\033-M" },
    { "ISO8859-10:GR", "\033-V" },
    { "JISX0201.1976-0:GL", "\033(J" },
    { "JISX0201.1976-0:GR", "\033)I" },

    { "GB2312.1980-0:GL", "\033$(A" },
    { "GB2312.1980-0:GR", "\033$)A" },
    { "JISX0208.1983-0:GL", "\033$(B" },
    { "JISX0208.1983-0:GR", "\033$)B" },
    { "KSC5601.1987-0:GL", "\033$(C" },
    { "KSC5601.1987-0:GR", "\033$)C" },
#ifdef notdef
    { "JISX0212.1990-0:GL", "\033$(D" },
    { "JISX0212.1990-0:GR", "\033$)D" },
    { "CNS11643.1986-1:GL", "\033$(G" },
    { "CNS11643.1986-1:GR", "\033$)G" },
    { "CNS11643.1986-2:GL", "\033$(H" },
    { "CNS11643.1986-2:GR", "\033$)H" },
#endif
    { "TIS620.2533-1:GR", "\033-T"},
    { "ISO10646-1", "\033%B"},
    /* Non-Standard Character Set Encodings */
    { "KOI8-R:GR", "\033%/1\200\210koi8-r\002"},
    { "FCD8859-15:GR", "\033%/1\200\213fcd8859-15\002"},
} ; 

(2) XFree86 version

static CTDataRec default_ct_data[] =
{
    /*                                                                    */
    /* X11 registry name       MIME name         ISO-IR      ESC sequence */
    /*                                                                    */

    /* Registered character sets with one byte per character */
    { "ISO8859-1:GL",       /* US-ASCII              6   */  "\033(B" },
    { "ISO8859-1:GR",       /* ISO-8859-1          100   */  "\033-A" },
    { "ISO8859-2:GR",       /* ISO-8859-2          101   */  "\033-B" },
    { "ISO8859-3:GR",       /* ISO-8859-3          109   */  "\033-C" },
    { "ISO8859-4:GR",       /* ISO-8859-4          110   */  "\033-D" },
    { "ISO8859-5:GR",       /* ISO-8859-5          144   */  "\033-L" },
    { "ISO8859-6:GR",       /* ISO-8859-6          127   */  "\033-G" },
    { "ISO8859-7:GR",       /* ISO-8859-7          126   */  "\033-F" },
    { "ISO8859-8:GR",       /* ISO-8859-8          138   */  "\033-H" },
    { "ISO8859-9:GR",       /* ISO-8859-9          148   */  "\033-M" },
    { "ISO8859-10:GR",      /* ISO-8859-10         157   */  "\033-V" },
    { "ISO8859-13:GR",      /* ISO-8859-13         179   */  "\033-Y" },
    { "ISO8859-14:GR",      /* ISO-8859-14         199   */  "\033-_" },
    { "ISO8859-15:GR",      /* ISO-8859-15         203   */  "\033-b" },
    { "ISO8859-16:GR",      /* ISO-8859-16         226   */  "\033-f" },
    { "JISX0201.1976-0:GL", /* ISO-646-JP           14   */  "\033(J" },
    { "JISX0201.1976-0:GR",                                  "\033)I" },
    { "TIS620-0:GR",        /* TIS-620             166   */  "\033-T" },

    /* Registered character sets with two byte per character */
    { "GB2312.1980-0:GL",   /* GB_2312-80           58   */ "\033$(A" },
    { "GB2312.1980-0:GR",   /* GB_2312-80           58   */ "\033$)A" },
    { "JISX0208.1983-0:GL", /* JIS_X0208-1983       87   */ "\033$(B" },
    { "JISX0208.1983-0:GR", /* JIS_X0208-1983       87   */ "\033$)B" },
    { "JISX0208.1990-0:GL", /* JIS_X0208-1990      168   */ "\033$(B" },
    { "JISX0208.1990-0:GR", /* JIS_X0208-1990      168   */ "\033$)B" },
    { "JISX0212.1990-0:GL", /* JIS_X0212-1990      159   */ "\033$(D" },
    { "JISX0212.1990-0:GR", /* JIS_X0212-1990      159   */ "\033$)D" },
    { "KSC5601.1987-0:GL",  /* KS_C_5601-1987      149   */ "\033$(C" },
    { "KSC5601.1987-0:GR",  /* KS_C_5601-1987      149   */ "\033$)C" },
    { "CNS11643.1986-1:GL", /* CNS 11643-1992 pl.1 171   */ "\033$(G" },
    { "CNS11643.1986-1:GR", /* CNS 11643-1992 pl.1 171   */ "\033$)G" },
    { "CNS11643.1986-2:GL", /* CNS 11643-1992 pl.2 172   */ "\033$(H" },
    { "CNS11643.1986-2:GR", /* CNS 11643-1992 pl.2 172   */ "\033$)H" },
    { "CNS11643.1992-3:GL", /* CNS 11643-1992 pl.3 183   */ "\033$(I" },
    { "CNS11643.1992-3:GR", /* CNS 11643-1992 pl.3 183   */ "\033$)I" },
    { "CNS11643.1992-4:GL", /* CNS 11643-1992 pl.4 184   */ "\033$(J" },
    { "CNS11643.1992-4:GR", /* CNS 11643-1992 pl.4 184   */ "\033$)J" },
    { "CNS11643.1992-5:GL", /* CNS 11643-1992 pl.5 185   */ "\033$(K" },
    { "CNS11643.1992-5:GR", /* CNS 11643-1992 pl.5 185   */ "\033$)K" },
    { "CNS11643.1992-6:GL", /* CNS 11643-1992 pl.6 186   */ "\033$(L" },
    { "CNS11643.1992-6:GR", /* CNS 11643-1992 pl.6 186   */ "\033$)L" },
    { "CNS11643.1992-7:GL", /* CNS 11643-1992 pl.7 187   */ "\033$(M" },
    { "CNS11643.1992-7:GR", /* CNS 11643-1992 pl.7 187   */ "\033$)M" },

    /* Registered encodings with a varying number of bytes per character */
    { "ISO10646-1",         /* UTF-8               196   */ "\033%G"  },

    /* Encodings without ISO-IR assigned escape sequence must be
       defined in XLC_LOCALE files, using "\033%/1" or "\033%/2". */

    /* Backward compatibility with XFree86 3.x */
    { "ISO8859-14:GR",                                      "\033%/1" },
    { "ISO8859-15:GR",                                      "\033%/1" },
    /* used by Emacs, but not backed by ISO-IR */
    { "BIG5-0:GL", "\033$(0" },
    { "BIG5-0:GR", "\033$)0" },
    { "BIG5-1:GL", "\033$(1" },
    { "BIG5-1:GR", "\033$)1" },

};
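
To make the role of these tables concrete, here is a minimal sketch of
how a charset name could be mapped to its escape sequence.  The struct
CharsetEscape, the four-entry sample table, and lookup_ct_sequence are
hypothetical names for illustration only; they are not the actual
definitions or lookup code in lcCT.c.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for CTDataRec: a charset name and the
   compound-text escape sequence that designates it. */
typedef struct {
    const char *name;
    const char *ct_sequence;
} CharsetEscape;

/* A few entries copied from the tables above. */
static const CharsetEscape sample_table[] = {
    { "ISO8859-1:GL",       "\033(B"  },
    { "ISO8859-1:GR",       "\033-A"  },
    { "JISX0208.1983-0:GL", "\033$(B" },
    { "KSC5601.1987-0:GR",  "\033$)C" },
};

/* Return the escape sequence for NAME, or NULL if NAME is not in
   the table (i.e. the charset has no default ctext designation). */
const char *lookup_ct_sequence(const char *name)
{
    size_t i;
    size_t count = sizeof sample_table / sizeof sample_table[0];

    for (i = 0; i < count; i++)
        if (strcmp(sample_table[i].name, name) == 0)
            return sample_table[i].ct_sequence;
    return NULL;
}
```

A charset that is missing from the table, such as GEORGIAN-ACADEMY,
makes the lookup return NULL, which is the case where an extended
segment or some other fallback is needed.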

However, it seems that the extended segment actually used can be
freely defined in the locale data (i.e. the XLC_LOCALE file) of
each locale.  For instance,
/usr/X11R6/lib/X11/locale/georgian-academy/XLC_LOCALE
contains this code:

XLC_CHARSET_DEFINE
csd0    {
        charset_name    GEORGIAN-ACADEMY
        side            GR
        length          1
        string_encoding False
        sequence        \x1b%/1
}
END XLC_CHARSET_DEFINE

So, in this locale, the charset GEORGIAN-ACADEMY is encoded
using the extended segment "ESC % / 1 M L GEORGIAN-ACADEMY ...".
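
The byte layout of such a segment can be sketched as follows.  This
assumes the layout described in the X Compound Text Encoding
specification: after ESC % / F, the bytes M and L each carry 7 bits of
a 14-bit length with the top bit set, and that length counts
everything after L (the encoding name, the STX terminator 0x02, and
the text bytes).  The function name ctext_extended_segment is
hypothetical and is not taken from Xlib or Emacs.

```c
#include <stddef.h>
#include <string.h>

/* Write "ESC % / F M L name STX data" into OUT.
   F is '0'..'4' ('1' means one octet per character).
   Returns the number of bytes written, or 0 if the arguments are
   invalid or OUT is too small. */
size_t ctext_extended_segment(unsigned char *out, size_t out_size,
                              char octets_per_char,
                              const char *encoding_name,
                              const unsigned char *data, size_t data_len)
{
    size_t name_len = strlen(encoding_name);
    size_t payload = name_len + 1 + data_len;   /* name + STX + text */
    size_t total = 6 + payload;                 /* 6-byte header first */

    if (octets_per_char < '0' || octets_per_char > '4')
        return 0;
    if (payload > 0x3FFF || total > out_size)   /* length is 14 bits */
        return 0;

    out[0] = 0x1B;                              /* ESC */
    out[1] = '%';
    out[2] = '/';
    out[3] = (unsigned char) octets_per_char;   /* F */
    out[4] = (unsigned char) (0x80 | (payload >> 7));    /* M */
    out[5] = (unsigned char) (0x80 | (payload & 0x7F));  /* L */
    memcpy(out + 6, encoding_name, name_len);
    out[6 + name_len] = 0x02;                   /* STX ends the name */
    if (data_len > 0)
        memcpy(out + 7 + name_len, data, data_len);
    return total;
}
```

For example, calling it with '1', "GEORGIAN-ACADEMY" and two text
bytes gives a payload of 16 + 1 + 2 = 19 bytes, so M is 0x80 and L is
0x93, and the whole segment is 25 bytes long.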

Perhaps each language environment should have a ctext-encoding-list
(instead of the current ctext-non-standard-encodings) that
reflects all the charsets defined in the XLC_LOCALE of the
corresponding locale, and ctext encoding should be forced to use it.
For a character that cannot be encoded by any of those encodings,
it is almost useless to struggle to find a correct encoding.

---
Ken'ichi HANDA
address@hidden



