Re: UTF-8 paste from xterm picks Chinese charset

From: Kenichi Handa
Subject: Re: UTF-8 paste from xterm picks Chinese charset
Date: Tue, 06 Mar 2007 15:24:04 +0900
User-agent: SEMI/1.14.3 (Ushinoya) FLIM/1.14.2 (Yagi-Nishiguchi) APEL/10.2 Emacs/22.0.95 (i686-pc-linux-gnu) MULE/5.0 (SAKAKI)

Sorry for the late response on this matter.

In article <address@hidden>, Martins Krikis <address@hidden> writes:

> Upon testing the new Emacs behavior on Latvian characters encoded in UTF-8,
> I noticed that pasting them out of Emacs and into, say, xterm works.  However,
> pasting them back does not quite work---all the lowercase vowels with macrons
> get understood as Chinese characters and lose their previous looks. These are
> the offending characters: "āēīōū" (UTF-8 encoding 0xc481, 0xc493, 0xc4ab,
> 0xc58d, 0xc5ab). Saving the text encodes them in UTF-8 again, so the damage
> is limited, but working with such text is still a torture.

That is because your xterm (or X library) sends them encoded
in a Chinese (or Japanese) character set when Emacs requests
COMPOUND_TEXT.  That in itself is not a bug, but a bad
feature.  I remember that some versions of xterm (or the X
library) used "UTF-8 extended segments" to embed Unicode
characters in COMPOUND_TEXT in such a case.  But it seems
that this is no longer true in their latest versions.  :-(

Anyway, I've just improved the function
x-select-utf8-or-ctext to prefer UTF-8 in such a case.
Please try with the latest CVS code.

> I tried setting the coding-system for X selection to
> utf-8, but then pasting produces complete gibberish. (And
> I'd say that's a different bug!) Changing language
> environments does not seem to have any effect on either of
> these bugs (tried Latvian, English, UTF-8).

It's not a bug.  Setting selection-coding-system only
changes how selection data is decoded; it doesn't
change which data type (UTF8_STRING, COMPOUND_TEXT, or just
STRING) is requested.  The latter is controlled by the
variable x-select-request-type.  I've just added more explanation
to the documentation of selection-coding-system.
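
As a rough illustration of that distinction, the requested data type
could be set in an init file along these lines.  This is only a
sketch: choosing UTF8_STRING is an assumption about what suits a
UTF-8 xterm, and selection-coding-system is deliberately left
untouched here.

```elisp
;; Sketch only: ask the selection owner for UTF8_STRING instead of
;; letting Emacs choose between COMPOUND_TEXT and UTF8_STRING.
;; (Assumption: the application you paste from can supply UTF8_STRING.)
(setq x-select-request-type 'UTF8_STRING)

;; selection-coding-system is not changed: it affects only how the
;; received data is decoded, not which data type is requested.
```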

> I've turned the utf-translate-cjk-mode off but this does not
> improve things, contrary to the very promising sounding help-text about it.
> (Not a word about it in info pages, BTW, that's another wishlist item.)

Which part makes you think so?  It also doesn't affect which
data type is requested.  Anyway, it's bad that
utf-translate-cjk-mode is not documented in Info.  Could someone
add it?  I'm not good at writing Info.

Kenichi Handa
