UTF-8 paste from xterm picks Chinese charset

From: Martins Krikis
Subject: UTF-8 paste from xterm picks Chinese charset
Date: Fri, 23 Feb 2007 22:08:51 +0200

(Please disregard the previous report that got messed up
due to choosing charset mule-utf16-be or something like that.)

While testing the new Emacs behavior on Latvian characters encoded in UTF-8,
I noticed that pasting them from Emacs into, say, xterm works.  However,
pasting them back does not quite work: all the lowercase vowels with macrons
are interpreted as Chinese characters and no longer look the way they did.
These are the offending characters: "āēīōū" (UTF-8 encoding 0xc481, 0xc493,
0xc4ab, 0xc58d, 0xc5ab). Saving the text encodes them in UTF-8 again, so the
damage is limited, but working with such text is still torture. I tried
setting the coding system for X selections to utf-8, but then pasting produces
complete gibberish. (And I'd say that's a different bug!) Changing the
language environment does not seem to have any effect on either of these bugs
(I tried Latvian, English, and UTF-8).
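
The byte values quoted above can be checked independently of Emacs. The
following is a minimal Python sketch, used here purely for illustration (Python
is not part of the setup described in this report, and the helper name
utf8_hex is hypothetical). It prints the code point and UTF-8 encoding of each
of the five characters.

```python
# The five lowercase vowels with macrons from the report,
# written as escapes so the source does not depend on file encoding.
CHARS = "\u0101\u0113\u012b\u014d\u016b"  # a, e, i, o, u with macron


def utf8_hex(ch: str) -> str:
    """Return the UTF-8 encoding of one character as a string like '0xc481'."""
    return "0x" + ch.encode("utf-8").hex()


if __name__ == "__main__":
    for ch in CHARS:
        # Show the Unicode code point next to its UTF-8 byte sequence.
        print(f"U+{ord(ch):04X} -> {utf8_hex(ch)}")
```

Each character takes two bytes in UTF-8, which agrees with the "file code"
lines in the character descriptions quoted in this report.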

I've turned utf-translate-cjk-mode off, but this does not improve things,
contrary to its very promising-sounding help text. (There is not a word
about it in the info pages, BTW; that's another wishlist item.)

This is how Emacs describes one of the pasted characters:
  character: ā (37921, #o112041, #x9421, U+0101)
    charset: chinese-gb2312 (GB2312 Chinese simplified: ISO-IR-58.)
 code point: #x28 #x21
     syntax: w  which means: word
   category: c:Chinese |:While filling, we can break a line at this character.
buffer code: #x91 #xA8 #xA1
  file code: #xC4 #x81 (encoded by coding system mule-utf-8-unix)
    display: by this font (glyph code)
     -ISAS-Fangsong ti-Medium-R-Normal--16-160-72-72-c-160-GB2312.1980-0 

It should have been interpreted as follows, however:
  character: ā (331809, #o1210041, #x51021, U+0101)
    charset: mule-unicode-0100-24ff
             (Unicode characters of the range U+0100..U+24FF.)
 code point: #x20 #x21
     syntax: w  which means: word
   category: l:Latin
buffer code: #x9C #xF4 #xA0 #xA1
  file code: #xC4 #x81 (encoded by coding system mule-utf-8-unix)
    display: by this font (glyph code)
     -Misc-Fixed-Medium-R-SemiCondensed--13-120-75-75-C-60-ISO10646-1 (#x101)
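
A possible reading of the two descriptions above, which this report does not
confirm, is that the affected characters are exactly those that also exist in
GB2312: that character set includes lowercase vowels with macrons for pinyin,
and the code point #x28 #x21 shown above corresponds to the EUC bytes A8 A1
seen in the buffer code. The Python sketch below (illustration only; the
helper name in_gb2312 is hypothetical, and Python's "gb2312" codec is assumed
to match the GB2312 table Emacs uses) checks which Latvian letters GB2312 can
represent.

```python
def in_gb2312(ch: str) -> bool:
    """Return True if the character can be encoded in GB2312 (EUC-CN)."""
    try:
        ch.encode("gb2312")
        return True
    except UnicodeEncodeError:
        return False


# Lowercase vowels with macrons (the characters the report says are affected).
AFFECTED = "\u0101\u0113\u012b\u014d\u016b"
# Two other Latvian letters for comparison: capital A with macron, c with caron.
OTHERS = "\u0100\u010d"

if __name__ == "__main__":
    for ch in AFFECTED + OTHERS:
        print(f"U+{ord(ch):04X} in GB2312: {in_gb2312(ch)}")
```

If this reading is right, only the five lowercase letters are candidates for
being decoded into chinese-gb2312, which would match the observation that just
the lowercase vowels with macrons are affected.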

That is about all I can say, despite having read all the info pages about
international character set support. I don't see a way to examine which
coding system is currently in effect for X selections (a variable that
can be queried would be nice), and I don't have a clue about how charsets
correspond to coding systems. In fact, charsets seem to be really neglected
in the info pages, as does the user's ability to influence how a particular
coding system uses them.

If I'm wrong about any of this, I'll be very happy to learn how.

Thank you,

  Martins Krikis

In GNU Emacs (i686-pc-linux-gnu, X toolkit, Xaw3d scroll bars)
 of 2007-02-19 on mkbox
X server distributor `The X.Org Foundation', version 11.0.70101000
Important settings:
  value of $LC_ALL: nil
  value of $LC_COLLATE: nil
  value of $LC_CTYPE: nil
  value of $LC_MESSAGES: nil
  value of $LC_MONETARY: nil
  value of $LC_NUMERIC: nil
  value of $LC_TIME: nil
  value of $LANG: lv_LV.UTF-8
  locale-coding-system: utf-8
  default-enable-multibyte-characters: t

Major mode: Lisp Interaction

Minor modes in effect:
  show-paren-mode: t
  tooltip-mode: t
  mouse-wheel-mode: t
  menu-bar-mode: t
  file-name-shadow-mode: t
  global-font-lock-mode: t
  font-lock-mode: t
  blink-cursor-mode: t
  unify-8859-on-encoding-mode: t
  auto-compression-mode: t
  column-number-mode: t
  line-number-mode: t
  transient-mark-mode: t

Recent input:
M-x r e <backspace> <backspace> <backspace> <backspace> 
<down-mouse-1> <mouse-movement> <mouse-1> <down-mouse-2> 
<mouse-2> M-x r e p o <tab> <backspace> <backspace> 
<backspace> <backspace> M-x r e o p <tab> <backspace> 
<backspace> p o <tab> r <tab> <return>

Recent messages:
Hey, Dude!
Loading /home/martins/Lisp/latvian-utf8-apo-postfix.el (source)...done
Loading /home/martins/.javascript.el (source)...done
Loading paren...done
Loading smtpmail...done
For information about the GNU Project and its goals, type C-h C-p.
call-interactively: Text is read-only [2 times]
call-interactively: Command attempted to use minibuffer while in minibuffer
Making completion list...
Loading emacsbug...done
