bug#12291: [rev 109796] wrong UTF-8 handling

From: Werner LEMBERG
Subject: bug#12291: [rev 109796] wrong UTF-8 handling
Date: Tue, 28 Aug 2012 07:47:20 +0200 (CEST)

[bzr revision 109796]

Have a look at the attached file, which is transmitted as binary to
avoid e-mail encoding issues.  It contains a single four-byte UTF-8
sequence (0xF4 0xB5 0x87 0x9E), which would map to the non-existent
Unicode code point U+1351DE, beyond the U+10FFFF limit.  If I load
this file as UTF-8, Emacs gives the following output for
`C-u C-x =':

               position: 1 of 2 (0%), column: 0
              character: 二 (displayed as 二) (codepoint 20108, #o47214, #x4e8c)
      preferred charset: unicode (Unicode (ISO10646))
  code point in charset: 0x4E8C
                 syntax: w      which means: word
               category: .:Base, C:2-byte han, L:Left-to-right (strong), c:Chinese, h:Korean, j:Japanese, |:line breakable
               to input: type "C-x 8 RET HEX-CODEPOINT" or "C-x 8 RET NAME"
            buffer code: #xE4 #xBA #x8C
              file code: #xE4 #xBA #x8C (encoded by coding system utf-8-unix)
                display: by this font (glyph code)
      xft:-unknown-SimSun-normal-normal-normal-*-24-*-*-*-d-0-iso10646-1 (#x460)

  Character code properties: customize what to show
    name: CJK IDEOGRAPH-4E8C
    general-category: Lo (Letter, Other)
    decomposition: (20108) ('二')

Look at what Emacs says about the file code.  If I save this
one-character file as UTF-8, the character code stays as-is.

This behaviour is clearly wrong.  I suspect that Emacs uses such high
character codes internally to represent the `emacs-mule' encoding.
However, the user must never see this; such characters must be
converted to correct UTF-8 instead.
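
The code point quoted above can be checked by hand.  What follows is a
minimal Python sketch, not Emacs code: the helper name
`decode_four_byte' is hypothetical, and it deliberately skips the
range check that a conforming decoder performs, so that the raw value
carried by the four bytes becomes visible.  It also shows that the
file code reported by Emacs (#xE4 #xBA #x8C) is the ordinary UTF-8
encoding of U+4E8C.

```python
MAX_UNICODE = 0x10FFFF  # highest valid Unicode code point


def decode_four_byte(seq: bytes) -> int:
    """Extract the payload bits of a four-byte UTF-8-style sequence.

    No range validation is done on purpose; this only shows which
    value the bit pattern encodes.
    """
    if len(seq) != 4 or (seq[0] & 0xF8) != 0xF0:
        raise ValueError("expected a four-byte sequence with lead byte 11110xxx")
    if any((b & 0xC0) != 0x80 for b in seq[1:]):
        raise ValueError("continuation bytes must look like 10xxxxxx")
    return (
        ((seq[0] & 0x07) << 18)
        | ((seq[1] & 0x3F) << 12)
        | ((seq[2] & 0x3F) << 6)
        | (seq[3] & 0x3F)
    )


raw = bytes([0xF4, 0xB5, 0x87, 0x9E])  # content of the attached file
cp = decode_four_byte(raw)
print(f"U+{cp:X}", "out of range" if cp > MAX_UNICODE else "in range")

# A strict decoder rejects the sequence instead of mapping it elsewhere.
try:
    raw.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False
print("accepted by strict decoder:", strict_ok)

# The file code shown by Emacs is the UTF-8 form of U+4E8C.
shown = bytes([0xE4, 0xBA, 0x8C]).decode("utf-8")
print(f"U+{ord(shown):04X}")
```

The sketch says nothing about why Emacs ends up at U+4E8C; it only
confirms that the input bytes do not encode that character.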



In GNU Emacs (i686-pc-linux-gnu, GTK+ Version 2.24.9)
 of 2012-08-28 on linux-nvf0
Windowing system distributor `The X.Org Foundation', version 11.0.11004000
Configured using:
 `configure 'MAKEINFO=/usr/bin/makeinfo' '--with-x-toolkit=gtk''

Important settings:
  value of $LANG: de_DE.UTF-8
  value of $XMODIFIERS: @im=none
  locale-coding-system: utf-8-unix
  default enable-multibyte-characters: t

Major mode: Summary

Minor modes in effect:
  tooltip-mode: t
  mouse-wheel-mode: t
  menu-bar-mode: t
  file-name-shadow-mode: t
  global-font-lock-mode: t
  font-lock-mode: t
  blink-cursor-mode: t
  auto-composition-mode: t
  auto-encryption-mode: t
  auto-compression-mode: t
  column-number-mode: t
  transient-mark-mode: t

Recent input:
<return> w b u g - e m <tab> <tab> <tab> <tab> <tab> 
<tab> <tab> <backspace> <backspace> <tab> <tab> C-c 
C-q y M-x w r i t e - e m <tab> C-g C-h a b u g <return> 
<M-next> C-x 1 M-x r e p r t <backspace> <backspace> 
o r t - e m <tab> <return>

Recent messages:
Saving file /home/wl/Mail/draft/11...
Wrote /home/wl/Mail/draft/11
Draft is prepared
No matching alias [7 times]
Kill draft message? (y or n)  y
Saving file /home/wl/Mail/draft/11...
Wrote /home/wl/Mail/draft/11
Draft was killed
Type C-x 4 C-o RET to restore the other window.  

Load-path shadows:
None found.

(shadow emacsbug message format-spec rfc822 mml mml-sec mm-decode
mm-bodies mm-encode mail-parse rfc2231 mailabbrev gmm-utils mailheader
sendmail rfc2047 rfc2045 ietf-drums mm-util mail-prsvr mail-utils
apropos descr-text latexenc preview prv-emacs byte-opt tex-buf
noutline outline font-latex warnings bytecomp byte-compile cconv
macroexp latex easy-mmode edmacro kmacro tex-style cus-edit wid-edit
cus-start cus-load pp mew-varsx mew-unix cal-menu calendar
cal-loaddefs mew-auth mew-config mew-imap2 mew-imap mew-nntp2 mew-nntp
mew-pop mew-smtp mew-ssl mew-ssh mew-net mew-highlight mew-sort
mew-fib mew-ext mew-refile mew-demo mew-attach mew-draft mew-message
mew-thread mew-virtual mew-summary4 mew-summary3 mew-summary2
mew-summary mew-search mew-pick mew-passwd mew-scan mew-syntax mew-bq
mew-smime mew-pgp mew-header mew-exec mew-mark mew-mime mew-edit
mew-decode mew-encode mew-cache mew-minibuf mew-complete mew-addrbook
mew-local mew-vars3 mew-vars2 mew-vars mew-env mew-mule3 mew-mule
mew-gemacs mew-key mew-func mew-blvs mew-const mew tex advice help-fns
advice-preload tex-site auto-loads quail help-mode easymenu cjktilde
disp-table time-date tooltip ediff-hook vc-hooks lisp-float-type
mwheel x-win x-dnd tool-bar dnd fontset image regexp-opt fringe
tabulated-list newcomment lisp-mode register page menu-bar rfn-eshadow
timer select scroll-bar mouse jit-lock font-lock syntax facemenu
font-core frame cham georgian utf-8-lang misc-lang vietnamese tibetan
thai tai-viet lao korean japanese hebrew greek romanian slovak czech
european ethiopic indian cyrillic chinese case-table epa-hook
jka-cmpr-hook help simple abbrev minibuffer loaddefs button faces
cus-face files text-properties overlay sha1 md5 base64 format env
code-pages mule custom widget hashtable-print-readable backquote
make-network-process dbusbind dynamic-setting system-font-setting
font-render-setting move-toolbar gtk x-toolkit x multi-tty emacs)

Attachment: emacs-problem.utf8
Description: Binary data
