bug#4051: Character Soup

From: Juri Linkov
Subject: bug#4051: Character Soup
Date: Thu, 06 Aug 2009 00:09:20 +0300
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/23.1.50 (x86_64-pc-linux-gnu)

In the Cyrillic KOI8 language environment, the coding system for
a buffer containing the Latin-1 character á is detected as Chinese
gb2312.  How funny!

I noticed this while reporting bug#4037, which message.el sent with
charset=gb2312.  Mail readers display that message incorrectly because
of the ugly fonts associated with gb2312 (this is a separate problem).

I think it would be more natural to encode this as Latin-1 (in this
particular case) or, more generally, as UTF-8, the universal encoding
designed for mixing different scripts.

The easiest way to reproduce this problem:

  1. emacs -Q
  2. C-x RET l Cyrillic-KOI8
  3. C-x 8 ' a
  4. C-x C-s
  5. File to save in: /tmp/file

After that the prompt says:

  Select coding system (default chinese-iso-8bit): 

and the buffer `*Warning*' contains:

  These default coding systems were tried to encode text
  in the buffer `file':
    (cyrillic-koi8-unix (192 . 225))
  However, each of them encountered characters it couldn't encode:
    cyrillic-koi8-unix cannot encode these: á

  Click on a character (or switch to this window by `C-x o'
  and select the characters by RET) to jump to the place it appears,
  where `C-u C-x =' will give information about it.

  Select one of the safe coding systems listed below,
  or cancel the writing with C-g and edit the buffer
     to remove or modify the problematic characters,
  or specify any other coding system (and risk losing
     the problematic characters).

    gb2312 utf-8 euc-jis-2004 euc-jp windows-1258 viscii
    iso-2022-jp-2004 cp862 iso-8859-16 hp-roman8 next mac-roman cp437
    cp865 cp861 cp860 cp858 cp857 cp852 cp850 windows-1254 windows-1252
    windows-1250 iso-8859-15 iso-8859-14 iso-8859-10 iso-8859-9
    iso-8859-4 iso-8859-3 iso-8859-2 gb18030 gbk hz-gb-2312 utf-7
    iso-8859-1 utf-16 utf-16be-with-signature utf-16le-with-signature
    utf-16be utf-16le iso-2022-7bit utf-8-auto utf-8-with-signature
    eucjp-ms vietnamese-tcvn vietnamese-viqr vietnamese-vscii
    japanese-shift-jis-2004 japanese-iso-7bit-1978-irv ibm1047
    utf-7-imap utf-8-emacs
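
The same information can be queried from Lisp without going through the
save prompt.  This is only a minimal sketch, assuming the forms are
evaluated one by one (for example with C-x C-e in *scratch*) in an
`emacs -Q' session; the lists returned will differ between Emacs
versions, so no expected output is shown.

```elisp
;; Switch to the language environment used in the recipe above.
(set-language-environment "Cyrillic-KOI8")

;; Index of the first character in the string that cyrillic-koi8
;; cannot encode, or nil if every character is encodable.
(unencodable-char-position 0 1 'cyrillic-koi8 nil "á")

;; Coding systems that can safely encode the character.
(find-coding-systems-string "á")

;; Current priority order of coding systems, which influences
;; which safe coding system is offered as the default.
(coding-system-priority-list)
```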

I already figured out how to fix this problem for message.el using:

  (setq mm-coding-system-priorities (cons 'utf-8 mm-coding-system-priorities))

But as the test case above shows, this is a general problem.
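
For the general case, one possible user-level workaround (not a fix for
the selection logic itself) might be to raise the priority of UTF-8
after the language environment is set.  This is a sketch under the
assumption that it goes into the init file; note that
`prefer-coding-system' also changes the default coding system for newly
created files, which may not be wanted when Cyrillic files should still
default to KOI8.

```elisp
;; Assumption: `set-language-environment' resets the coding system
;; priorities, so the preference has to be stated after it.
(set-language-environment "Cyrillic-KOI8")

;; Put utf-8 at the front of the coding system priority list.
;; Side effect: utf-8 also becomes the default for new files.
(prefer-coding-system 'utf-8)
```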

Juri Linkov
