Re: cp949 support

From: Kenichi Handa
Subject: Re: cp949 support
Date: Thu, 18 Jun 2009 20:14:44 +0900

In article <address@hidden>, Jihyun Cho <address@hidden> writes:

> I ran the following test.
> First, I applied an old patch.
> I saved a file with Vim using the option "set fenc=cp949".
> Then I loaded the file with Emacs. It was garbled in my UTF-8 environment.
> So I typed "M-x revert-buffer-with-coding-system", "cp949", and then
> it looked fine.

> After applying this patch, I ran the same test.
> It showed a wrong character.

> I guess the problem is related to coding system.

> The problem that "HANGUL SYLLABLE HAEH" is shown as "HANGUL SYLLABLE
> JWIG" occurred with the EUC-KR coding system,
> because "HANGUL SYLLABLE HAEH" is not in the EUC-KR repertoire.
> But CP949 contains "HANGUL SYLLABLE HAEH".
> This patch did not fix it.

Wiebe's patch doesn't change the cp949 coding system; cp949
is still an alias of korean-iso-8bit (euc-kr).  So what you did
is the same as reading a cp949 file with the euc-kr coding system.
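
A small illustration of why that fails follows. It uses Python's built-in cp949 and euc_kr codecs, which are an independent implementation and not Emacs's decoder. CP949 agrees with EUC-KR on KS X 1001 codes. HANGUL SYLLABLE HAEH (U+D58F), however, gets a CP949-only code whose second byte is not a valid EUC-KR trail byte. The helper name try_decode is only for illustration.

```python
# Compare Python's cp949 and euc_kr codecs on two Hangul syllables.
# Assumption: Python's codecs stand in for Emacs's coding systems here.

GA = "\uac00"    # HANGUL SYLLABLE GA, part of KS X 1001
HAEH = "\ud58f"  # HANGUL SYLLABLE HAEH, only in the CP949 extension area


def try_decode(data: bytes, encoding: str):
    """Return the decoded text, or None if the bytes are invalid."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return None


# For a KS X 1001 syllable, both codecs use the same two bytes.
ga_bytes = GA.encode("cp949")
print(ga_bytes, try_decode(ga_bytes, "euc_kr") == GA)

# HAEH gets a two-byte CP949 code that a strict EUC-KR decoder rejects.
haeh_bytes = HAEH.encode("cp949")
print(haeh_bytes, try_decode(haeh_bytes, "euc_kr"))
```

Decoding the same bytes with "cp949" returns HAEH, which matches what revert-buffer-with-coding-system achieved in the test above.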

By the way, decoding "\xc1\x64" as U+C951 is a bug in the
decoding routine for EUC-type coding systems.  It should
treat that sequence as invalid, as Emacs 22 does.  I've just
installed a fix.
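
The difference between the wrong result and the correct one can be modelled as below. This is a hypothetical Python sketch, not the Emacs C code. It only shows that a decoder that fails to check that the second byte lies in 0xA1..0xFE, and simply forces its high bit, turns 0xC1 0x64 into 0xC1 0xE4. That code is U+C951, HANGUL SYLLABLE JWIG. The function names are illustrative.

```python
from typing import Optional


def decode_pair_strict(lead: int, trail: int) -> Optional[str]:
    """Decode one EUC-KR two-byte code, or return None if it is invalid."""
    # In EUC-KR, both bytes of a KS X 1001 character must be in 0xA1..0xFE.
    if not (0xA1 <= lead <= 0xFE and 0xA1 <= trail <= 0xFE):
        return None
    try:
        return bytes([lead, trail]).decode("euc_kr")
    except UnicodeDecodeError:
        return None  # in range, but not a complete assigned character


def decode_pair_masked(lead: int, trail: int) -> Optional[str]:
    """Hypothetical buggy variant: silently sets the trail byte's high bit."""
    return decode_pair_strict(lead, trail | 0x80)


for decoder in (decode_pair_strict, decode_pair_masked):
    ch = decoder(0xC1, 0x64)
    print(decoder.__name__, "U+%04X" % ord(ch) if ch else "invalid")
```

The strict variant reports the sequence as invalid, which is the Emacs 22 behaviour described above. The masked variant yields U+C951.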

With the following additional patch, you should be able to
read a cp949 file correctly with the cp949 coding system.

Index: korean.el
RCS file: /cvsroot/emacs/emacs/lisp/language/korean.el,v
retrieving revision 1.41
retrieving revision 1.42
diff -u -r1.41 -r1.42
--- korean.el   5 Jan 2009 03:22:27 -0000       1.41
+++ korean.el   18 Jun 2009 01:15:32 -0000      1.42
@@ -43,7 +43,6 @@
 
 (define-coding-system-alias 'euc-kr 'korean-iso-8bit)
 (define-coding-system-alias 'euc-korea 'korean-iso-8bit)
-(define-coding-system-alias 'cp949 'korean-iso-8bit)
 
 (define-coding-system 'iso-2022-kr
   "ISO 2022 based 7-bit encoding for Korean KSC5601 (MIME:ISO-2022-KR)."
@@ -58,6 +57,14 @@
 
 (define-coding-system-alias 'korean-iso-7bit-lock 'iso-2022-kr)
 
+(define-coding-system 'korean-cp949
+  "CP949 (Microsoft Unified Hangul Code)"
+  :coding-type 'charset
+  :mnemonic ?K
+  :charset-list '(ascii cp949))
+
+(define-coding-system-alias 'cp949 'korean-cp949)
+
 (set-language-info-alist
  "Korean" '((setup-function . setup-korean-environment-internal)
             (exit-function . exit-korean-environment)
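
One way to sanity-check that a CP949 coding system can represent every modern Hangul syllable is a round trip over U+AC00..U+D7A3. The sketch below uses Python's cp949 codec as an independent reference. It does not exercise Emacs or the korean-cp949 coding system defined above. It also counts how many syllables need the UHC extension codes, meaning those whose lead or trail byte is below 0xA1. The old euc-kr based alias could not decode these codes correctly.

```python
# Round-trip every precomposed Hangul syllable through Python's cp949 codec.
FIRST, LAST = 0xAC00, 0xD7A3  # HANGUL SYLLABLE GA .. HANGUL SYLLABLE HIH

ks_x_1001 = 0   # codes with both bytes in 0xA1..0xFE (shared with EUC-KR)
extension = 0   # UHC extension codes (lead or trail byte below 0xA1)

for cp in range(FIRST, LAST + 1):
    ch = chr(cp)
    data = ch.encode("cp949")
    # Every syllable should get a two-byte code that decodes back to itself.
    assert len(data) == 2 and data.decode("cp949") == ch
    if data[0] >= 0xA1 and data[1] >= 0xA1:
        ks_x_1001 += 1
    else:
        extension += 1

print(ks_x_1001, extension, ks_x_1001 + extension)
```

The split should come out as the 2350 syllables of KS X 1001 plus the 8822 extension syllables, for 11172 in total.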

Yidong and Stefan, I have not yet installed this change
because it fixes a bug that already existed in Emacs 22.  But
the change itself is very safe, and with it Emacs can
correctly decode all CP949 files, some of which could not be
decoded before.  In addition, CP949 seems to be very
important for Korean Windows users.  Shall I install it?
Kenichi Handa

