emacs-devel

Re: X11 Compound Text vs ISO 2022


From: Kenichi Handa
Subject: Re: X11 Compound Text vs ISO 2022
Date: Thu, 29 Jul 2010 21:36:31 +0900

Very sorry for the late response on this matter.

In article <address@hidden>, James Cloos <address@hidden> writes:

> While testing my recently applied patch, I've discovered that Emacs will
> produce ISO-2022 output for COMPOUND_TEXT which other libs and apps --
> notably including libX11 -- cannot decode.

> As an example, (encode-coding-string "•" 'compound-text) ; U+2022 BULLET
> produces "^[$(address@hidden(B".  '$(O' is ISO-IR 228¹, JIS X 0213:2000.  But
> libX11 only knows about the $( charsets: 0, 1, A-D and G-M.

> A number of characters are output in '^[$-1'; such as:

> (encode-coding-string "ℜ" 'compound-text) ; U+211C BLACK-LETTER CAPITAL R
> "^[$-1\365\334^[-A"
> (encode-coding-string "ʻ" 'compound-text) ; U+02BB MODIFIER LETTER TURNED COMMA
> "^[$-1\244\333^[-A"

> That is encoded in mule-unicode-0100-24ff, essentially unknown outside
> Emacs.
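A minimal Python sketch of the check described above: it scans encoded bytes for ISO 2022 `ESC $ ( F` designations (a 94^n set into G0) and reports final bytes outside the set the report says libX11 recognizes. The final-byte list is taken from the message itself, not from libX11's source, and the function name `unknown_94n_designations` is hypothetical. Other designations, such as the `ESC $ - 1` used for mule-unicode-0100-24ff, are deliberately not covered.

```python
import re

# Final bytes F of "ESC $ ( F" (94^n charset designated into G0) that libX11
# recognizes according to the report above: '0', '1', 'A'-'D', 'G'-'M'.
# Assumption: this list comes from the message, not from libX11 itself.
LIBX11_94N_FINALS = set("01ABCD") | set("GHIJKLM")

# ESC $ ( followed by a final byte in the range 0x30..0x7E.
DESIGNATE_94N_G0 = re.compile(rb"\x1b\$\(([\x30-\x7e])")


def unknown_94n_designations(ctext: bytes) -> list:
    """Return the final bytes of ESC $ ( F designations not in the list above.

    Only 94^n-into-G0 designations are checked; ESC $ - F and other
    escape sequences are ignored.
    """
    unknown = []
    for match in DESIGNATE_94N_G0.finditer(ctext):
        final = match.group(1).decode("ascii")
        if final not in LIBX11_94N_FINALS:
            unknown.append(final)
    return unknown


if __name__ == "__main__":
    # JIS X 0213 plane 1 (final 'O') followed by a return to ASCII.
    print(unknown_94n_designations(b"\x1b$(O\x21\x21\x1b(B"))  # ['O']
    # JIS X 0208 (final 'B') is in the list, so nothing is reported.
    print(unknown_94n_designations(b"\x1b$(B\x30\x21\x1b(B"))  # []
```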

I admit that this behaviour is not good now.  When I
first implemented ctext in Emacs, neither UTF8_STRING nor
CTEXT_with_UTF8_extended_segment existed.  So I added more
character sets to it for cut&paste between two running
Emacses.  As Emacs was the only application that supported
so many character sets at that time, no one complained
about that behaviour of ctext.  Other applications couldn't
handle those characters anyway.

> Other libs/apps prefer to use utf-8³ in compound_text for such chars.

> I understand *why* this happens, given that Emacs used to use 2022
> internally, but it confuses other X11 apps.

Actually, the latest Emacs (Emacs 23 and later) uses
Unicode internally.

> I am not fully fluent in Emacs' internal charset conversion routines;
> is there an easy way to tell it to limit which 2022 charsets it will
> use when converting a string into a 2022 encoding?  A better way?

It's fairly easy to limit the charsets of ctext.  But I care
about backward compatibility.  As ctext is the only coding
system that is compatible with iso-8859-1 and can encode
many other character sets, there may be old users who still
use it for file/process encodings.

And since ctext is not used for selection anyway, I'd rather
just document that ctext is not fully compatible with X's
COMPOUND_TEXT spec, but is an extended version of it.

For WM_NAME, etc., yes, we should use ctext-with-extensions,
and as ctext-with-extensions is not intended to be used
directly by users, I think it won't cause actual problems
even if we change it so that more characters are encoded
using the UTF-8 extended segment.  So I'll work on it soon.
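A minimal Python sketch of that idea: characters in ISO 8859-1 stay in Compound Text's initial state (ASCII in GL, the Latin-1 right half in GR), and each run of other characters is emitted as UTF-8 bytes. The framing assumed here is ISO 2022's `ESC % G` (switch to UTF-8) and `ESC % @` (return to ISO 2022); whether that matches the exact UTF-8 segment form a given X client expects is an assumption. The function name `encode_ctext_utf8_fallback` is hypothetical, and this is not Emacs's implementation of ctext-with-extensions.

```python
from itertools import groupby

# ISO 2022 (ECMA-35) escapes, assumed here to delimit the UTF-8 segment.
ESC_TO_UTF8 = b"\x1b%G"     # switch to UTF-8 with standard return
ESC_TO_ISO2022 = b"\x1b%@"  # return to ISO 2022


def encode_ctext_utf8_fallback(text: str) -> bytes:
    """Hypothetical encoder: Latin-1 as is, everything else as UTF-8 runs.

    Control characters are passed through unchecked, and no ISO 2022
    charsets other than ISO 8859-1 are ever designated.
    """
    out = bytearray()
    # Group consecutive characters by whether they fit in ISO 8859-1.
    for in_latin1, run in groupby(text, key=lambda ch: ord(ch) <= 0xFF):
        chunk = "".join(run)
        if in_latin1:
            # Initial Compound Text state: GL = ASCII, GR = Latin-1 right half.
            out += chunk.encode("latin-1")
        else:
            out += ESC_TO_UTF8 + chunk.encode("utf-8") + ESC_TO_ISO2022
    return bytes(out)


if __name__ == "__main__":
    print(encode_ctext_utf8_fallback("caf\u00e9"))    # b'caf\xe9'
    print(encode_ctext_utf8_fallback("\u2022 item"))  # b'\x1b%G\xe2\x80\xa2\x1b%@ item'
```

Switching segments only at run boundaries keeps every UTF-8 segment made of complete character sequences.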

The only problem with ctext-with-extensions is that it is
currently implemented in Elisp, and thus it may trigger
garbage collection.  I'm not sure it is safe to call Lisp at
the point where we convert WM_NAME etc.  If it is not safe,
I'll implement ctext-with-extensions in C.

---
Kenichi Handa
address@hidden


