Re: multilingual text in frame

From: Kenichi Handa
Subject: Re: multilingual text in frame
Date: Tue, 21 Jan 2003 15:19:38 +0900 (JST)
User-agent: SEMI/1.14.3 (Ushinoya) FLIM/1.14.2 (Yagi-Nishiguchi) APEL/10.2 Emacs/21.2.92 (sparc-sun-solaris2.6) MULE/5.0 (SAKAKI)

In article <address@hidden>, Jason Rumney <address@hidden> writes:
> Jason Rumney <address@hidden> writes:
>>  It seems from the documentation I can find about XSetWMName (), that
>>  compound-text is the correct choice for encoding.

> On closer reading, maybe not. I had assumed that "Host Portable
> Character Encoding" meant compound-text, but apparently the X11R5
> spec defines it as "the same for all locales on a machine" but
> otherwise leaves it unspecified. So UTF-8 and compound-text would both
> be valid choices. 

No.  Please see the attached document extracted from "Xlib -
C Language X Interface".  It's in the source of X,
.../xc/doc/specs/X11/.  Perhaps the term "Host Portable
Character Encoding" was introduced to solve the problem of

> Since most X functions (including XSetWMName ())
> state in their documentation that "the result is undefined if the
> string is not in the Host Portable Character Encoding", it would seem
> to be valid for a UTF-8 based X server to not recognize compound-text
> encoding properly.

It's not an X server but a window manager that should recognize them
after executing XGetWMName.  Anyway, yes, it's valid for a window
manager to ignore compound-text or even utf-8.

But, a correctly internationalized window manager will do
something like this:

  /* Fetch WM_NAME, convert it to the current locale's encoding, and draw it.  */
  XGetWMName (display, w, &text_prop);
  XmbTextPropertyToTextList (display, &text_prop, &list, &num);
  XmbDrawString (display, title_drawable, font_set, gc, x, y,
                 list[0], strlen (list[0]));

Of course, because of the fate of internationalization, the
window manager must run in a correct locale, and what it can
display is only the characters supported in that locale.
What we really need is multilingualization.

> But I can't find any way to find out what the Host Portable Character
> Encoding is on a given system. Perhaps Handa-san knows more about the
> I18N features of X we could use to convert the string to something
> the X server is sure to recognize.

As far as I know, there's no way to know which encoding the
window manager can recognize.  So, all we can do is to
expect that the window manager is correctly
internationalized.  But, any window manager will at least
recognize XA_STRING.  So, x_set_name (in xfns.c) does this:

        text.encoding = (stringp ? XA_STRING
                         : FRAME_X_DISPLAY_INFO (f)->Xatom_COMPOUND_TEXT);
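
The decision behind such a flag can be sketched roughly as follows.  This
is only an illustration, not the code in xfns.c: it assumes the title is
available as Unicode code points, and the names title_encoding,
fits_string_encoding, and choose_title_encoding are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the XA_STRING and COMPOUND_TEXT atoms.  */
enum title_encoding { TITLE_STRING, TITLE_COMPOUND_TEXT };

/* Return true if code point C may appear in a STRING property:
   the Latin-1 graphic characters plus tab and newline.  */
static bool
fits_string_encoding (uint32_t c)
{
  return c == '\t' || c == '\n'
         || (c >= 0x20 && c <= 0x7E)
         || (c >= 0xA0 && c <= 0xFF);
}

/* Choose TITLE_STRING when every one of the N code points in TEXT
   fits, and fall back to TITLE_COMPOUND_TEXT otherwise.  */
static enum title_encoding
choose_title_encoding (const uint32_t *text, size_t n)
{
  assert (text != NULL || n == 0);
  for (size_t i = 0; i < n; i++)
    if (!fits_string_encoding (text[i]))
      return TITLE_COMPOUND_TEXT;
  return TITLE_STRING;
}
```

Under these assumptions a title that stays within Latin-1 is sent as
STRING, which any window manager should handle, and only titles that
need more than Latin-1 depend on compound-text support.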

Ken'ichi HANDA

1.7.  Character Sets and Encodings

Some of the Xlib functions make reference to specific char-
acter sets and character encodings.  The following are the
most common:

o    X Portable Character Set

     A basic set of 97 characters, which are assumed to
     exist in all locales supported by Xlib.  This set con-
     tains the following characters:

a..z A..Z 0..9 !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~ <space>,
<tab>, and <newline>

     This set is the left/lower half of the graphic charac-
     ter set of ISO8859-1 plus space, tab, and newline.  It
     is also the set of graphic characters in 7-bit ASCII
     plus the same three control characters.  The actual
     encoding of these characters on the host is system dependent.

o    Host Portable Character Encoding

     The encoding of the X Portable Character Set on the
     host.  The encoding itself is not defined by this stan-
     dard, but the encoding must be the same in all locales
     supported by Xlib on the host.  If a string is said to
     be in the Host Portable Character Encoding, then it
     only contains characters from the X Portable Character
     Set, in the host encoding.

o    Latin-1

     The coded character set defined by the ISO 8859-1 standard.

o    Latin Portable Character Encoding

     The encoding of the X Portable Character Set using the
     Latin-1 codepoints plus ASCII control characters.  If a
     string is said to be in the Latin Portable Character
     Encoding, then it only contains characters from the X
     Portable Character Set, not all of Latin-1.

o    STRING Encoding

     Latin-1, plus tab and newline.

o    UTF-8 Encoding

     The ASCII compatible character encoding scheme defined
     by the ISO 10646-1 standard.

o    POSIX Portable Filename Character Set

     The set of 65 characters, which can be used in naming
     files on a POSIX-compliant host, that are correctly
     processed in all locales.  The set is:

     a..z A..Z 0..9 ._-
