Re: address@hidden: BUG: Emacs ignores charcell width when running on te

From: Kenichi Handa
Subject: Re: address@hidden: BUG: Emacs ignores charcell width when running on terminal (w/rtfs & ideas for fix)]
Date: Tue, 24 Oct 2006 09:30:38 +0900
User-agent: SEMI/1.14.3 (Ushinoya) FLIM/1.14.2 (Yagi-Nishiguchi) APEL/10.2 Emacs/22.0.50 (i686-pc-linux-gnu) MULE/5.0 (SAKAKI)

In article <address@hidden>, Richard Stallman <address@hidden> writes:

> Would you please look at this issue and comment?
> I am not sure if this is something we should try to fix, now or ever.
> But I would like you to think about it.

Sorry for the late response.  Actually, there's not much we
can do about this matter.

> ------- Start of forwarded message -------
> Date: Wed, 11 Oct 2006 15:16:50 -0400
> To: address@hidden
> From: Rich Felker <address@hidden>
> Subject: BUG: Emacs ignores charcell width when running on terminal (w/rtfs
>       & ideas for fix)
> When GNU Emacs is run on a terminal (-nw mode) and editing UTF-8 text
> files, it treats all characters as if they occupy one character cell
> column on the terminal. This causes it to become confused about the
> cursor position whenever there is CJK fullwidth text or scripts that
> use nonspacing combining characters present, to the point that editing
> is impossible.

Unfortunately, the current Emacs assumes that all characters
in a charset have the same width.  As long as we were
dealing only with legacy charsets (e.g. ISO8859, JISX, KSC,
GB), that assumption worked well.

> Attached to this email is a UTF-8 file you can open in Emacs which
> exhibits the problem: Japanese Hiragana (for CJK wide) and Tibetan and
> Thai (for nonspacing).

> The root of the problem: In term.c, produce_glyphs() function, the
> code assumes all multibyte characters for a given 'charset' have the
> same width:

The root of the problem is that there's no way for Emacs to
know how many columns a terminal uses to display a specific
character.  For Hiragana, Emacs can guess that it will be
displayed in two columns, but for Tibetan and Thai, it
heavily depends on the terminal's capability of handling
CTL (Complex Text Layout).  If a terminal doesn't know how
to do CTL for Tibetan, it will just produce a glyph for each
syllable component without stacking (and thus occupy several
columns).  If a terminal does, it will display them in one
(or two) columns.  But there's no way for Emacs to know
which is the case.

> Correctly fixing the issue:

> 1. Needs some sort of width lookup for unicode characters without
>    having to convert from Emacs' native encoding to UCS thru UTF-8.
>    This should be straightforward for someone who understands the
>    code.

That only works for such simple characters as Hiragana.  In
the emacs-unicode-2 branch, I introduced char-width-table,
which maps each character to the column width that character
occupies on screen.
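
As a rough illustration of what a per-character width lookup
does, here is a minimal sketch in Python.  It is not the
emacs-unicode-2 code; char_width and string_width are
hypothetical names, and the rule (two columns for East Asian
wide or fullwidth characters, zero for nonspacing marks, one
otherwise) is an assumption taken from Unicode properties.
As said above, it is only a guess; the terminal may display
something different.

```python
import unicodedata

def char_width(ch: str) -> int:
    """Guess how many terminal columns one character occupies.

    Assumption: nonspacing and enclosing marks take 0 columns,
    East Asian wide/fullwidth characters take 2, the rest take 1.
    """
    if unicodedata.category(ch) in ("Mn", "Me"):
        return 0
    if unicodedata.east_asian_width(ch) in ("W", "F"):
        return 2
    return 1

def string_width(text: str) -> int:
    """Sum the guessed column widths of all characters in text."""
    return sum(char_width(ch) for ch in text)

if __name__ == "__main__":
    samples = ["abc", "\u3042\u3044", "\u0e01\u0e34", "\u0f40\u0f74"]
    for s in samples:
        print(repr(s), string_width(s))
```

For the Tibetan sample, a terminal without CTL support would
probably use two columns instead of the one guessed here,
which is exactly the case such a table cannot tell apart.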

> 2. The append_glyph() function needs to handle width==0 case, perhaps
>    converting the previous glyph into a COMPOSITE_GLYPH instead of
>    adding a CHAR_GLYPH. However I don't understand the COMPOSITE_GLYPH
>    system in Emacs so I don't know if this is feasible.

COMPOSITE_GLYPH is a glyph containing multiple characters
that must be displayed as a single grapheme cluster.  On X,
Emacs displays the characters in a COMPOSITE_GLYPH correctly
(sometimes by stacking, sometimes by overstriking, sometimes
by using an alternate glyph, etc).  But, as there's no way
on a terminal to perform such an operation, current Emacs
just displays the first character of a COMPOSITE_GLYPH.
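
The effect of that fallback can be sketched as follows, again
in Python rather than in Emacs code.  The names clusters and
terminal_fallback are hypothetical, and the grouping rule (a
base character plus the nonspacing marks that follow it) is
an assumed simplification; real Emacs compositions are
decided by its own composition machinery, not by this rule.

```python
import unicodedata

def is_nonspacing(ch: str) -> bool:
    """True for nonspacing or enclosing marks (categories Mn, Me)."""
    return unicodedata.category(ch) in ("Mn", "Me")

def clusters(text: str) -> list[str]:
    """Group each base character with the nonspacing marks after it."""
    groups: list[str] = []
    for ch in text:
        if groups and is_nonspacing(ch):
            groups[-1] += ch      # attach the mark to the previous base
        else:
            groups.append(ch)     # start a new cluster
    return groups

def terminal_fallback(text: str) -> str:
    """Keep only the first character of every cluster."""
    return "".join(group[0] for group in clusters(text))

if __name__ == "__main__":
    sample = "\u0f40\u0f74 e\u0301"
    print(clusters(sample))
    print(terminal_fallback(sample))
```

The marks that are dropped this way are lost on screen, which
is why the result is unusable for Tibetan even though the
column count stays consistent.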

> At present this issue is making it very difficult for me to use
> Tibetan text in composing email and material for the web, so I'm
> looking for some way to fix it, either upstream or with hacks I can
> make locally for the time being until it's fixed properly.

If you want to handle Tibetan text, using X is the only way
for the moment.

Kenichi Handa
