[emacs-bidi] Re: Arabic support


From: Kenichi Handa
Subject: [emacs-bidi] Re: Arabic support
Date: Fri, 03 Sep 2010 10:00:02 +0900

In article <address@hidden>, Eli Zaretskii <address@hidden> writes:

> > A not-yet-shaped LGSTRING is created by autocmp_chars
> > (composite.c) from a character sequence matching with a
> > regular expression PATTERN stored in a
> > composition-function-table.  This pattern is
> > "[\u0600-\u06FF]+" for Arabic (lisp/language/misc-lang.el),
> > and a more complicated regex for Hebrew
> > (lisp/language/hebrew.el).

> Thanks.  So character compositions are used not only to compose
> several characters into one glyph, but also to break text into
> individually shaped chunks, is that right?

Yes.
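
To illustrate the chunking idea, here is a minimal sketch in Python rather than Emacs Lisp or C. It only shows how a pattern such as the quoted "[\u0600-\u06FF]+" splits text into runs that would each be shaped as a unit; the helper name shaping_chunks and the sample string are assumptions made for this example, and this is not the code in composite.c.

```python
import re

# Pattern quoted in this thread for Arabic (lisp/language/misc-lang.el).
ARABIC_RUN = re.compile("[\u0600-\u06FF]+")

def shaping_chunks(text):
    """Return (start, end, substring) for each maximal Arabic run.

    Hypothetical helper: each run stands in for one not-yet-shaped
    chunk that would be handed to the shaper as a whole.
    """
    return [(m.start(), m.end(), m.group()) for m in ARABIC_RUN.finditer(text)]

# Sample text: Latin prefix, the 7-character Arabic word from this
# thread written with escapes, then a Latin suffix.
sample = "Arabic \u0627\u0644\u0633\u0651\u0644\u0627\u0645 text"
chunks = shaping_chunks(sample)

for start, end, run in chunks:
    print(start, end, [hex(ord(c)) for c in run])
```

The Latin text around the word is not part of any chunk, so only the Arabic run would go through grouped shaping.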

> If so, auto-composition-mode cannot be turned off for scripts that
> need this kind of "grouped shaping" without degrading the presentation
> of these scripts to the point of illegibility?

Yes.  And auto-composition-mode cannot be turned off for any
script for which it is not enough to display the glyphs
corresponding to individual characters: all Indic scripts,
some East Asian scripts, Arabic, Hebrew, etc.  In this
respect, Arabic is not special.  Even for some Indic
scripts, an LGSTRING may contain multibyte grapheme clusters.

> > > I'm asking because it's possible that we will need to modify
> > > w32uniscribe.c to reorder R2L characters before we pass them to the
> > > Uniscribe ScriptShape API, to let it see the characters in the logical
> > > order it expects them.  That's if it turns out that Uniscribe cannot
> > > otherwise shape them correctly.
> > 
> > ??? Currently characters and glyphs in LGSTRING are always
> > in logical order.

> See my mail from yesterday, where I describe that I see in GDB that
> Arabic characters in LGSTRINGs arrive to uniscribe_shape in visual
> order:

>   http://lists.gnu.org/archive/html/emacs-devel/2010-09/msg00029.html

In this mail, you wrote:

> Also, it looks like uniscribe_shape is repeatedly called from
> font-shape-gstring to shape the same text that is progressively
> shortened.  For example, the first call will be with a 7-character
> string whose contents is

>    {0x627, 0x644, 0x633, 0x651, 0x644, 0x627, 0x645}

and this character sequence is surely in logical order.  So
I don't know why you think uniscribe_shape is given an
LGSTRING in visual order.
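
As a quick check of that claim, the following Python sketch builds the word from the seven code points quoted above and compares them with the order in which the word is typed and stored (alef, lam, seen, shadda, lam, alef, meem). The variable names are made up for this example, and the reversed list is only a naive stand-in for visual order, not the output of a real bidi reordering.

```python
import unicodedata

# The 7 code points quoted from the GDB session, in the order given.
quoted = [0x627, 0x644, 0x633, 0x651, 0x644, 0x627, 0x645]

# The same word written out in logical (typing/storage) order.
word = "\u0627\u0644\u0633\u0651\u0644\u0627\u0645"
logical = [ord(c) for c in word]

# Bidi classes: Arabic letters are AL, the shadda (U+0651) is NSM.
bidi_classes = [unicodedata.bidirectional(c) for c in word]

# Naive approximation of visual order: plain reversal of the run.
naive_visual = logical[::-1]

print(logical == quoted)        # sequence matches logical order
print(naive_visual == quoted)   # and does not match the reversed order
print(bidi_classes)
```

Since the quoted sequence begins with alef-lam and ends with meem, it matches the stored order of the word and not its reversal.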

> The next call is with a 6-character string whose contents is

>    {0x627, 0x644, 0x633, 0x651, 0x644, 0x627}

> then a 5-character string {0x627, 0x644, 0x633, 0x651, 0x644}, etc.

> Note that the first 7-character string is the first word of the Arabic
> greeting, properly bidi-reordered for display.

> Are these series of calls expected?

No.  I don't know why that happens on Windows.  On Ubuntu,
when I visit a file that contains only these lines:
------------------------------------------------------------
Arabic السّلام
;;; Local Variables:
;;; bidi-display-reordering: t
;;; End:
------------------------------------------------------------
font-shape-gstring is called just once.

As the lgstring is getting shorter each time, it seems that
composition fails each time.
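
The pattern of calls can be mimicked with a toy model, sketched here in Python. It is not the Emacs code path: retry_lengths and the shape callbacks are hypothetical, and the only assumption is that a failed composition is followed by another attempt on a string one character shorter.

```python
def retry_lengths(chars, shape):
    """Record the length of every string passed to `shape`.

    Hypothetical model: try the full run first; if `shape` reports
    failure, retry with the run shortened by one character.
    """
    lengths = []
    n = len(chars)
    while n > 0:
        lengths.append(n)
        if shape(chars[:n]):
            break          # composition succeeded, stop retrying
        n -= 1             # composition failed, try a shorter run
    return lengths

run = [0x627, 0x644, 0x633, 0x651, 0x644, 0x627, 0x645]

# A shaper that always fails yields the shrinking series 7, 6, 5, ...
always_fails = retry_lengths(run, lambda chars: False)

# A shaper that succeeds at once is called a single time.
succeeds = retry_lengths(run, lambda chars: True)

print(always_fails)
print(succeeds)
```

Under this model, the 7, 6, 5, ... series seen on Windows corresponds to a shaper that fails every time, and the single call seen on Ubuntu to one that succeeds on the first attempt.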

autocmp_chars is mainly called from composition_reseat_it.
Could you please trace the code after the first call of
autocmp_chars, and find out why Emacs decides that a
composition fails?

---
Kenichi Handa
address@hidden


