[emacs-bidi] Re: Arabic support

From: Eli Zaretskii
Subject: [emacs-bidi] Re: Arabic support
Date: Fri, 03 Sep 2010 16:25:49 +0300

> From: Kenichi Handa <address@hidden>
> Cc: address@hidden, address@hidden, address@hidden
> Date: Fri, 03 Sep 2010 10:00:02 +0900
> > > > I'm asking because it's possible that we will need to modify
> > > > w32uniscribe.c to reorder R2L characters before we pass them to the
> > > > Uniscribe ScriptShape API, to let it see the characters in the logical
> > > > order it expects them.  That's if it turns out that Uniscribe cannot
> > > > otherwise shape them correctly.
> > > 
> > > ??? Currently characters and glyphs in LGSTRING are always
> > > in logical order.
> > See my mail from yesterday, where I describe that I see in GDB that
> > Arabic characters in LGSTRINGs arrive to uniscribe_shape in visual
> > order:
> >   http://lists.gnu.org/archive/html/emacs-devel/2010-09/msg00029.html
> In this mail, you wrote:
> > Also, it looks like uniscribe_shape is repeatedly called from
> > font-shape-gstring to shape the same text that is progressively
> > shortened.  For example, the first call will be with a 7-character
> > string whose contents is
> >    {0x627, 0x644, 0x633, 0x651, 0x644, 0x627, 0x645}
> and this character sequence is surely in logical order.  So
> I don't know why you think uniscribe_shape is given a
> LGSTRING of visual order.

Sorry, you are right.  I got fooled by the fact that the end of the
string is almost a mirror image of its beginning.

There's something I'm missing in how character compositions and font
shaping work together with bidi reordering.  I need to understand that
to figure out what, if anything, needs to be fixed in uniscribe_shape
to get it to work correctly.

So let me describe how the bidi reordering works and my understanding
of how it interacts with character compositions, and ask you to
correct any inaccuracies and fill in the blanks.  Thanks in advance.

There are two use-cases that bidi reordering supports.  The first one
is reordering in left-to-right paragraphs, containing mostly L2R text
with embedded R2L characters.  I will call this "the L2R paragraph"
case.

The other use-case is reordering in right-to-left paragraphs, which
typically consist almost entirely of R2L characters with embedded L2R
letters, digits, and other characters that are displayed left to
right.  I call this "the R2L paragraph" case.

For L2R paragraphs, runs of R2L characters are delivered in reverse
order (ignoring for the moment complications caused by directional
override control characters).  When the bidi iterator bumps into an
R2L character, it scans forward until the end of the run, then begins
to go back delivering the characters, thus reversing them on display.
When the run of R2L characters is exhausted, the iterator jumps to the
end of the run and resumes its normal forward scan.
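
As a rough illustration of the scan just described, here is a small
Python model (not the Emacs C code).  The function names are made up
for this sketch; capital letters stand for R2L characters, as in the
examples in this message; and neutrals, digits, and override controls
are deliberately ignored.

```python
def is_r2l(ch):
    # Assumption for illustration only: capital letters are R2L characters.
    return ch.isupper()


def deliver_l2r_paragraph(text):
    """Return the characters in the order a simplified iterator delivers them."""
    out = []
    i = 0
    while i < len(text):
        if is_r2l(text[i]):
            # Scan forward to the end of the R2L run...
            end = i
            while end < len(text) and is_r2l(text[end]):
                end += 1
            # ...then go back, delivering the run back to front.
            out.extend(reversed(text[i:end]))
            # Jump to the end of the run and resume the forward scan.
            i = end
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


print(deliver_l2r_paragraph("foobar ABCDE"))
```

With the input "foobar ABCDE", this model delivers "foobar EDCBA".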

For R2L paragraphs, runs of R2L characters are delivered in their
buffer's logical order, without reversing them.  L2R characters in
such paragraphs _are_ reversed, by the same process of scanning
forward past them, then delivering them back to front.  This produces
a mirror image of the line as it should be displayed, wherein the
character to be displayed the rightmost is the first glyph we produce.
To mirror the line into its correct order, the PRODUCE_GLYPHS macro,
which calls the produce_glyphs method of the terminal-specific
redisplay interface, _prepends_ each new glyph to those already
produced for the glyph row, rather than appending them as in the L2R
paragraph case.  To illustrate, if we have a buffer with the following
contents (capital letters represent R2L characters):

  ABCD foo

then the bidi iterator will produce the characters in this order:

  ABCD oof

and then PRODUCE_GLYPHS will mirror them into

  foo DCBA

which is the correct visual order.
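
The prepending step can be modelled in a few lines of Python.  This is
only a sketch: `build_glyph_row` is a hypothetical name, and a glyph
row is represented as a plain list of characters rather than the real
glyph structures.

```python
def build_glyph_row(delivered, paragraph_is_r2l):
    """Collect delivered glyphs into a row, prepending in R2L paragraphs."""
    row = []
    for glyph in delivered:
        if paragraph_is_r2l:
            row.insert(0, glyph)  # newest glyph goes to the front of the row
        else:
            row.append(glyph)     # newest glyph goes to the end of the row
    return "".join(row)


# "ABCD oof" is the delivery order from the example in this message.
print(build_glyph_row("ABCD oof", paragraph_is_r2l=True))
```

Feeding it the delivery order "ABCD oof" with prepending enabled gives
"foo DCBA", the row from the example.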

Note that in both cases, the glyph row generated by the above
procedure is drawn from left to right by the terminal-specific method
that delivers glyphs to the glass.  That method draws glyphs one by
one in the order they are stored in the glyph row.  No reordering
happens on this level, and in fact this level is totally ignorant
about the text directionality.

Enter character compositions.

During the buffer scan that delivers characters to PRODUCE_GLYPHS, if
the next character to be delivered is a composed character, then
composition_reseat_it and next_element_from_composition are called.
If they succeed in composing the character with one or more following
characters, the whole sequence of characters that were composed is
recorded in the glyph row as a single element of type IT_COMPOSITION.
This single element is expanded into the actual font glyphs when the
glyph row is drawn by the terminal-specific draw_glyphs method.  The
bidi reordering treats this single element as if it were a single
glyph, and thus does not reorder its font glyphs.  So this single
element winds up in the glyph row in the position corresponding to the
first character of the composed sequence.

The question is: in what order should the font glyphs be held in the
LGSTRING returned by the font driver's `shape' method?  Let's take an
example.  Suppose we have a L2R paragraph in a buffer with this text:

 foobar ABCDE

and suppose that "ABCDE" will be shaped by the font driver's `shape'
method into a logical-order sequence of glyphs "XYZ".  Since this is a
L2R paragraph, and since no reordering will happen to "XYZ" when it is
delivered to the glass, it must be stored in the LGSTRING in the
visual order, i.e. "ZYX", with X being the first character to be read
and the rightmost to display, Y the second, etc.

Now suppose we have a R2L paragraph:

 ABCDE foobar

The mirroring of the glyph row in PRODUCE_GLYPHS will now produce

 foobar XYZ

because it treats "XYZ" as a single element.  Again, no reordering
will happen to "XYZ" when it is drawn on the terminal.  So again, we
need "XYZ" to be stored in visual order, i.e. "ZYX".
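
To make the two cases concrete, here is a toy Python model in which a
composition is one opaque row element that is expanded only at drawing
time.  `Composition`, `build_row`, `draw`, and `display` are
hypothetical names for this sketch, and the delivery orders are written
out by hand from the description in this message.

```python
from dataclasses import dataclass


@dataclass
class Composition:
    """One glyph-row element standing for a whole composed sequence."""
    glyphs: str  # font glyphs in the order the LGSTRING is assumed to store them


def build_row(delivered, paragraph_is_r2l):
    # Prepending every element in turn is equivalent to reversing the list.
    return list(reversed(delivered)) if paragraph_is_r2l else list(delivered)


def draw(row):
    # Drawn strictly left to right; a composition is expanded in stored order.
    return "".join(e.glyphs if isinstance(e, Composition) else e for e in row)


def display(stored_glyphs):
    comp = Composition(stored_glyphs)
    # L2R paragraph "foobar ABCDE": the composition is delivered last.
    l2r = draw(build_row(list("foobar ") + [comp], paragraph_is_r2l=False))
    # R2L paragraph "ABCDE foobar": composition first, then "foobar" reversed.
    r2l = draw(build_row([comp, " "] + list("raboof"), paragraph_is_r2l=True))
    return l2r, r2l


print(display("XYZ"))  # glyphs stored in logical order
print(display("ZYX"))  # glyphs stored in visual order
```

In this model, both paragraph types show the composition's glyphs
exactly as stored, so only the "ZYX" storage order puts X rightmost.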

You say that the contents of LGSTRING passed to the `shape' method are
in logical order.  The conclusion from the above seems to be that we
need to have the `shape' method reorder the shaped glyphs into visual
order.  Is that what happens with the libotf driver?  Does it indeed
reorder the R2L glyphs it returns after reshaping?  If not, how does a
reshaped sequence of glyphs wind up correctly on display?

Even if everything I said above is correct, there are complications.
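ABCDE could be inside an embedding with left to right override, like
this (with the LRO and PDF control characters written out as <LRO>
and <PDF>):

 foobar <LRO>ABCDE<PDF>
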
This should be displayed as

 foobar ABCDE

i.e., "ABCDE" is not reordered, but displayed in the logical order, as
forced by LRO.  Therefore, the reshaped "XYZ" should also be displayed
left to right:

 foobar XYZ

But, if I understand correctly how composition works, the
auto-composed sequence in this case will still be just "XYZ", without
the LRO and PDF control characters.  So the `shape' method of the font
driver will still see just "XYZ" in the LGSTRING, without the control
characters, and will reorder "XYZ", which is incorrect.

If we need the `shape' method to reorder glyphs, then in order for it
to do its job correctly, we need to give it the entire bidi context of
the string we are asking it to reshape.  In the above example, we need
to tell it about the override directive, i.e. pass it "ABCDE" with
surrounding LRO and PDF controls.  This flies in the face of the
current design, which separates reordering from glyph shaping.

So the conclusion is that we need the `shape' method to return the
reshaped glyphs in the logical order, and then reorder them
afterwards.  If this is correct, we need to make 2 changes:

  . change the interface to the `shape' method, so that the reshaped
    LGSTRING holds glyphs in the logical order

  . modify fill_gstring_glyph_string to reorder glyphs when it puts
    them into a glyph_string structure
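
A minimal Python sketch of that split, under stated assumptions:
`shape_logical` is a hypothetical stand-in for a font driver's `shape'
method, `fill_glyph_string` stands in for the reordering step, and the
`display_r2l` flag is assumed to be supplied by the bidi reordering
code, which already knows about any override.

```python
def shape_logical(chars):
    # Hypothetical shaping result: the characters "ABCDE" become the
    # logical-order glyphs "XYZ", as in the example in this message.
    table = {"ABCDE": "XYZ"}
    return table[chars]


def fill_glyph_string(glyphs, display_r2l):
    # Reorder only here, after shaping, using the resolved direction.
    return glyphs[::-1] if display_r2l else glyphs


# Plain R2L run: the glyphs are reversed for display.
print(fill_glyph_string(shape_logical("ABCDE"), display_r2l=True))
# Same run under a left-to-right override: the glyphs stay in logical order.
print(fill_glyph_string(shape_logical("ABCDE"), display_r2l=False))
```

In this sketch the shaping step never needs to see the control
characters; only the later reordering step depends on direction.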

Am I missing something?
