bug-gnu-emacs



From: Jason Rumney
Subject: bug#11860: 24.1; Arabic - Harakat (diacritics, short vowels) don't appear
Date: Sun, 19 Aug 2012 11:02:52 +0800
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/24.1.50 (gnu/linux)

Kenichi Handa <handa@gnu.org> writes:

> In article <83txw0aczg.fsf@gnu.org>, Eli Zaretskii <eliz@gnu.org> writes:
>
>> > From: Kenichi Handa <handa@gnu.org>
>> > Cc: eliz@gnu.org, 11860@debbugs.gnu.org, smias@yandex.ru
>> > Date: Sat, 18 Aug 2012 11:45:27 +0900
>> > 
>> > So, apparently Emacs on Windows and GNU/Linux uses the
>> > different metrics of glyphs.

Right, but if we add the offsets to the corresponding metrics, we get
the same result in both the Windows and GNU/Linux cases, except for the
total height of the font.  I think that difference is because Windows
counts inter-line spacing in the height, while on GNU/Linux it is
separate.

So I'm not sure that this is causing us problems (see Eli's report about
Hebrew).  It looks like just a case of a different reference point being
used on Windows and on GNU/Linux.

> For Hebrew too, on Windows, I see the same problem as what
> Steffan <smias@yandex.ru> reported:

If you are seeing something different than Eli for Hebrew with the same
font, then I suspect the cause is linked with the version of Uniscribe
that is installed. Maybe diacritic handling for Hebrew and Arabic is a
more recent addition to Uniscribe than the basic support for those
languages.

>> > For instance, in the above case, we may have to render glyphs in
>> > this order (diacritical mark first):
>> > 
>> >   [0 1 1593 760 0 3 6 12 4 [1 -2 0]]
>> >   [0 1 1593 969 8 1 8 12 4 nil]

I'm curious as to how we ended up with the same C (character code)
entry, 1593, in both of those vectors.  Could this be causing us
problems later on?  The glyph index is correct (comparing to the
GNU/Linux version), but I wonder if Uniscribe is referring back to the
character at some point and tripping up because it has been changed.
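
To make the field layout concrete, here is a minimal sketch in C.  It
assumes the glyph vector layout given in the docstring of
composition-get-gstring, [FROM-IDX TO-IDX C CODE WIDTH LBEARING RBEARING
ASCENT DESCENT ADJUSTMENT]; the struct and function names are
hypothetical and are not taken from the Emacs sources.  The two sample
entries are the vectors quoted in this thread.

```c
#include <stdio.h>

/* Hypothetical mirror of one glyph vector; field order assumed to be
   [FROM-IDX TO-IDX C CODE WIDTH LBEARING RBEARING ASCENT DESCENT ADJUSTMENT].  */
struct glyph_adjust { int xoff, yoff, wadjust; };

struct lglyph
{
  int from, to;        /* character index range covered by the glyph */
  int c;               /* character code */
  int code;            /* glyph index in the font */
  int width, lbearing, rbearing, ascent, descent;
  int has_adjust;      /* 0 when the ADJUSTMENT slot is nil */
  struct glyph_adjust adjust;
};

/* The two vectors quoted in this thread.  */
const struct lglyph sample_glyphs[2] = {
  { 0, 1, 1593, 969, 8, 1, 8, 12, 4, 0, { 0, 0, 0 } },
  { 0, 1, 1593, 760, 0, 3, 6, 12, 4, 1, { 1, -2, 0 } },
};

/* Return 1 if every glyph records the same character code C.  */
int
all_same_char (const struct lglyph *g, int n)
{
  for (int i = 1; i < n; i++)
    if (g[i].c != g[0].c)
      return 0;
  return 1;
}

/* Print the fields that matter for this discussion.  */
void
print_glyph (const struct lglyph *g)
{
  printf ("chars %d..%d  C=%d (U+%04X)  CODE=%d\n",
          g->from, g->to, g->c, (unsigned) g->c, g->code);
}
```

Calling all_same_char (sample_glyphs, 2) returns 1 even though the two
CODE values (969 and 760) differ: both glyphs carry C = 1593 (U+0639,
ARABIC LETTER AIN), so the character code of the diacritic is not
recorded in either vector.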

> I've just read the function uniscribe_shape in
> w32uniscribe.c.  It seems that these are the key API for
> uniscribe:
>
> * ScriptItemize -- no idea what is this

This should be a no-op in Emacs, as we already split the string into
LGSTRING components.  But if it is not called, subsequent Uniscribe
operations fail, so it must also be doing some initialization of
internal structures.

> * ScriptShape -- perhaps for glyph substitution (GSUB features of opentype)
> * ScriptPlace -- perhaps for glyph positioning (GPOS features of opentype)

Yes, I think that is correct.

> So at first please check the documentation of ScriptShape
> and figure out how it works for bidi script; i.e. what order
> does it expect for input, and what order does it produce.
>
> Next please find the meaning of this code fragment:
>
>                 /* Detect clusters, for linking codes back to
>                    characters.  */
>                 if (attributes[j].fClusterStart)
>                   {
>                     while (from < nchars_in_run && clusters[from] < j)
>                       from++;
>                     if (from >= nchars_in_run)
>                       from = to = nchars_in_run - 1;
>                     else
>                       {
>                         int k;
>                         to = nchars_in_run - 1;
>                         for (k = from + 1; k < nchars_in_run; k++)
>                           {
>                             if (clusters[k] > j)
>                               {
>                                 to = k - 1;
>                                 break;
>                               }
>                           }
>                       }
>                   }
>
> The comment refers to "clusters".  I don't know what it
> exactly means in uniscribe, but I guess it relates to
> grapheme cluster, and if so, this part seems to relate to
> the ordering of glyphs in this kind of grapheme cluster:
>
>   [0 1 1593 969 8 1 8 12 4 nil]
>   [0 1 1593 760 0 3 6 12 4 [1 -2 0]]

That seems to be correct.  Maybe this is the code that is changing the
character code to 1593.  I seem to recall that something like this was
required for Indic languages to let Emacs know which characters had been
linked back into one glyph.
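
To see what the quoted fragment computes, here is a self-contained
sketch in C that wraps the same loop in a function.  This is an
illustration only: the struct types, the function name and the initial
values of from and to are assumptions, and the sample input assumes a
cluster array in left-to-right order (each character maps to the index
of the first glyph of its cluster).  It does not try to model how
Uniscribe orders that array for a right-to-left run.

```c
#include <stddef.h>

/* Hypothetical stand-in for the fClusterStart bit of the per-glyph
   visual attributes.  */
struct vis_attr { unsigned cluster_start : 1; };

/* Character range (inclusive) that a glyph is linked back to.  */
struct char_range { int from, to; };

/* For each glyph J, record the range of characters it belongs to.
   CLUSTERS[i] is the index of the first glyph of the cluster that
   contains character i.  The loop body follows the quoted fragment.  */
void
map_glyphs_to_chars (const unsigned short *clusters, int nchars_in_run,
                     const struct vis_attr *attributes, int nglyphs,
                     struct char_range *out)
{
  int from = 0, to = 0;   /* assumed starting values */

  for (int j = 0; j < nglyphs; j++)
    {
      if (attributes[j].cluster_start)
        {
          /* Skip characters whose cluster starts before glyph J.  */
          while (from < nchars_in_run && clusters[from] < j)
            from++;
          if (from >= nchars_in_run)
            from = to = nchars_in_run - 1;
          else
            {
              /* The range ends just before the first character that
                 belongs to a later cluster.  */
              to = nchars_in_run - 1;
              for (int k = from + 1; k < nchars_in_run; k++)
                if (clusters[k] > j)
                  {
                    to = k - 1;
                    break;
                  }
            }
        }
      /* Glyphs that do not start a cluster inherit the current range.  */
      out[j].from = from;
      out[j].to = to;
    }
}
```

With two characters in one cluster (clusters = {0, 0}) and two glyphs of
which only the first starts a cluster, both glyphs get the range 0..1.
That matches the FROM and TO slots (0 and 1) of the two vectors quoted
above.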




