bug#11860: 24.1; Arabic - Harakat (diacritics, short vowels) don't appear

bug-gnu-emacs

From:	Eli Zaretskii
Subject:	bug#11860: 24.1; Arabic - Harakat (diacritics, short vowels) don't appear
Date:	Sun, 19 Aug 2012 20:56:57 +0300

> From: Jason Rumney <jasonr@gnu.org>
> Cc: Eli Zaretskii <eliz@gnu.org>,  11860@debbugs.gnu.org,  smias@yandex.ru
> Date: Sun, 19 Aug 2012 11:02:52 +0800
> 
> Kenichi Handa <handa@gnu.org> writes:
> 
> > In article <83txw0aczg.fsf@gnu.org>, Eli Zaretskii <eliz@gnu.org> writes:
> >
> >> > From: Kenichi Handa <handa@gnu.org>
> >> > Cc: eliz@gnu.org, 11860@debbugs.gnu.org, smias@yandex.ru
> >> > Date: Sat, 18 Aug 2012 11:45:27 +0900
> >> > 
> >> > So, apparently Emacs on Windows and GNU/Linux uses the
> >> > different metrics of glyphs.
> 
> Right, but adding the offsets to the corresponding metrics, we get the
> same result with both the Windows and GNU/Linux cases

I think the results of addition are not relevant to the problem.  The
problem is that the diacriticals and/or vowels are not drawn at
correct horizontal positions.  The values of the offsets are directly
relevant to that, because they describe how many pixels to advance
after drawing each glyph.  By contrast, the sum of the offsets will
always be approximately the same, since the entire grapheme cluster
occupies a single character cell.

> So I'm not sure that this is causing us problems (see Eli's report about
> Hebrew), it's just a case of a different reference point being used
> between Windows and GNU/Linux.

My report about Hebrew is not relevant either; see below.

> If you are seeing something different than Eli for Hebrew with the same
> font, then I suspect the cause is linked with the version of Uniscribe
> that is installed. Maybe diacritic handling for Hebrew and Arabic is a
> more recent addition to Uniscribe than the basic support for those
> languages.

That appears to be the case, indeed.  My initial attempts to reproduce
this were on XP SP3, where Hebrew rendering appeared to be OK.  I now
tried on Windows 7 and there I see the problem with Hebrew as well.

Moreover, when I type the Hebrew characters specified by the OP, I
don't see that the uniscribe_shape function is called at all on XP: a
breakpoint inside it never breaks.  On Windows 7, that function does
get called.

Jason, how can I find out whether Uniscribe is used for rendering
Hebrew, or why Emacs doesn't call uniscribe_shape?  (I know about
uniscribe_font->cache, but I don't see that function called even if I
start Emacs with a breakpoint in it, so it seems the cache is not the
issue here.  The cache is per application, right?)

For Arabic characters in the recipe, uniscribe_shape _is_ called on
XP.  I guess that's why the problem with Arabic is visible on both XP
and Windows 7.

For the record, here's the output of "C-u C-x =" on XP for the Hebrew
character composition mentioned earlier:

               position: 193 of 194 (99%), column: 1
              character: ג‎ (displayed as ג‎) (codepoint 1490, #o2722, #x5d2)
      preferred charset: iso-8859-8 (ISO/IEC 8859/8)
  code point in charset: 0xE2
                 syntax: w      which means: word
               category: .:Base, R:Right-to-left (strong)
               to input: type "d" with hebrew-full
            buffer code: #xD7 #x92
              file code: #xE2 (encoded by coding system hebrew-iso-8bit-dos)
                display: composed to form "גֻ" (see below)

  Composed with the following character(s) "ֻ" using this font:
    uniscribe:-outline-Courier New-normal-normal-normal-mono-13-*-*-*-c-*-iso8859-8
  by these glyphs:
    [0 1 1490 674 8 0 6 12 4 nil]
    [0 1 1467 663 8 0 7 12 4 [-8 0 0]]

Compare with the output on Windows 7 to see the differences:

               position: 193 of 194 (99%), column: 1
              character: ג‎ (displayed as ג‎) (codepoint 1490, #o2722, #x5d2)
      preferred charset: unicode (Unicode (ISO10646))
  code point in charset: 0x05D2
                 syntax: w      which means: word
               category: .:Base, R:Right-to-left (strong)
               to input: type "C-x 8 RET HEX-CODEPOINT" or "C-x 8 RET NAME"
            buffer code: #xD7 #x92
              file code: not encodable by coding system iso-latin-1-dos
                display: composed to form "גֻ" (see below)

  Composed with the following character(s) "ֻ" using this font:
    uniscribe:-outline-Courier New-normal-normal-normal-mono-13-*-*-*-c-*-iso10646-1
  by these glyphs:
    [0 1 1490 674 8 1 6 12 4 nil]
    [0 1 1490 663 0 2 6 12 4 nil]
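
To make the difference between the two Hebrew outputs concrete, here is
a minimal sketch of how horizontal positions could be derived from such
glyph vectors.  It assumes a simplified model: a glyph without an
adjustment is drawn at the current pen position and advances the pen by
its WIDTH, while a glyph with an adjustment [X-OFF Y-OFF WADJUST] is
drawn at pen + X-OFF and advances the pen by WADJUST.  The struct and
the function name are hypothetical, and right-to-left reordering is
ignored; this is not the actual Emacs drawing code.

```c
#include <stddef.h>

/* Simplified stand-in for one element of the glyph vectors shown by
   "C-u C-x =": only the fields needed for horizontal placement.  */
struct glyph_metrics
{
  int width;          /* WIDTH field of the glyph vector */
  int has_adjustment; /* 0 when the last field of the vector is nil */
  int xoff;           /* X-OFF of [X-OFF Y-OFF WADJUST] */
  int wadjust;        /* WADJUST of [X-OFF Y-OFF WADJUST] */
};

/* Store in x_out[i] the x position of glyph i relative to the start of
   the cluster, and return the total advance of the cluster.
   Assumed model, see the text above.  */
int
place_cluster (const struct glyph_metrics *g, size_t n, int *x_out)
{
  int pen = 0;

  for (size_t i = 0; i < n; i++)
    {
      if (g[i].has_adjustment)
        {
          x_out[i] = pen + g[i].xoff;
          pen += g[i].wadjust;
        }
      else
        {
          x_out[i] = pen;
          pen += g[i].width;
        }
    }
  return pen;
}
```

Under this model, the XP vectors (widths 8 and 8, second glyph adjusted
by [-8 0 0]) and the Windows 7 vectors (widths 8 and 0, no adjustments)
both give a total advance of 8 pixels, but the second glyph lands at
x = 0 in the first case and at x = 8 in the second.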

And here's the output of "C-u C-x =" for the Arabic character Ayin
with sukun on XP:

               position: 197 of 198 (99%), column: 0
              character: ع‎ (displayed as ع‎) (codepoint 1593, #o3071, #x639)
      preferred charset: unicode (Unicode (ISO10646))
  code point in charset: 0x0639
                 syntax: w      which means: word
               category: .:Base, R:Right-to-left (strong), b:Arabic
            buffer code: #xD8 #xB9
              file code: not encodable by coding system hebrew-iso-8bit-dos
                display: composed to form "عْ" (see below)

  Composed with the following character(s) "ْ" using this font:
    uniscribe:-outline-Courier New-normal-normal-normal-mono-13-*-*-*-c-*-iso10646-1
  by these glyphs:
    [0 1 1593 969 8 2 8 12 4 nil]
    [0 1 1593 1028 0 3 6 12 4 nil]

Note that the glyph index of the sukun is different from the Windows
7 output.  I have no idea why.

> >> >   [0 1 1593 760 0 3 6 12 4 [1 -2 0]]
> >> >   [0 1 1593 969 8 1 8 12 4 nil]
> 
> I'm curious as to how we ended up with the same C entry in those
> vectors.

That's because the code in uniscribe_shape does this:

                  LGLYPH_SET_CHAR (lglyph, chars[items[i].iCharPos
                                                 + from]);

and it does that for all the 'nglyphs' glyphs produced by ScriptPlace.

As Handa-san writes, the character code is never used, because we have
the font glyph index and its metrics, so I think this is a non-issue.
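
As an illustration of why both vectors end up with the same C entry,
here is a minimal sketch of that assignment for a single cluster.  The
struct and the function name are hypothetical stand-ins (the real
LGLYPH is a Lisp vector), and the sketch assumes that 'from' stays
fixed for all the glyphs, as it does within one cluster.

```c
#include <stddef.h>

/* Simplified stand-in for an LGLYPH: only the character and the font
   glyph index.  */
struct simple_glyph
{
  int c;     /* character code recorded for the glyph */
  int code;  /* glyph index in the font */
};

/* Record chars[char_pos + from] as the character of every one of the
   nglyphs glyphs, mirroring the LGLYPH_SET_CHAR call for one
   cluster.  The glyph indices are left untouched.  */
void
set_glyph_chars (struct simple_glyph *glyphs, size_t nglyphs,
                 const int *chars, int char_pos, int from)
{
  for (size_t j = 0; j < nglyphs; j++)
    glyphs[j].c = chars[char_pos + from];
}
```

With the characters 1593 (Ayin) and 1618 (sukun) and from = 0, both
glyphs get the character 1593, while their glyph indices stay
distinct.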

> Could this be causing us problems later on?  The glyph index
> is correct (comparing to the GNU/Linux version), but I wonder if
> Uniscribe is referring back to the character at some point and tripping
> up because it has been changed.

Uniscribe cannot refer to this code, because Uniscribe doesn't use
LGSTRING, IIUC.  Or does it?  (If it does, please show where in the
code it uses that value.)

> >               /* Detect clusters, for linking codes back to
> >                  characters.  */
> >               if (attributes[j].fClusterStart)
> >                 {
> >                   while (from < nchars_in_run && clusters[from] < j)
> >                     from++;
> >                   if (from >= nchars_in_run)
> >                     from = to = nchars_in_run - 1;
> >                   else
> >                     {
> >                       int k;
> >                       to = nchars_in_run - 1;
> >                       for (k = from + 1; k < nchars_in_run; k++)
> >                         {
> >                           if (clusters[k] > j)
> >                             {
> >                               to = k - 1;
> >                               break;
> >                             }
> >                         }
> >                     }
> >                 }
> >
> > The comment refers to "clusters".  I don't know what it
> > exactly means in uniscribe, but I guess it relates to
> > grapheme clusters, and if so, this part seems to relate to
> > the ordering of glyphs in this kind of grapheme cluster:
> >
> >   [0 1 1593 969 8 1 8 12 4 nil]
> >   [0 1 1593 760 0 3 6 12 4 [1 -2 0]]
> 
> That seems to be correct.  Maybe this is the code that is changing the
> character code to 1593.

It doesn't _change_ the character code, it simply sets it to the code
of the base character.  But again, I don't think this is relevant.
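
For reference, the cluster-detection loop quoted above can be restated
as a standalone function.  This is only a sketch: the struct and the
function name are hypothetical, and it assumes that clusters[k] holds
the index of the first glyph of the cluster containing character k and
that these values never decrease along the run.

```c
/* Range of characters [from, to] that belong to one cluster.  */
struct char_range
{
  int from;
  int to;
};

/* Given the glyph index J that starts a cluster and the previous value
   of FROM, return the range of characters covered by that cluster.
   The logic follows the quoted loop step by step.  */
struct char_range
cluster_char_range (const int *clusters, int nchars_in_run,
                    int from, int j)
{
  struct char_range r;

  /* Skip the characters that belong to earlier clusters.  */
  while (from < nchars_in_run && clusters[from] < j)
    from++;

  if (from >= nchars_in_run)
    r.from = r.to = nchars_in_run - 1;
  else
    {
      r.from = from;
      r.to = nchars_in_run - 1;
      /* The cluster ends just before the first character that maps
         to a later glyph.  */
      for (int k = from + 1; k < nchars_in_run; k++)
        if (clusters[k] > j)
          {
            r.to = k - 1;
            break;
          }
    }
  return r;
}
```

For a run of three characters with clusters = {0, 0, 2}, the glyph at
index 0 covers characters 0 through 1, and the glyph at index 2 covers
character 2 alone.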
