[Top][All Lists]

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

bug#33729: 27.0.50; Partial glyphs not rendered for Gujarati with Harfbu

From: Khaled Hosny
Subject: bug#33729: 27.0.50; Partial glyphs not rendered for Gujarati with Harfbuzz enabled (renders fine using m17n)
Date: Sat, 5 Jan 2019 23:15:14 +0200
User-agent: Mutt/1.11.1 (2018-12-01)

On Mon, Dec 24, 2018 at 08:07:04PM +0200, Eli Zaretskii wrote:
> > Date: Mon, 24 Dec 2018 19:37:23 +0200
> > From: Khaled Hosny <address@hidden>
> > Cc: address@hidden, address@hidden, address@hidden,
> >     address@hidden, address@hidden
> > 
> > > Per previous discussions, we decided to use the Harfbuzz built-in
> > > methods for determining the script, since Emacs doesn't have this
> > > information, and adding it will just do the same as Harfbuzz does,
> > > i.e. find the first character whose script is not Common etc., using
> > > the UCD database.  I think it was you who suggested to use the
> > > Harfbuzz built-ins in this case.
> > 
> > The built-in HarfBuzz code is for getting the script for a given
> > character, but resolving characters with Common script is left to the
> > client. Suppose you have this string (upper case for RTL) ABC 123 DEF,
> > what HarfBuzz sees during shaping is three separate chunks of text ABC,
> > 123, DEF. The 123 part is all Common script characters and thus
> > hb_buffer_guess_segment_properties won’t be able to guess anything (and
> > based on the font and the script, this can cause rendering differences).
> > Emacs will have to resolve the script of Common characters before
> > applying bidi algorithm and pass that down to HarfBuzz.
> I'm not sure I understand: why does HarfBuzz care that 123 was in the
> middle if RTL text.

It doesn’t. What it cares about here is the correct script. Because 123
are in the middle of RTL text they will be shaped separately, and thus
hb_buffer_guess_segment_properties() will only see 123 and won’t to be
able to guess the correct script for them (Arabic, Hebrew, etc.,
whatever the script for the surrounding RTL text is).

The point I’m trying to make is that script detection, even in its
simplest form, needs to be done on the text as a whole not just the
portion being shaped, which makes hb_buffer_guess_segment_properties()
ill equipped for doing this as it only sees a small portion of the text
at a time.

> Does it need to shape 123 specially in this case?

Depending on the font, the digits might be shaped differently if the
script is, say Arabic, by e.g. applying script-specific substitutions to
forms more suitable for a given script.
> (In general, AFAIK simple characters like 123 will not even go through
> HarfBuzz, as Emacs doesn't call the shaper for characters whose entry
> in composition-function-table is nil.  So I guess 123 here should
> stand for some other characters, not for literal digits?  IOW, I don't
> think I understand the example very well.)

This is a bug then and needs to be fixed. All text should go through
HarfBuzz since even so-called “simple” character often require shaping
depending on the text and the font. If this is done for optimization,
then it should be revised to see if shaping with HarfBuzz is actually
significantly slower and if it is, find more proper ways to optimize it.


reply via email to

[Prev in Thread] Current Thread [Next in Thread]