bug#33729: 27.0.50; Partial glyphs not rendered for Gujarati with Harfbuzz enabled (renders fine using m17n)
Sun, 06 Jan 2019 18:03:55 +0200
> Date: Sat, 5 Jan 2019 23:15:14 +0200
> From: Khaled Hosny <address@hidden>
> Cc: address@hidden, address@hidden, address@hidden,
> address@hidden, address@hidden
> > > The built-in HarfBuzz code is for getting the script for a given
> > > character, but resolving characters with Common script is left to the
> > > client. Suppose you have this string (upper case for RTL) ABC 123 DEF,
> > > what HarfBuzz sees during shaping is three separate chunks of text ABC,
> > > 123, DEF. The 123 part is all Common script characters and thus
> > > hb_buffer_guess_segment_properties won’t be able to guess anything (and
> > > based on the font and the script, this can cause rendering differences).
> > > Emacs will have to resolve the script of Common characters before
> > > applying bidi algorithm and pass that down to HarfBuzz.
> > I'm not sure I understand: why does HarfBuzz care that 123 was in the
> > middle of RTL text?
> It doesn’t. What it cares about here is the correct script. Because 123
> are in the middle of RTL text they will be shaped separately, and thus
> hb_buffer_guess_segment_properties() will only see 123 and won’t be
> able to guess the correct script for them (Arabic, Hebrew, etc.,
> whatever the script for the surrounding RTL text is).
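To make the point concrete, here is a minimal Python sketch of resolving Common-script characters before they reach the shaper. It is an illustration under stated assumptions, not Emacs or HarfBuzz code: `char_script` is a hypothetical classifier that uses a few hard-coded Unicode block ranges in place of the real Unicode Script property, and the rule it applies (a Common character inherits the script of the nearest preceding non-Common character, or of the first following one at the start of the text) is just one simple heuristic. Once scripts are resolved this way, the "123" chunk that bidi reordering splits off would be tagged Hebrew rather than Common, so the client could set the script explicitly (e.g. with hb_buffer_set_script) instead of relying on hb_buffer_guess_segment_properties.

```python
def char_script(ch: str) -> str:
    """Hypothetical, simplified script classifier (assumption): real code
    would use the Unicode Script property (UAX #24), not these ranges."""
    cp = ord(ch)
    if 0x0590 <= cp <= 0x05FF:
        return "Hebrew"
    if 0x0600 <= cp <= 0x06FF:
        return "Arabic"
    if 0x0A80 <= cp <= 0x0AFF:
        return "Gujarati"
    if ch.isascii() and ch.isalpha():
        return "Latin"
    return "Common"  # digits, spaces, punctuation and everything else here


def resolve_scripts(text: str) -> list[str]:
    """Give each Common character the script of the nearest preceding
    non-Common character; a leading run of Common characters takes the
    first non-Common script that follows it."""
    scripts = [char_script(ch) for ch in text]
    last = None
    for i, s in enumerate(scripts):
        if s == "Common":
            if last is not None:
                scripts[i] = last
        else:
            last = s
    # Backfill a leading run of Common characters, if any strong script exists.
    first = next((s for s in scripts if s != "Common"), None)
    if first is not None:
        for i, s in enumerate(scripts):
            if s != "Common":
                break
            scripts[i] = first
    return scripts


def script_runs(text: str) -> list[tuple[str, str]]:
    """Group consecutive characters with the same resolved script."""
    runs: list[tuple[str, str]] = []
    for ch, s in zip(text, resolve_scripts(text)):
        if runs and runs[-1][1] == s:
            runs[-1] = (runs[-1][0] + ch, s)
        else:
            runs.append((ch, s))
    return runs


# Hebrew letters, then " 123 ", then more Hebrew letters.
print(script_runs("\u05d0\u05d1\u05d2 123 \u05d3\u05d4\u05d5"))  # one run, tagged 'Hebrew'
print(script_runs("123"))  # → [('123', 'Common')]
```

The second call shows the situation described above: a chunk consisting only of digits, seen on its own, carries no script information at all.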
That's what I was asking: why is it important for HarfBuzz to know that
123 should be shaped for the Arabic script?
> Depending on the font, the digits might be shaped differently if the
> script is, say, Arabic, e.g. by applying script-specific substitutions to
> forms more suitable for a given script.
I guess this is what I'm missing, then: these script-specific
substitutions. Can you elaborate on that, or point to some place
where these substitutions are described in detail?
> > (In general, AFAIK simple characters like 123 will not even go through
> > HarfBuzz, as Emacs doesn't call the shaper for characters whose entry
> > in composition-function-table is nil. So I guess 123 here should
> > stand for some other characters, not for literal digits? IOW, I don't
> > think I understand the example very well.)
> This is a bug then and needs to be fixed. All text should go through
> HarfBuzz since even so-called “simple” characters often require shaping
> depending on the text and the font. If this is done for optimization,
> then it should be revised to see whether shaping with HarfBuzz is actually
> significantly slower, and if it is, find better ways to optimize it.
(Adding Handa-san to the discussion, in the hope that he could comment
on the issue.)
I think running all text through a shaper might be prohibitively
expensive, because the shaper is called through Lisp code (see
composite.el), and we decide which chunk of text to pass to the shaper
using regexp search. See the various files under lisp/language/ which
set up portions of composition-function-table as appropriate for each
language that needs it.
So I think we should identify all the cases where "simple" characters
surrounded by, or adjacent to, "non-simple" ones need to be passed to
a shaper, and add the necessary regular expressions to the data
structures in lisp/language/. Can you describe these cases, or point
me to a place where I can find the relevant info?
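As a rough illustration of the kind of regular expression proposed above, the Python sketch below (not Emacs Lisp, and not the actual composition-function-table format) picks out chunks of text to hand to a shaper. The character range and the name `shaper_chunks` are hypothetical: a run of Gujarati characters (U+0A80..U+0AFF) is extended over ASCII digits and spaces only when more Gujarati follows, so "simple" characters surrounded by Gujarati are shaped together with it, while other text is left out. It covers only the "surrounded by" case; characters that are merely adjacent to a Gujarati run would need an additional pattern.

```python
import re

# Hypothetical pattern: the Gujarati block stands in for any "non-simple" script.
GUJARATI = "\u0a80-\u0aff"
# One or more Gujarati characters, optionally continued across digits/spaces
# as long as another Gujarati run follows them.
CHUNK_RE = re.compile(f"[{GUJARATI}]+(?:[0-9 ]+[{GUJARATI}]+)*")


def shaper_chunks(text: str) -> list[tuple[int, int]]:
    """Return (start, end) spans of text that would be passed to the shaper."""
    return [(m.start(), m.end()) for m in CHUNK_RE.finditer(text)]


# "ab ", Gujarati, " 12 ", Gujarati, " cd": the digits between the two
# Gujarati runs end up in the same chunk.
print(shaper_chunks("ab \u0a95\u0aa4 12 \u0a95 cd"))  # → [(3, 10)]
```

The trade-off is the one raised in this message: the wider the pattern, the more "simple" text goes through the (slower) shaper, so each such extension should be justified by a real shaping need.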