bug#33729: 27.0.50; Partial glyphs not rendered for Gujarati with Harfbuzz enabled (renders fine using m17n)
Sun, 27 Jan 2019 19:12:04 +0200
Could you please respond to the below as well?
> Date: Sun, 06 Jan 2019 18:03:55 +0200
> From: Eli Zaretskii <address@hidden>
> Cc: address@hidden, address@hidden, address@hidden
> > Date: Sat, 5 Jan 2019 23:15:14 +0200
> > From: Khaled Hosny <address@hidden>
> > Cc: address@hidden, address@hidden, address@hidden,
> > address@hidden, address@hidden
> > > > The built-in HarfBuzz code is for getting the script for a given
> > > > character, but resolving characters with Common script is left to the
> > > > client. Suppose you have this string (upper case for RTL) ABC 123 DEF,
> > > > what HarfBuzz sees during shaping is three separate chunks of text ABC,
> > > > 123, DEF. The 123 part is all Common script characters and thus
> > > > hb_buffer_guess_segment_properties won’t be able to guess anything (and
> > > > based on the font and the script, this can cause rendering differences).
> > > > Emacs will have to resolve the script of Common characters before
> > > > applying bidi algorithm and pass that down to HarfBuzz.
> > >
> > > I'm not sure I understand: why does HarfBuzz care that 123 was in the
> > > middle of RTL text?
> > It doesn’t. What it cares about here is the correct script. Because 123
> > are in the middle of RTL text they will be shaped separately, and thus
> > hb_buffer_guess_segment_properties() will only see 123 and won’t be
> > able to guess the correct script for them (Arabic, Hebrew, etc.,
> > whatever the script for the surrounding RTL text is).
> That's what I was asking: why is it important for HarfBuzz to know that
> 123 should be shaped for the Arabic script?
> > Depending on the font, the digits might be shaped differently if the
> > script is, say, Arabic, e.g. by applying script-specific substitutions to
> > forms more suitable for a given script.
> I guess this is what I'm missing, then: these script-specific
> substitutions. Can you elaborate on that, or point to some place
> where these substitutions are described in detail?
> > > (In general, AFAIK simple characters like 123 will not even go through
> > > HarfBuzz, as Emacs doesn't call the shaper for characters whose entry
> > > in composition-function-table is nil. So I guess 123 here should
> > > stand for some other characters, not for literal digits? IOW, I don't
> > > think I understand the example very well.)
> > This is a bug then and needs to be fixed. All text should go through
> > HarfBuzz since even so-called “simple” characters often require shaping
> > depending on the text and the font. If this is done for optimization,
> > then it should be revised to see if shaping with HarfBuzz is actually
> > significantly slower and if it is, find more proper ways to optimize it.
> (Adding Handa-san to the discussion, in the hope that he could comment
> on the issue.)
> I think running all text through a shaper might be prohibitively
> expensive, because the shaper is called through Lisp code (see
> composite.el), and we decide which chunk of text to pass to the shaper
> using regexp search. See the various files under lisp/language/ which
> set up portions of composition-function-table as appropriate for each
> language that needs it.
> So I think we should identify all the cases where "simple" characters
> surrounded by, or adjacent to, "non-simple" ones need to be passed to
> a shaper, and add the necessary regular expressions to the data
> structures in lisp/language/. Can you describe these cases, or point
> me to a place where I can find the relevant info?
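
For reference, here is a minimal sketch of what "resolving the script of Common characters" could look like before the text is split into runs. It is not Emacs or HarfBuzz code. The script table is a toy that covers only a few Latin, Hebrew and Arabic letters, and the names script_of and resolve_common are made up for this illustration. The rule used (a Common character takes the script of the preceding resolved character, and leading Common characters take the first resolved script that follows) is one simple heuristic; a real client might choose differently. The resolved script is what would then be set on the buffer, e.g. with hb_buffer_set_script(), instead of leaving it to be guessed.

```python
def script_of(ch):
    """Toy script lookup (assumption: only the ranges this example needs)."""
    cp = ord(ch)
    if 0x0041 <= cp <= 0x005A or 0x0061 <= cp <= 0x007A:
        return "Latin"
    if 0x05D0 <= cp <= 0x05EA:
        return "Hebrew"
    if 0x0621 <= cp <= 0x064A:
        return "Arabic"
    return "Common"  # digits, spaces, punctuation, everything else


def resolve_common(text):
    """Return one script per character, with Common resolved from context."""
    scripts = [script_of(ch) for ch in text]

    # Forward pass: a Common character inherits the preceding resolved script.
    last = None
    for i, s in enumerate(scripts):
        if s == "Common":
            if last is not None:
                scripts[i] = last
        else:
            last = s

    # Leading Common characters take the first resolved script, if there is one.
    first = next((s for s in scripts if s != "Common"), None)
    if first is not None:
        for i, s in enumerate(scripts):
            if s != "Common":
                break
            scripts[i] = first

    return scripts


# Logical-order string: three Arabic letters, " 123 ", three Arabic letters.
sample = "\u0627\u0628\u062C 123 \u062F\u0647\u0648"
resolved = resolve_common(sample)
```

With this, the digits and spaces in the sample are tagged Arabic even though they would later be shaped as a separate run, while a string made up only of Common characters stays Common because there is no context to resolve it from.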