emacs-devel


From: Eli Zaretskii
Subject: Re: Ligatures (was: Unify the Platforms: Cairo+FreeType+Harfbuzz Everywhere (except TTY))
Date: Sat, 23 May 2020 17:08:20 +0300

> From: Pip Cet <address@hidden>
> Date: Sat, 23 May 2020 12:36:56 +0000
> Cc: address@hidden, address@hidden, address@hidden
> 
> > > You write: "(b) is not really feasible without redesigning the entire
> > > Emacs display engine". I don't see how that's true at all. All we need
> > > is some limited look-ahead.
> >
> > We already have look-ahead: that's what the regexp part of the
> > composition rules are about.  That is not the crucial problem.
> 
> But it's the only problem I see!

Then maybe I don't understand what you mean by look-ahead.  Is it the
decision of how to choose those 32 characters of "context"?  If so,
why not use the current regexp-based approach, which is already much
smarter than just blindly taking a fixed amount of surrounding text?

> When you see an IT_CHARACTER, you get some context, hand it to
> HarfBuzz, slice up the relevant glyphs, and display them.

The problem is, of course, in the "some context" part.  Your patch
used an arbitrary 32-character chunk of text around the character to
shape, which is of course not what the shaping engines want: they want
_all_ of the surrounding text, the entire paragraph.

Your patch also invokes the shaper twice on the same 32 characters,
once in the encode_char method and again in the text_extents method,
which is another waste.  The code in composite.c caches the composed
characters to avoid that, but you bypass it.
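
To make the caching point concrete, here is a minimal sketch of the
idea, assuming a one-entry cache keyed by the text chunk.  It is not
the composite.c code: shape_chunk, get_shaped, first_glyph,
total_width, the fake "fi" ligature, and the glyph numbers are all
made up for illustration, with first_glyph and total_width standing
in for the encode_char and text_extents methods.

```c
#include <assert.h>
#include <string.h>

#define CHUNK_MAX 32   /* the fixed context size under discussion */

/* Hypothetical result of shaping one chunk of text.  */
struct shaped_run
{
  char text[CHUNK_MAX + 1];  /* the text that was shaped (cache key) */
  int nglyphs;               /* number of glyphs produced */
  int glyph_ids[CHUNK_MAX];  /* made-up glyph codes */
  int advances[CHUNK_MAX];   /* made-up advance widths, in pixels */
  int valid;                 /* nonzero once the slot is filled */
};

static struct shaped_run cache;  /* one-entry cache, for brevity */
static int shaper_calls;         /* how many times the "shaper" ran */

/* Stand-in for an expensive shaper call: one glyph per byte, except
   that "fi" becomes a single, slightly wider, ligature glyph.  */
static void
shape_chunk (const char *text, struct shaped_run *out)
{
  size_t len = strlen (text);
  assert (len <= CHUNK_MAX);
  shaper_calls++;
  memcpy (out->text, text, len + 1);
  out->nglyphs = 0;
  for (size_t i = 0; i < len; i++)
    {
      int n = out->nglyphs++;
      if (text[i] == 'f' && text[i + 1] == 'i')
        {
          out->glyph_ids[n] = 1000;  /* pretend ligature glyph */
          out->advances[n] = 12;
          i++;                       /* the ligature used two chars */
        }
      else
        {
          out->glyph_ids[n] = (unsigned char) text[i];
          out->advances[n] = 10;
        }
    }
  out->valid = 1;
}

/* Return the shaped run for TEXT, calling the shaper only on a miss.  */
static const struct shaped_run *
get_shaped (const char *text)
{
  if (!(cache.valid && strcmp (cache.text, text) == 0))
    shape_chunk (text, &cache);
  return &cache;
}

/* Stand-in for encode_char: the first glyph of the chunk, or -1.  */
int
first_glyph (const char *text)
{
  const struct shaped_run *run = get_shaped (text);
  return run->nglyphs > 0 ? run->glyph_ids[0] : -1;
}

/* Stand-in for text_extents: the summed advance width of the chunk.  */
int
total_width (const char *text)
{
  const struct shaped_run *run = get_shaped (text);
  int width = 0;
  for (int i = 0; i < run->nglyphs; i++)
    width += run->advances[i];
  return width;
}
```

Calling first_glyph and then total_width on the same chunk runs the
stand-in shaper once, not twice.  A real cache would need more than
one entry and a key that also covers the font and the other shaping
parameters.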

The patch is okay as a proof of concept, but we cannot use it in
production.  There are too many arbitrary decisions and inefficient,
expensive operations.

> It doesn't involve composite.c at all, and that's good, because for
> those tricky special cases composite.c does a better job than standard
> shaping, and we need to keep that feature. It just shouldn't be the
> regular route.

Of course, you never say how to distinguish between the "tricky
special cases" for which we still need to use composite.c and friends,
and the other kind.

Moreover, the HarfBuzz guys clearly say that what we do now is wrong
for those "tricky" cases as well, so if we are going to fix that, why
fix it only for ligatures made out of ASCII characters?

> > The crucial problem is that we currently perform layout decisions one
> > grapheme cluster at a time, whereas what HarfBuzz people say is that
> > we should basically do that one screen line at a time.
> 
> I think we're going to have to compromise: that's why my patch used a
> 32-character context rather than an entire line or just a single
> character.

If we are going to compromise, then why not compromise on what we
already have, which is much less than 32 characters?  Why should we
enormously complicate and slow down our code without actually solving
the problem?  Did you ever see ligatures that are 32 characters long?

> Ideally, of course, in most real cases we'd use whitespace-delimited
> words as chunks. That's mere optimization, though.

That'd be the wrong optimization, AFAIK.  E.g., some scripts don't
have whitespace-separated words at all, and still need shaping.  And
what exactly is whitespace for this purpose?  E.g., does it include
Unicode control characters such as ZWJ?

> > A secondary (but important) problem is that character composition
> > involves calls to Lisp, which is relatively slow.  This precludes
> > calling the shaper for too many characters at once, too many times for
> > each redisplay cycle of a window.
> 
> I agree we shouldn't go through Lisp. My patch didn't.

Your patch hard-codes arbitrary numbers without any way to control
that from Lisp.  Such code will never fly in Emacs.

> Calling the shaper less often is an important optimization, too. For
> whitespace-delimited words, we only need to call it once.

This doesn't work when the produced sequence of glyphs doesn't fit on
the screen line.  What the current layout code does in this case won't
work well when you need to break a long sequence of glyphs in the
middle and then continue on the next line from where you left off on
this one.  The longer the sequence of glyphs you get from the shaper
in one go, the higher the probability of hitting this issue.
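
To show the kind of bookkeeping this requires, here is a minimal
sketch, assuming the shaper has already returned per-glyph advance
widths for one long run.  It is not what the current layout code
does; glyphs_that_fit, break_run, and the line_start array are
hypothetical names made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Return how many glyphs, starting at index START, fit into
   AVAILABLE pixels.  ADVANCES holds NGLYPHS advance widths.  */
size_t
glyphs_that_fit (const int *advances, size_t nglyphs, size_t start,
                 int available)
{
  assert (start <= nglyphs);
  size_t i = start;
  int used = 0;
  while (i < nglyphs && used + advances[i] <= available)
    used += advances[i++];
  return i - start;
}

/* Split one shaped run over screen lines of LINE_WIDTH pixels.
   Store the index of the first glyph of each line in LINE_START
   (at most MAX_LINES entries) and return the number of lines used.
   The caller has to keep the run and these indices around so that
   the next line can resume where the previous one stopped.  */
size_t
break_run (const int *advances, size_t nglyphs, int line_width,
           size_t *line_start, size_t max_lines)
{
  size_t pos = 0, nlines = 0;
  while (pos < nglyphs && nlines < max_lines)
    {
      size_t n = glyphs_that_fit (advances, nglyphs, pos, line_width);
      if (n == 0)
        n = 1;  /* a glyph wider than the line still goes somewhere */
      line_start[nlines++] = pos;
      pos += n;
    }
  return nlines;
}
```

Even this toy version has to carry state from one screen line to the
next, and it ignores whether the break index falls at a place where
the run may legitimately be split, as well as any reordering for
display.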

The bottom line is that I think you will find very quickly that the
basic assumption of the current design, namely that each call to the
shaper produces a single glyph or a very short sequence of glyphs,
will bite you at every step, because the code which deals with layout
implicitly relies on it.

In short, I really don't see how this could ever work, except in a
very limited set of simple use cases.  E.g., what do you do with
bidirectional text?  Ignore it?

> > I don't think there's any disagreements on this high and abstract
> > level.
> 
> I think there are: if we treat fonts as programs, we need to let them
> do their job, which involves kerning, substitutions, ligatures, and
> even crazy stuff like randomizing the glyph used for each character to
> get a more hand-written appearance. We don't need to know about
> ligatures, we just let the font do it. No Lisp callbacks, just a call
> to harfbuzz.

I think this is a simplistic view of how the display engine works, and
I don't see how it could work in production while supporting all the
use cases we already do.  I could be wrong, though, so I'm looking
forward to seeing you present a series of patches that support the
existing use cases and the ligatures as well, and don't cause any
slowdown in redisplay.


