From: Pip Cet
Subject: Re: Ligatures (was: Unify the Platforms: Cairo+FreeType+Harfbuzz Everywhere (except TTY))
Date: Tue, 26 May 2020 18:13:55 +0000

On Sun, May 24, 2020 at 3:33 PM Eli Zaretskii <address@hidden> wrote:
> > From: Pip Cet <address@hidden>
> > Date: Sat, 23 May 2020 22:38:18 +0000
> > Cc: address@hidden, address@hidden, address@hidden
> >
> > On Sat, May 23, 2020 at 4:34 PM Eli Zaretskii <address@hidden> wrote:
> > > > From: Pip Cet <address@hidden>
> > > > Date: Sat, 23 May 2020 15:13:38 +0000
> > > > Cc: address@hidden, address@hidden, address@hidden
> > > > Because what the current layout code does by default is to break
> > > > along any glyph boundary, and I don't see how that's broken in any
> > > > way.
> > >
> > > The code assumes that breaking on some glyph leaves the buffer
> > > iterator ('struct it') in a state that we can simply continue to the
> > > next buffer position.
> >
> > Yes. I see no reason to change that.
> >
> > > But if you already picked up several characters
> > > via look-ahead, that is not true, and you will have to return back
> > > several character positions, in order to continue on the next screen
> > > line.
> >
> > You're describing why look-ahead is difficult: a while ago, you
> > appeared to be saying it wasn't. This confuses me.
> >
> > Obviously, when I say "look-ahead", I mean receiving the next display
> > elements an iterator would produce if it were actually advanced,
> > without advancing it.
> That's not what you said earlier:

I think it is what I said.

> > > > > > > You write: "(b) is not really feasible without redesigning
> > > > > > > the entire Emacs display engine". I don't see how that's
> > > > > > > true at all. All we need is some limited look-ahead.
> > > > >
> > > > > We already have look-ahead: that's what the regexp part of the
> > > > > composition rules are about.  That is not the crucial problem.
> > > >
> > > > But it's the only problem I see!
> > >
> > > Then maybe I don't understand what you mean by look-ahead.  Is that
> > > the decision how to choose those 32 characters of "context"?
> >
> > Yes.
> Here you said that look-ahead means how to _choose_ the context.

The distinction escapes me: look-ahead is how to get the context for a
character, obviously without ruining any persistent state. I'm puzzled
as to what else it could have meant.

> > > If we want the shaper to handle all the text we display,
> >
> > Do we? A while back you said Lisp control over compositions was an
> > important feature, and I'm inclined to think we shouldn't break the
> > existing composition code.
> >
> > > we should go all the way and do it for any text, ASCII, non-ASCII,
> > > symbols, emoji, everything.
> >
> > Are you suggesting I'm somehow limiting myself to ASCII? Let me assure
> > you that's not the case.
> Then I really don't understand what problem are you trying to solve.

Ligatures and kerning.

> Let's try again from the beginning: which parts of the code that
> implements automatic compositions are you trying to avoid,
> and why?

I'm not trying to avoid any of it! I just see no reason to use any of
it, so far, because the part we have in common is about a dozen lines
of code around the call to hb_shape.

> Is that the part that identifies the "context" via regular
> expressions?  If so, then this problem needs to be solved by some
> alternative; using an arbitrary chosen fixed number of characters is
> not suitable for production.

I'm puzzled as to how these regular expressions, which only work when
they match fixed-length strings, as far as I can tell, are worse than
a fixed-length context. You're right that the number shouldn't be
hardcoded in Emacs, and shouldn't be arbitrary, but obviously there
has to be a limit shorter than a word or paragraph. (The composite.c
code currently hardcodes a limit of 500 characters.)

(And as I've said repeatedly, this is a deficiency specifically in
HarfBuzz: the OpenType format makes it very easy to tell what the
longest pattern is and how much context is needed. HarfBuzz should
pass on that information, ideally by providing an incremental
asynchronous API that requests only as much context as is needed until
the glyphs in question can be returned.)

> You haven't yet shown any viable alternative.

To what? We still haven't seen any actual regular expressions that
work. You just keep saying "regular expressions" as if that were a
solution, when it is merely a restriction on the set of possible
solutions.

And keep in mind that this context is used only for deciding what the
"current" glyph looks like: the next glyph will have its own context,
which might or might not be different.

What I'm currently playing with is something that I'm not sure is even
expressible as a regexp: starting with the character at point, keep
adding surrounding characters unless doing so would create a
delimiter-nondelimiter boundary after the first char, or a
nondelimiter-delimiter boundary before the last char, but limit the
whole thing to 16 characters each way.
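
As a rough illustration of the rule just described (a sketch only, not
the code from the patch): the Python below grows a window around one
character and stops at the two kinds of boundary mentioned above. The
names context_bounds and is_delimiter are hypothetical, and treating
whitespace as the only delimiter is an assumption made for the example.

```python
MAX_EACH_WAY = 16  # the cut-off mentioned in the text


def is_delimiter(ch):
    # Assumption for this sketch: only whitespace is a delimiter.
    return ch.isspace()


def context_bounds(text, pos, limit=MAX_EACH_WAY):
    """Return (start, end) so that text[start:end] is the context of text[pos]."""
    start, end = pos, pos + 1

    # Grow to the left. Adding text[start - 1] makes it the first char, so
    # refuse if that would put a delimiter right before a non-delimiter.
    while start > 0 and pos - start < limit:
        if is_delimiter(text[start - 1]) and not is_delimiter(text[start]):
            break
        start -= 1

    # Grow to the right. Adding text[end] makes it the last char, so
    # refuse if that would put a non-delimiter right before a delimiter.
    while end < len(text) and end - (pos + 1) < limit:
        if not is_delimiter(text[end - 1]) and is_delimiter(text[end]):
            break
        end += 1

    return start, end


if __name__ == "__main__":
    sample = "ab office cd"
    s, e = context_bounds(sample, 4)  # the first 'f' of "office"
    print(repr(sample[s:e]))
```

Under these assumptions a character inside a word gets that word as
its context, a delimiter gets the word on either side of it, and a
long run of identical characters is capped at 16 each way.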

As I've explained, it would be much better to let HarfBuzz tell us
whether to provide more context, but even then we'd need a cut-off:
imagine a file containing a gigabyte of 'f's.
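
To make the cut-off point concrete, here is a small sketch of such a
loop. The wants_more callback is purely hypothetical: it stands in for
the information I'd like HarfBuzz to provide, and is not an existing
HarfBuzz call. The function name gather_context and the toy predicate
used below are likewise invented for the example.

```python
def gather_context(text, pos, wants_more, cutoff=16):
    """Grow the context around text[pos] while the (hypothetical) shaper
    callback asks for more, but never beyond `cutoff` chars each way."""
    start, end = pos, pos + 1
    while wants_more(text[start:end]):
        grown = False
        if start > 0 and pos - start < cutoff:
            start -= 1
            grown = True
        if end < len(text) and end - (pos + 1) < cutoff:
            end += 1
            grown = True
        if not grown:
            # Cut-off (or end of text) reached on both sides: give up
            # and shape with what we have.
            break
    return start, end


def only_fs(context):
    # Toy stand-in for the shaper: a run consisting solely of 'f's might
    # still be part of a longer ligature, so it keeps asking for more.
    return set(context) == {"f"}


if __name__ == "__main__":
    many_fs = "f" * 10000  # scaled-down version of the gigabyte of 'f's
    print(gather_context(many_fs, 5000, only_fs))
```

Without the cut-off, the toy predicate would keep asking for context
until the whole run of 'f's had been collected.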

> Assuming that the alternative for selecting the "context" is found,
> and composite.c is augmented to apply it instead of the regexps, why
> not use the rest of the automatic composition code to produce the
> glyphs and display them?

I chose not to do that for a patch which I have stated repeatedly was
not in any way a finalized design, and I don't see any good reason to
do it for a real patch, either, so far.

(I'll be honest: I strongly suspect that the code is too slow, we know
it to be buggy, and it's simply too different from what I actually
want for me to benefit from sharing the code.)

> The code which does that exists and works,

(I suspect: slowly)

> and is tested by years of use.

It's unusable for me in Emacs 26.3.

> It already solves the problems of look-ahead,

If it does so efficiently, I'll certainly try reusing that code. But I
strongly suspect it doesn't.

> of wrapping long lines,

Very poorly, for my purposes.

> and others, including (but not limited to) the dreaded bidi thing.

Looking for "bidi" in composite.c, the only relevant thing I see is a FIXME.

> Why reinvent that wheel when we already have it, and it works well?

First, because it doesn't work that well for my purposes; second,
precisely because it works well for the purposes of others, and I'd
like to have as little impact as possible on existing use cases. They
should just continue working, and so far they do.

> > > and on top of that solve only a small part of the
> > > underlying problem.
> >
> > Ligatures and kerning (right now, for LTR text). Is that a small
> > problem because of the lack of RTL support?
> Yes, of course.

Why? I honestly don't see what's bad about a patch that improves
things for most languages and doesn't affect RTL languages (which, as
you point out, have existing support).

The code shouldn't break horribly for RTL text (it doesn't). If it
works, that's great; if it doesn't work and leaves things unshaped,
that's the existing behavior, and auto-composition-mode will still
work if enabled.

> An acceptable solution should support any text Emacs
> supports.

By that standard, bidi.c and composite.c are unacceptable.

> What's more, we already have the code which implements all
> that, so I don't understand why you want to bypass it.

We have something that superficially results in a similar screen
layout to what I want, but that actually represents display elements
in a way that makes them unusable for my purposes.
