help-gnu-emacs

Re: Composed Sequences


From: Eli Zaretskii
Subject: Re: Composed Sequences
Date: Sat, 26 Feb 2022 17:35:22 +0200

> Date: Sat, 26 Feb 2022 15:11:44 +0000
> From: Richard Wordingham <richard.wordingham@ntlworld.com>
> 
> > > Different renderers give different clusters, and thus, by default,
> > > different cursor motion!  
> 
> > Not "different renderers", but "different fonts".
> 
> I experimented with the Tai Tham composition-function-table entry
> 
> (list (vector "[\u1a20-\u1aad]+" 0 'font-shape-gstring))
> 
> For GNU Emacs 23.4.1 (i386-mingw-nt6.2.9200) using Uniscribe, the word
> ᨠᩣ᩠ᨿ <1A20 HIGH KA, 1A63 AA, 1A60 SAKOT, 1A3F LOW YA>, the glyph string
> for Version 0.8 of my font Da Lekh is divided into two
> clusters as identified by the 'glyph' values [0 1 6688...] [0 1
> 6688...] [2 3 6752...] and confirmed by ordinary cursor motion.  While
> this division into <1A20, 1A63> and <1A60, 1A3F> is not the Unicode
> division into grapheme clusters, it accords with what are natively
> namable clusters.
> 
> For GNU Emacs 27.1 (build1 i686-w64-mingw32) of 2020-08-21, which uses
> HarfBuzz, the same word is one indivisible cluster (at least with
> Version 0.13 of the same font).  I think this is a change in the
> behaviour of HarfBuzz.

If you must have the last word in this.  (It's quite clear that in
gray areas, such as Tai Tham, and where a shaping engine has a bug or
a misfeature, the results will also depend on the shaping engine.  But
that is not the main lesson to be taken home from the original issue,
which btw was with Arabic, not Tai Tham.)

> > > The reason Arabic seemed different is that when lam+hah appears to
> > > ligate, what is happening (at least with Amiri) is that
> > > substitutions are made which give the effect of a ligature, while
> > > remaining two distinct glyphs.  
> 
> > Yes, I see that as well.  "C-u C-x =" should tell you whether ligation
> > happened or not.  What you see is normal, I think: Emacs obeys the
> > decisions of the font designers.
> 
> Unless they recorded the positions of the boundaries between the parts
> of a ligature!

I don't understand what you mean by that.

Emacs behaves according to what the shaping engine tells us about the
number of graphemes in the cluster.  Each grapheme is (by default) a
single unit for the purposes of cursor motion: Emacs will not let you
"enter" the grapheme, even if it is made out of several glyphs.  But
there's nothing in particular that Emacs expects from the number and
order of the graphemes in a cluster; we just use what the shaping
engine hands back to us.  And the cursor motion in Emacs is by default
in logical order, i.e. in the increasing order of buffer positions of
the original codepoints.
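
For anyone who wants to repeat the experiment quoted above, here is a
minimal sketch of how that rule could be installed.  The rule itself is
the one quoted in this thread; the character range U+1A20..U+1AAD is an
assumption chosen to match the regexp, and note that
set-char-table-range replaces whatever rules were already registered
for those characters.

```elisp
;; Sketch only: install the Tai Tham rule quoted in this thread.
;; The range below is an assumption that mirrors the regexp.
(set-char-table-range
 composition-function-table
 '(#x1A20 . #x1AAD)
 ;; Each rule is a vector [PATTERN PREV-CHARS FUNC].
 (list (vector "[\u1a20-\u1aad]+" 0 'font-shape-gstring)))
```

After evaluating this, placing point on the word and typing "C-u C-x ="
should show how the text was composed, and C-f / C-b should move over
whatever clusters the shaping engine reported for the font in use.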


