More on Tibetan, or rather: ligatures

From: Oliver Corff
Subject: More on Tibetan, or rather: ligatures
Date: Sun, 21 Jan 2024 22:26:33 +0100
User-agent: Mozilla Thunderbird


Deri already followed up the conversation that was prompted by Tom's
questions regarding Tibetan.

I'll attempt to steer the conversation away from Tibetan towards a more
generic technical issue: processing ligatures (that's what Tom's
problems boil down to).

If we take the Tibetan syllable རྒྱ, romanized as rgya, with the
components superscript r, base form ga, subscript y, then what *looks*
like a single glyph is in reality a sequence of three (!) elements:

1. U+0F62 "RA" (but with the ability to change shape when combined; in
   contrast to U+0F6A which looks absolutely the same in the character
   table but does *not* enter into ligatures),
2. U+0F92 "-GA", i.e. subjoined form of base letter U+0F42, and finally
3. U+0FB1 "-YA", i.e. subjoined form of base letter U+0F61 "YA".
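
To make the decomposition concrete, here is a minimal Python sketch
(not part of groff; the function name describe is my own invention)
that lists the code points of the syllable together with their names
from the Unicode character database:

```python
import unicodedata

def describe(text):
    # Return one (code point, Unicode name) pair per character in text.
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]

# The syllable rgya, spelled out as its three code points.
for code_point, name in describe("\u0f62\u0f92\u0fb1"):
    print(code_point, name)
```

The three entries it prints are the three elements listed above; what
is rendered as one stack is three separate characters in the input.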

All three are stacked vertically in one place. The same "TTT" (tiny
Tibetan tower) can have an additional layer on top (for the vowels e, i,
o) or below (for the vowel u). Each of these four vowels has a single
vowel sign in the character table (absent any of them, the vowel a is
assumed), and the font takes care of the correct height of the vowel
glyph. It is also possible to have one canonical vowel in the character
table but a whole series of vowel glyphs of different heights in a
private area of the font, not necessarily user-accessible.

I haven't inspected the internal structure of the Tibetan fonts I use on
my machine, but the syllable rgya is displayed properly when copied into
a shell prompt, and e.g. in vim the key sequence g a reveals the
composition and the code points. So I assume the font does all the
shaping work, via its lookup tables.

Now the question, which is not language-specific: To what extent can
groff access these font-internal lookup tables? It appears that the
"naive" approach does not trigger the ligature mechanism in the font, as
demonstrated by Tom's and Deri's examples.
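
By the "naive" approach I mean input that simply lists the code points
one after another. A small Python sketch that produces such input (the
helper name groff_escapes is hypothetical, and I am assuming groff wants
uppercase hexadecimal digits in the \[uXXXX] notation):

```python
def groff_escapes(text):
    # Turn each character into a groff \[uXXXX] escape, using
    # uppercase hex digits padded to at least four places.
    return "".join(f"\\[u{ord(ch):04X}]" for ch in text)

# The syllable rgya as three consecutive escapes.
print(groff_escapes("\u0f62\u0f92\u0fb1"))
```

This only writes the escapes; whether groff and the font then combine
them into one stack is exactly the open question.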

Is it possible that every \[u0Fxx] is (perhaps invisibly) isolated, akin
to writing {f}{f}{l} in TeX when you want to make sure that no ligature
springs into action?

I tried to test this hypothesis with a minimal document, ff.roff:

ff \" generates ligature in PDF file
\[u0066]\[u0066] \" I hoped to see something like ff, but get an error

Yet instead of producing the letter "f", \[u0066] generates an error
message: "warning: special character '\f' not defined"

Where is my mistake?

I then tried the Basic Latin range with other letters, like \[u0041],
but got the message: "warning: special character '\A' not defined"

This looks as if the character code is translated correctly but the
backslash of a "special character" is newly introduced.

Or is there a lower floor for the \[uxxxx] notation which I am not aware of?

So, when typesetting "ff" or "ffi", does groff itself build the ligature
and request the glyph [ff] or [ffi] from the font, or could the font do
that based on its own knowledge of ligatures via the appropriate lookup
table?

In other words, for a working implementation of Tibetan in groff, should
I write a series of conditional character substitutions, or is there a
way to send the characters to the device such that the device and font
know that a ligature is coming?

I am fine either way: a) accessing the font's lookup tables, or b)
implementing a comprehensive set of ligatures in groff.

Best regards,


Dr. Oliver Corff
