bug-gnu-emacs

bug#33729: 27.0.50; Partial glyphs not rendered for Gujarati with Harfbu


From: Khaled Hosny
Subject: bug#33729: 27.0.50; Partial glyphs not rendered for Gujarati with Harfbuzz enabled (renders fine using m17n)
Date: Sat, 22 Dec 2018 10:54:48 +0200
User-agent: Mutt/1.11.1 (2018-12-01)

On Mon, Dec 17, 2018 at 05:55:52PM +0200, Eli Zaretskii wrote:
> > From: Glenn Morris <rgm@gnu.org>
> > Cc: far.nasiri.m@gmail.com,  dr.khaled.hosny@gmail.com,  behdad@behdad.org, 
> >  33729@debbugs.gnu.org,  kaushal.modi@gmail.com
> > Date: Sun, 16 Dec 2018 19:30:00 -0500
> > 
> > > After some thinking, my conclusion is that we should import the
> > > ISO 15924 database from https://unicode.org/iso15924/, use a script
> > > similar to admin/unidata/blocks.awk to generate an alist from it that
> > > maps Emacs script names to ISO 15924 tags, and then access that alist
> > > from uni_script to get the correct script information to Harfbuzz.
> > >
> > > Patches implementing that are welcome.
> > 
> > I live to write awk scripts. I'm not 100% sure what you want, but as a
> > first example, the following takes
> > http://www.unicode.org/Public/UCD/latest/ucd/PropertyValueAliases.txt
> > as input and outputs lines of the form "(gujr . gujarati)".
> > 
> > The aliases are so that the RHS matches charscript.el.
> > 
> > If this is not right, please clarify exactly what the inputs and output
> > should be.
> 
> Thanks.
> 
> It turns out I didn't have this figured out completely, and your
> proposal forced me to dig some more into the relevant parts of Unicode
> and Emacs.  I found a few additional issues and considerations; for at
> least some of them I'd like to hear the opinions of the Harfbuzz
> developers.
> 
> Here are the issues:
> 
>  . Contrary to my original thoughts, I now tend to think that a
>    separate char-table, say char-iso15924-tag-table, that maps
>    character codepoints directly to the script tags, is a better
>    solution:
>     - it will allow a faster look up, obviously
>     - the subdivision of characters into scripts, as shown in
>       Unicode's Scripts.txt, is slightly different from what
>       char-script-table does, so a simple mapping from Emacs scripts
>       to ISO 15924 script tag will not do.  For example, many
>       characters Emacs puts into 'latin' or 'symbol' scripts are in
>       the Common script according to Scripts.txt, and similarly for
>       the Inherited script.  I imagine this is important for
>       Harfbuzz.

Alternatively, we could just use HarfBuzz’s own built-in, ucdn-based
Unicode functions for this. The only reason for overriding them in Emacs
was to keep HarfBuzz and Emacs Unicode support in sync, but if we are
going to duplicate the Unicode script data anyway, then it is better to
use what HarfBuzz has.

I’m going to try this now.

>  . Whether to produce the character-to-script-tag mapping using the
>    UCD files, such as Scripts.txt and PropertyValueAliases.txt, or the
>    canonical ISO 15924 tags from https://unicode.org/iso15924/,
>    depends on whether the slight differences mentioned in
>    https://www.unicode.org/reports/tr24/#Relation_To_ISO15924 matter
>    for Harfbuzz.  For example, ISO 15924 has separate tags for the
>    Fraktur and Gaelic varieties of the Latin script: does this
>    distinction matter for Harfbuzz?

We want the UCD data.
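
For illustration only, here is a rough sketch of how a
codepoint-to-script-tag lookup could be derived from the two UCD files
mentioned above. It is not what Emacs or HarfBuzz actually do. The
embedded strings are hand-abbreviated samples in the Scripts.txt and
PropertyValueAliases.txt formats rather than the real files, and the
function names are hypothetical. It assumes the four-letter "sc" aliases
can be passed on as script tags.

```python
import bisect

# Hand-abbreviated samples in the UCD file formats (NOT the full files).
SCRIPTS_TXT = """\
0020          ; Common # Zs       SPACE
0041..005A    ; Latin # L&  [26] LATIN CAPITAL LETTER A..LATIN CAPITAL LETTER Z
0300..036F    ; Inherited # Mn [112] COMBINING GRAVE ACCENT..
0A85..0A8D    ; Gujarati # Lo   [9] GUJARATI LETTER A..
"""

PROPERTY_VALUE_ALIASES_TXT = """\
sc ; Gujr ; Gujarati
sc ; Latn ; Latin
sc ; Zinh ; Inherited ; Qaai
sc ; Zyyy ; Common
sc ; Zzzz ; Unknown
"""


def parse_script_aliases(text):
    """Map long script names ('Gujarati') to four-letter codes ('Gujr')."""
    aliases = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        if not line:
            continue
        fields = [f.strip() for f in line.split(";")]
        if fields[0] == "sc" and len(fields) >= 3:
            aliases[fields[2]] = fields[1]
    return aliases


def parse_script_ranges(text, aliases):
    """Return a sorted list of (first, last, tag) codepoint ranges."""
    ranges = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        cp_field, name = [f.strip() for f in line.split(";")]
        first, _, last = cp_field.partition("..")
        start = int(first, 16)
        end = int(last, 16) if last else start  # single codepoint line
        ranges.append((start, end, aliases[name]))
    ranges.sort()
    return ranges


def script_tag(codepoint, ranges, default="Zzzz"):
    """Binary-search the range list; unlisted codepoints get `default`."""
    # 0x110000 is above any valid codepoint, so a range starting exactly
    # at `codepoint` sorts before the search key.
    i = bisect.bisect_right(ranges, (codepoint, 0x110000)) - 1
    if i >= 0 and ranges[i][0] <= codepoint <= ranges[i][1]:
        return ranges[i][2]
    return default


ALIASES = parse_script_aliases(PROPERTY_VALUE_ALIASES_TXT)
RANGES = parse_script_ranges(SCRIPTS_TXT, ALIASES)

for cp in (0x0A85, ord("A"), 0x0020, 0x0301):
    print("U+%04X -> %s" % (cp, script_tag(cp, RANGES)))
```

With the real files as input, the same range list could be used to fill
a char-table, and Common and Inherited characters would come out as
Zyyy and Zinh rather than being folded into 'latin' or 'symbol'.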

Regards,
Khaled




