bug-gnu-emacs


bug#33729: 27.0.50; Partial glyphs not rendered for Gujarati with Harfbu


From: Eli Zaretskii
Subject: bug#33729: 27.0.50; Partial glyphs not rendered for Gujarati with Harfbuzz enabled (renders fine using m17n)
Date: Thu, 20 Dec 2018 20:58:06 +0200

Ping!  Could someone on the Harfbuzz team please comment on the
thoughts below?  Khaled, Mohammad, Behdad?

> Date: Mon, 17 Dec 2018 17:55:52 +0200
> From: Eli Zaretskii <eliz@gnu.org>
> Cc: dr.khaled.hosny@gmail.com, behdad@behdad.org, 33729@debbugs.gnu.org,
>       far.nasiri.m@gmail.com, kaushal.modi@gmail.com
> 
> > From: Glenn Morris <rgm@gnu.org>
> > Cc: far.nasiri.m@gmail.com,  dr.khaled.hosny@gmail.com,  behdad@behdad.org, 
> >  33729@debbugs.gnu.org,  kaushal.modi@gmail.com
> > Date: Sun, 16 Dec 2018 19:30:00 -0500
> > 
> > > After some thinking, my conclusion is that we should import the
> > > ISO 15924 database from https://unicode.org/iso15924/, use a script
> > > similar to admin/unidata/blocks.awk to generate an alist from it that
> > > maps Emacs script names to ISO 15924 tags, and then access that alist
> > > from uni_script to get the correct script information to Harfbuzz.
> > >
> > > Patches implementing that are welcome.
> > 
> > I live to write awk scripts. I'm not 100% sure what you want, but as a
> > first example, the following takes
> > http://www.unicode.org/Public/UCD/latest/ucd/PropertyValueAliases.txt
> > as input and outputs lines of the form "(gujr . gujarati)".
> > 
> > The aliases are so that the RHS matches charscript.el.
> > 
> > If this is not right, please clarify exactly what the inputs and output
> > should be.
> 
> Thanks.
> 
> It turns out I didn't have this figured out completely, and your
> proposal forced me to dig some more into the relevant parts of Unicode
> and Emacs.  I found a few additional issues and considerations; for at
> least some of them I'd like to hear the opinions of the Harfbuzz
> developers.
> 
> Here are the issues:
> 
>  . Contrary to my original thoughts, I now tend to think that a
>    separate char-table, say char-iso15924-tag-table, that maps
>    character codepoints directly to the script tags, is a better
>    solution:
>     - it will allow a faster lookup, obviously
>     - the subdivision of characters into scripts, as shown in
>       Unicode's Scripts.txt, is slightly different from what
>       char-script-table does, so a simple mapping from Emacs scripts
>       to ISO 15924 script tags will not do.  For example, many
>       characters Emacs puts into 'latin' or 'symbol' scripts are in
>       the Common script according to Scripts.txt, and similarly for
>       the Inherited script.  I imagine this is important for
>       Harfbuzz.
> 
>  . Whether to produce the character-to-script-tag mapping using the
>    UCD files, such as Scripts.txt and PropertyValueAliases.txt, or the
>    canonical ISO 15924 tags from https://unicode.org/iso15924/,
>    depends on whether the slight differences mentioned in
>    https://www.unicode.org/reports/tr24/#Relation_To_ISO15924 matter
>    for Harfbuzz.  For example, ISO 15924 has separate tags for the
>    Fraktur and Gaelic varieties of the Latin script: does this
>    distinction matter for Harfbuzz?
> 
>  . Does Harfbuzz handle the issues mentioned in
>    https://www.unicode.org/reports/tr24/#Script_Anomalies, and in
>    particular the use case of decomposed characters which yield a
>    different script than their precomposed variants?  This use case is
>    quite common in handling of character compositions, so it's
>    important to understand its implications before we decide on the
>    implementation.
> 
> To summarize, unless the Harfbuzz guys advise differently, I'd prefer
> processing Scripts.txt and PropertyValueAliases.txt into a list
> similar to the one we produce in charscript.el, then generate a
> char-table from that list.
> 
> Thanks again for working on this.
> 
> 
> 
> 
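A rough sketch of the approach summarized in the quoted message
(processing Scripts.txt and PropertyValueAliases.txt into a
character-to-script-tag mapping) might look like the following Python.
It is not the Emacs implementation: the function names, the
sorted-range lookup standing in for a char-table, and the few sample
lines are all assumptions made for illustration; a real run would read
the complete UCD files.

```python
import bisect

# Abbreviated sample lines in the UCD file formats (assumption: a real run
# would read the full Scripts.txt and PropertyValueAliases.txt instead).
SCRIPTS_TXT = """\
0020          ; Common # Zs
0041..005A    ; Latin # L&
0300..036F    ; Inherited # Mn
0A95..0AA8    ; Gujarati # Lo
"""

ALIASES_TXT = """\
sc ; Gujr ; Gujarati
sc ; Latn ; Latin
sc ; Zinh ; Inherited ; Qaai
sc ; Zyyy ; Common
"""


def parse_aliases(text):
    """Map long script names ('Gujarati') to four-letter tags ('Gujr')."""
    tags = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue
        fields = [f.strip() for f in line.split(";")]
        # Only the 'sc' (Script) property is relevant here.
        if fields[0] == "sc" and len(fields) >= 3:
            tags[fields[2]] = fields[1]
    return tags


def parse_scripts(text, tags):
    """Return a sorted list of (first, last, tag) code point ranges."""
    ranges = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        cp_field, script = [f.strip() for f in line.split(";")]
        first, _, last = cp_field.partition("..")
        start = int(first, 16)
        end = int(last, 16) if last else start  # single code point line
        ranges.append((start, end, tags[script]))
    ranges.sort()
    return ranges


RANGES = parse_scripts(SCRIPTS_TXT, parse_aliases(ALIASES_TXT))
STARTS = [r[0] for r in RANGES]


def script_tag(cp, default="Zzzz"):
    """Look up the script tag for a code point; 'Zzzz' means Unknown."""
    i = bisect.bisect_right(STARTS, cp) - 1
    if i >= 0 and RANGES[i][1] >= cp:
        return RANGES[i][2]
    return default
```

With the sample data, script_tag(0x0A95) gives the Gujarati tag and
script_tag(0x0301) gives Zinh; code points missing from the sample fall
back to Zzzz (Unknown).  Tags keep the capitalization used in the UCD
file; lowercasing them, as in the "(gujr . gujarati)" output mentioned
in the quoted text, would be a one-line change.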




