bug-gnu-emacs

bug#33729: 27.0.50; Partial glyphs not rendered for Gujarati with Harfbu


From: Eli Zaretskii
Subject: bug#33729: 27.0.50; Partial glyphs not rendered for Gujarati with Harfbuzz enabled (renders fine using m17n)
Date: Mon, 17 Dec 2018 17:55:52 +0200

> From: Glenn Morris <rgm@gnu.org>
> Cc: far.nasiri.m@gmail.com,  dr.khaled.hosny@gmail.com,  behdad@behdad.org,  
> 33729@debbugs.gnu.org,  kaushal.modi@gmail.com
> Date: Sun, 16 Dec 2018 19:30:00 -0500
> 
> > After some thinking, my conclusion is that we should import the
> > ISO 15924 database from https://unicode.org/iso15924/, use a script
> > similar to admin/unidata/blocks.awk to generate an alist from it that
> > maps Emacs script names to ISO 15924 tags, and then access that alist
> > from uni_script to get the correct script information to Harfbuzz.
> >
> > Patches implementing that are welcome.
> 
> I live to write awk scripts. I'm not 100% sure what you want, but as a
> first example, the following takes
> http://www.unicode.org/Public/UCD/latest/ucd/PropertyValueAliases.txt
> as input and outputs lines of the form "(gujr . gujarati)".
> 
> The aliases are so that the RHS matches charscript.el.
> 
> If this is not right, please clarify exactly what the inputs and output
> should be.

Thanks.

It turns out I didn't have this figured out completely, and your
proposal forced me to dig some more into the relevant parts of Unicode
and Emacs.  I found a few additional issues and considerations; for at
least some of them I'd like to hear the opinions of the Harfbuzz
developers.

Here are the issues:

 . Contrary to my original thoughts, I now tend to think that a
   separate char-table, say char-iso15924-tag-table, that maps
   character codepoints directly to the script tags, is a better
   solution:
    - it will allow a faster lookup, obviously
    - the subdivision of characters into scripts, as shown in
      Unicode's Scripts.txt, is slightly different from what
      char-script-table does, so a simple mapping from Emacs scripts
      to ISO 15924 script tags will not do.  For example, many
      characters Emacs puts into 'latin' or 'symbol' scripts are in
      the Common script according to Scripts.txt, and similarly for
      the Inherited script.  I imagine this is important for
      Harfbuzz.

 . Whether to produce the character-to-script-tag mapping using the
   UCD files, such as Scripts.txt and PropertyValueAliases.txt, or the
   canonical ISO 15924 tags from https://unicode.org/iso15924/,
   depends on whether the slight differences mentioned in
   https://www.unicode.org/reports/tr24/#Relation_To_ISO15924 matter
   for Harfbuzz.  For example, ISO 15924 has separate tags for the
   Fraktur and Gaelic varieties of the Latin script: does this
   distinction matter for Harfbuzz?

 . Does Harfbuzz handle the issues mentioned in
   https://www.unicode.org/reports/tr24/#Script_Anomalies, and in
   particular the use case of decomposed characters which yield a
   different script than their precomposed variants?  This use case is
   quite common in handling of character compositions, so it's
   important to understand its implications before we decide on the
   implementation.

To summarize, unless the Harfbuzz guys advise differently, I'd prefer
processing Scripts.txt and PropertyValueAliases.txt into a list
similar to the one we produce in charscript.el, and then generating a
char-table from that list.
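
To make the intended inputs and output more concrete, here is a rough
sketch in Python of the processing I have in mind.  It is only an
illustration under stated assumptions, not the proposed implementation:
the function names (parse_script_aliases, parse_script_ranges,
lookup_tag) are made up for this example, the two sample strings are
short excerpts in the format of the UCD files rather than the real
data, and a sorted list of ranges stands in for the char-table.  Code
points not listed in Scripts.txt get the Unknown tag, Zzzz, which is
the default that file documents.

```python
import bisect

# Short excerpts in the UCD file formats (assumption: real input would be
# the complete PropertyValueAliases.txt and Scripts.txt files).
PVA_SAMPLE = """\
sc ; Gujr                             ; Gujarati
sc ; Latn                             ; Latin
sc ; Zinh                             ; Inherited                        ; Qaai
sc ; Zyyy                             ; Common
"""

SCRIPTS_SAMPLE = """\
0020          ; Common # Zs       SPACE
0041..005A    ; Latin # L&  [26] LATIN CAPITAL LETTER A..LATIN CAPITAL LETTER Z
0300..036F    ; Inherited # Mn [112] COMBINING GRAVE ACCENT..COMBINING LATIN SMALL LETTER X BELOW
0A81..0A82    ; Gujarati # Mn   [2] GUJARATI SIGN CANDRABINDU..GUJARATI SIGN ANUSVARA
"""


def data_lines(text):
    """Yield non-empty lines with '#' comments stripped."""
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if line:
            yield line


def parse_script_aliases(text):
    """Map long script names ('Gujarati') to four-letter tags ('Gujr')."""
    aliases = {}
    for line in data_lines(text):
        fields = [f.strip() for f in line.split(";")]
        # Only the 'sc' (Script) property lines are relevant here.
        if fields[0] == "sc" and len(fields) >= 3:
            aliases[fields[2]] = fields[1]
    return aliases


def parse_script_ranges(text, aliases):
    """Return a sorted list of (first, last, tag) code point ranges."""
    ranges = []
    for line in data_lines(text):
        cp_field, name = [f.strip() for f in line.split(";")]
        first, _, last = cp_field.partition("..")
        start = int(first, 16)
        end = int(last, 16) if last else start
        ranges.append((start, end, aliases[name]))
    ranges.sort()
    return ranges


def lookup_tag(ranges, codepoint):
    """Return the script tag for CODEPOINT, or 'Zzzz' (Unknown) if unlisted."""
    starts = [r[0] for r in ranges]
    i = bisect.bisect_right(starts, codepoint) - 1
    if i >= 0 and ranges[i][1] >= codepoint:
        return ranges[i][2]
    return "Zzzz"


ALIASES = parse_script_aliases(PVA_SAMPLE)
RANGES = parse_script_ranges(SCRIPTS_SAMPLE, ALIASES)
```

With the sample data, lookup_tag(RANGES, 0x0A81) gives "Gujr", while
SPACE maps to "Zyyy" (Common) and the combining accents map to "Zinh"
(Inherited), which is the subdivision that differs from what
char-script-table records.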

Thanks again for working on this.




