bug-gnu-emacs

bug#31315: wrong font encoding for fallback font


From: Werner LEMBERG
Subject: bug#31315: wrong font encoding for fallback font
Date: Thu, 03 May 2018 07:52:27 +0200 (CEST)

> If by "xft" you mean the part of the X libraries that supports the
> APIs used by xfont.c, then I think we are on the same page now.

OK.

>> While this is correct for other CJK encodings like GB, JIS, KSC, or
>> Big5, it is *not* true for GB18030.  This is *only* an encoding and
>> *not* a charset!  It is simply another representation of Unicode,
>> comparable to UTF-8 or UCS4.  There doesn't exist a single font
>> natively encoded in GB18030!  This encoding only exists to be
>> code-wise backward compatible with GB 2312.
> 
> Maybe so, but GB18030 is a Chinese encoding, and as such it behaves
> in Emacs as all the other Chinese encodings.

I know, and I agree.  BUT!  xft doesn't do what Emacs expects.  *Any*
font that covers the whole BMP (in particular, the whole CJK part of
it) gets a `GB18030' tag from xft.  In other words, the `Chinese'
property isn't in the font from the very beginning.[*]

> Emacs employs that logic for every charset it has defined, including
> Latin-2, for example: if text was decoded from an encoding which
> supports a particular charset, Emacs puts the corresponding
> 'charset' text property on the decoded text, and the machinery which
> selects the appropriate font tries first to find a font which
> supports that charset.  The idea is that users in a particular
> culture have certain distinct preferences wrt fonts, and that an
> encoding that supports a certain charset or culture provides a hint
> about those preferences.  This idea is very central in how Emacs
> selects fonts.

Being the FreeType maintainer, and having co-developed Emacs's
internal buffer encoding scheme many, many years ago, I know all this.
I can only repeat that Emacs might tag a certain text with GB18030 so
that the user can deduce a Chinese origin.  However, there is *no*
guarantee that the user gets a Chinese-flavoured font – at least not
from the xft interface.[**]

As a corollary, it is fully sufficient for xft to treat GB18030 the
same as Unicode (i.e., `iso10646').
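
To illustrate the point that GB18030 is only another representation of
Unicode, here is a minimal sketch.  It assumes Python 3 and its
built-in `gb18030' and `gb2312' codecs; the sample strings and the
helper names are made up for this illustration and have nothing to do
with xft or Emacs internals.

```python
# Sample text from several scripts; most of it is not Chinese at all.
SAMPLES = [
    "Emacs",        # ASCII
    "Привет",       # Cyrillic
    "नमस्ते",         # Devanagari
    "한국어",        # Hangul
    "简体中文",      # simplified Chinese
    "\U0001F600",   # a character outside the BMP
]


def roundtrips(text, codec="gb18030"):
    """Return True if text survives encoding and decoding unchanged."""
    return text.encode(codec).decode(codec) == text


def same_bytes_as_gb2312(text):
    """Return True if GB18030 yields the same bytes as GB 2312 for text."""
    return text.encode("gb18030") == text.encode("gb2312")


if __name__ == "__main__":
    for sample in SAMPLES:
        # ascii() keeps the output printable on any terminal encoding.
        print(ascii(sample), roundtrips(sample), len(sample.encode("gb18030")))
```

Text that fits into GB 2312 keeps its old byte values, which is the
backward compatibility mentioned above; everything else is still
representable, just with different byte sequences.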


    Werner


[*] Actually, having Unicode fonts that provide CJK glyphs for the
    whole BMP completely spoils Emacs's font selection scheme based on
    charsets – as shown in one of my previous e-mails, xft provides
    all common CJK encodings for such fonts because Unicode is a
    superset of those encodings.
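
The effect described in [*] can be sketched with a toy model.  This is
not how xft or Emacs computes anything; the charset names, the sample
characters, and the two `fonts' (given only as coverage predicates)
are assumptions chosen for illustration.

```python
# A few representative characters per legacy CJK charset (hypothetical sample).
SAMPLES = {
    "gb2312": "简体",
    "big5": "繁體",
    "ksc5601": "한국",
    "jisx0208": "かな",
}


def covers_bmp(ch):
    """Coverage of a hypothetical pan-Unicode font: every BMP character."""
    return ord(ch) <= 0xFFFF


def covers_simplified_only(ch):
    """Coverage of a hypothetical font with a few simplified Chinese glyphs."""
    return ch in "简体中文"


def advertised_charsets(covers):
    """List the charsets whose sample characters the font fully covers."""
    return sorted(
        name
        for name, sample in SAMPLES.items()
        if all(covers(ch) for ch in sample)
    )


if __name__ == "__main__":
    print(advertised_charsets(covers_bmp))
    print(advertised_charsets(covers_simplified_only))
```

In this model the pan-Unicode font is advertised for every CJK charset,
so a charset tag on the text cannot single out a Chinese-flavoured
font once such a font is installed.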

[**] If, say, the Pango font interface is used instead to access a
     modern CJK OpenType font, Emacs might request `script=hani,
     lang=ZHS' if it encounters GB18030 to resolve Unicode's Han
     unification, ensuring simplified Chinese glyph representation
     forms.
