bug#31315: wrong font encoding for fallback font

From: Werner LEMBERG
Subject: bug#31315: wrong font encoding for fallback font
Date: Tue, 01 May 2018 21:30:14 +0200 (CEST)

>> what matters is how the font backend provides the font to the
>> client.  Calling `xlsfonts' I see that X11 offers access as
>> follows.
>>   -misc-droid sans fallback-medium-r-normal--0-0-0-0-p-0-cns11643-1
>>   -misc-droid sans fallback-medium-r-normal--0-0-0-0-p-0-cns11643-2
>>   -misc-droid sans fallback-medium-r-normal--0-0-0-0-p-0-cns11643-3
>>   -misc-droid sans fallback-medium-r-normal--0-0-0-0-p-0-gb18030.2000-0
>>   -misc-droid sans fallback-medium-r-normal--0-0-0-0-p-0-gb2312.1980-0
>>   -misc-droid sans fallback-medium-r-normal--0-0-0-0-p-0-iso10646-1
>>   -misc-droid sans fallback-medium-r-normal--0-0-0-0-p-0-jisx0201.1976-0
>>   -misc-droid sans fallback-medium-r-normal--0-0-0-0-p-0-jisx0208.1983-0
>>   -misc-droid sans fallback-medium-r-normal--0-0-0-0-p-0-jisx0208.1990-0
> I think we have a terminology problem here, most probably my fault.
> What exactly do you mean when you say "font backend" in this
> context?  And what is "the client" in this case?

OK, sorry.  I mean the X11 font backend.  Here's my global picture.

          gb18030               unicode
 Emacs  ----------->   xft   ------------>  DroidSansFallback.ttf

For me, Emacs is a client of the xft font interface.  In our
particular case, xft provides `DroidSansFallback.ttf' to Emacs as a
font encoded in GB18030 – Emacs obviously has requested a font in this
encoding.  Behind the scenes, however, xft communicates with the
`DroidSansFallback.ttf' font using Unicode (the font has no other

> If you received a GB18030 encoded email, it is expected that Emacs
> will try to find a font that explicitly supports GB18030.
> This is a feature that AFAIU is very important to CJK users: they
> expect Emacs to select a font that declares support for the
> character's charset as set by the decoding machinery.

While this is correct for other CJK encodings like GB, JIS, KSC, or
Big5, it is *not* true for GB18030.  This is *only* an encoding and
*not* a charset!  It is simply another representation of Unicode,
comparable to UTF-8 or UCS4.  There doesn't exist a single font
natively encoded in GB18030!  This encoding only exists to be
code-wise backward compatible with GB 2312.
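
To illustrate the point, here is a small Python sketch (not part of
Emacs or xft; the helper name `roundtrips' is made up for this
example).  It assumes Python's standard `gb18030', `gb2312', and
`utf-8' codecs and shows that GB18030 round-trips arbitrary Unicode
text, just as UTF-8 does, while staying byte-compatible with GB 2312
for the characters both cover.

```python
def roundtrips(text: str, codec: str = "gb18030") -> bool:
    """Return True if TEXT survives an encode/decode cycle in CODEC."""
    return text.encode(codec).decode(codec) == text


# Latin, Greek, Hangul, a Chinese character, and a character outside the BMP.
samples = ["A", "é", "Ω", "한", "中", "😀"]

for ch in samples:
    gb = ch.encode("gb18030")
    utf8 = ch.encode("utf-8")
    # Both encodings are lossless for every Unicode character tried here.
    assert roundtrips(ch, "gb18030") and roundtrips(ch, "utf-8")
    print(f"U+{ord(ch):04X}  gb18030={gb.hex()}  utf-8={utf8.hex()}")

# Backward compatibility: a GB 2312 character keeps its byte sequence.
assert "中".encode("gb2312") == "中".encode("gb18030")
```

Nothing in the byte stream says which glyph representation form is
wanted; that has to come from somewhere else.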

To a certain extent it is valid to assume that a user of GB18030
expects Chinese glyph representation forms for characters in the CJK
range.  However, since full Unicode is supported, this assumption is
rather weak.

The X11 interface is actually too old to handle GB18030 correctly.
For example, on my GNU/Linux box xft offers the following:

  -adobe-noto sans cjk jp thin-light-r-normal--0-0-0-0-p-0-gb18030.2000-0

As the `jp' in the name indicates, this font contains Japanese glyph
representation forms.  Since `Noto Sans CJK' provides all CJK glyphs
in the BMP, xft happily tags it with GB18030...
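
For reference, the charset tag is just the last two fields of the
XLFD name.  A minimal Python sketch for pulling them out, assuming the
usual 14-field XLFD layout (the function name `xlfd_charset' is
invented for this example, and it does not handle wildcards):

```python
def xlfd_charset(name: str) -> tuple[str, str]:
    """Return (registry, encoding) from a fully specified XLFD name."""
    fields = name.split("-")
    # A full XLFD starts with '-' and has 14 fields, so splitting on '-'
    # yields an empty first element plus the 14 fields.
    if len(fields) != 15 or fields[0] != "":
        raise ValueError(f"not a fully specified XLFD name: {name!r}")
    return fields[13], fields[14]


names = [
    "-misc-droid sans fallback-medium-r-normal--0-0-0-0-p-0-iso10646-1",
    "-misc-droid sans fallback-medium-r-normal--0-0-0-0-p-0-gb18030.2000-0",
    "-adobe-noto sans cjk jp thin-light-r-normal--0-0-0-0-p-0-gb18030.2000-0",
]

for name in names:
    registry, encoding = xlfd_charset(name)
    print(f"{registry}-{encoding}")
```

The two GB18030 entries are indistinguishable by charset tag, although
one of them is explicitly a Japanese design.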

>> > In general, the way to request that Emacs uses fonts you like
>> > with certain characters or charsets is by customizing your
>> > fontsets.  I cannot say more without hearing the details.
>> I don't have any fontsets customized in my `.emacs' file.
> Well, it sounds like you should.  Emacs chooses fonts using
> techniques that prefer speed to accuracy, and if that gives
> suboptimal results, the way to improve them is to guide Emacs by
> tailoring your fontset to the fonts you have installed and to the
> visual appearance you happen to like.

For the purpose of reporting this bug I thought it would be best not
to deviate any further from `emacs -Q'...

>> Both.  If I open a new Unicode encoded file, Emacs continues
>> to use GB18030.2000 as the charset registry/encoding for displaying
>> fallback characters, failing to convert Unicode to GB18030 before
>> accessing the characters from the font backend.
> The former part is not a bug at all.

I agree.  I only wanted to tell you what I observe.

