bug-gnu-emacs

bug#26396: 25.1; char-displayable-p on a latin1 tty


From: Paul Eggert
Subject: bug#26396: 25.1; char-displayable-p on a latin1 tty
Date: Fri, 14 Apr 2017 11:56:32 -0700
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.0

On 04/14/2017 05:37 AM, Eli Zaretskii wrote:
>> This should not be a problem, as the Linux console has only single-width characters.
> Are you sure?  AFAIU, the Linux console supports the BMP, and some of
> the characters in the BMP are double-width (a.k.a. "full-width"), for
> example U+1100, U+231A, U+2B1B, and others.  What does the Linux
> console do when these characters are sent to the screen driver?

I haven't experimented with it, so I'm not 100% sure. However, as I understand the implementation, the console driver can support at most 512 simultaneously-displayable characters, as this is a property of the classic IBM VGA design that is the greatest common denominator of current or recent (post-1990) PC graphics hardware. The user can specify what each character looks like down to the pixel level, but cannot alter character sizes on a character-by-character basis. In theory one could display double-wide characters by splitting them into halves and displaying each half separately, but I don't know of anyone who does that (it would not be practical due to that 512 limit).


>>> And what does "display as-is" mean in practice?  Should we send to
>>> the console the glyph codes corresponding to Unicode points, or should
>>> we send UTF-8 encoded characters?
>> It depends on whether the console is in UTF-8 mode. If so, send UTF-8;
>> if not, send a byte that is transformed according to the current mapping
>> table into a Unicode value. I hope we don't need to bother with the
>> latter possibility.
> What software puts the console in UTF-8 mode?  Is that the locale
> setting?

It's done at boot time. The escape sequences ESC % G (or ESC % 8) and ESC % @ get you into and out of UTF-8 mode; see <http://man7.org/linux/man-pages/man4/console_codes.4.html>. Common practice is to stay in UTF-8 mode as the alternative is worse (it has only 256 simultaneously-displayable characters).
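
To make the mode switch concrete, here is a minimal sketch in Python of the two sequences mentioned above. The function names are made up for illustration; only the byte sequences ESC % G and ESC % @ come from console_codes(4). Writing them to anything other than a Linux console may have no effect or a different one.

```python
import sys

# Byte sequences from console_codes(4): ESC % G selects UTF-8,
# ESC % @ selects the default (non-UTF-8) character set.
UTF8_ON = b"\x1b%G"
UTF8_OFF = b"\x1b%@"


def utf8_mode_sequence(enable):
    """Return the escape sequence that enters or leaves UTF-8 mode."""
    return UTF8_ON if enable else UTF8_OFF


def set_utf8_mode(enable, stream=None):
    """Write the sequence to STREAM (a binary stream; default: stdout)."""
    if stream is None:
        stream = sys.stdout.buffer
    stream.write(utf8_mode_sequence(enable))
    stream.flush()
```

Calling set_utf8_mode(True) on a console should have the same effect as what the boot scripts do; it is not something Emacs itself would need to send.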

> http://www.tldp.org/LDP/LG/issue91/loozzr.html
> http://man7.org/linux/man-pages/man4/console_codes.4.html
> that seems to be just the tip of an iceberg.  Or maybe the
> issue is easier than I envisioned.

Both, I hope. :-)

> Suppose we only wanted to use this feature for UTF-8 locales.
> Assuming that the OS takes care of putting the console in UTF-8 mode,
> we don't need any changes in Emacs, since Emacs already sends UTF-8
> sequences to the screen driver.  As the Linux console only supports
> the BMP, we could then simply amend the code of char-displayable-p to
> check whether a character is within the BMP, when the terminal is the
> Linux console.  Do you agree with this conclusion?

No, because a character is displayable only if it's in that set of at-most-512 characters.
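
The difference between the two tests can be sketched as follows. This is only an illustration in Python: the function names and the three-entry sample map are invented, and a real console map would hold the loaded font's actual mappings.

```python
from typing import Mapping


def in_bmp(code_point):
    """The proposed test: is CODE_POINT in the Basic Multilingual Plane?"""
    return 0 <= code_point <= 0xFFFF


def console_displayable(code_point, unimap):
    """The stricter test: does the loaded font have a glyph for it?

    UNIMAP maps Unicode code points to font positions.
    """
    return code_point in unimap


# Hypothetical map with three entries, for illustration only.
SAMPLE_UNIMAP: Mapping[int, int] = {0x0041: 0x41, 0x00E9: 0x82, 0x2500: 0xC4}

for cp in (0x0041, 0x1100, 0x1F600):
    print(hex(cp), in_bmp(cp), console_displayable(cp, SAMPLE_UNIMAP))
```

With this sample map, U+1100 passes the BMP test but fails the glyph test, which is the case the BMP-only check would get wrong.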

> OTOH, now I'm not sure I understand the need for terminal_glyph_code.
> What does it do that a simple check for a Linux console and UTF-8
> terminal encoding, plus a character being inside a BMP, don't?

terminal_glyph_code gets the current set of at-most-512 displayable characters from the kernel.
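
For readers who want to see what "from the kernel" amounts to, here is a rough Python sketch of querying the console's Unicode map with the GIO_UNIMAP ioctl from <linux/kd.h>. It is not the Emacs code. The class and function names are made up, the request number and structure layouts are my reading of the kernel header, and I have not run it on a real console. Note that the map can list more Unicode values than there are glyphs, since several code points may share one font position.

```python
import ctypes
import fcntl

GIO_UNIMAP = 0x4B66   # request number, as I read it in <linux/kd.h>
MAX_ENTRIES = 0xFFFF  # entry_ct is an unsigned short, so this is the ceiling


class Unipair(ctypes.Structure):
    """Mirror of struct unipair: one Unicode value and its font position."""
    _fields_ = [("unicode", ctypes.c_ushort), ("fontpos", ctypes.c_ushort)]


class UnimapDesc(ctypes.Structure):
    """Mirror of struct unimapdesc: entry count plus pointer to the pairs."""
    _fields_ = [("entry_ct", ctypes.c_ushort),
                ("entries", ctypes.POINTER(Unipair))]


def pairs_to_map(pairs, count):
    """Convert the first COUNT pairs into a {code point: font position} dict."""
    return {pairs[i].unicode: pairs[i].fontpos for i in range(count)}


def read_console_unimap(fd):
    """Ask the kernel for the Unicode map of the console open on FD.

    Raises OSError if FD does not refer to a Linux virtual console.
    """
    pairs = (Unipair * MAX_ENTRIES)()  # room for the largest possible map
    desc = UnimapDesc(MAX_ENTRIES, ctypes.cast(pairs, ctypes.POINTER(Unipair)))
    fcntl.ioctl(fd, GIO_UNIMAP, desc)  # kernel fills pairs, updates entry_ct
    return pairs_to_map(pairs, desc.entry_ct)
```

Given a file descriptor open on a virtual console, a character would then count as displayable exactly when its code point is a key of the returned dict.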





