bug#26396: 25.1; char-displayable-p on a latin1 tty

bug-gnu-emacs

From:	Eli Zaretskii
Subject:	bug#26396: 25.1; char-displayable-p on a latin1 tty
Date:	Tue, 18 Apr 2017 21:19:35 +0300

> Cc: user42_kevin@yahoo.com.au, 26396@debbugs.gnu.org
> From: Paul Eggert <eggert@cs.ucla.edu>
> Date: Tue, 18 Apr 2017 10:49:37 -0700
> 
> On 04/17/2017 11:32 AM, Eli Zaretskii wrote:
> > Can you show an example of a character displayed in different forms
> > depending on buffer contents?  I'd like to look at what the code does and why.
> 
> In master on the Linux console in non-UTF-8 mode and with a unibyte 
> en_US locale, if I run 'emacs -Q' and type 'C-x 8 RET 100 RET C-x 8 RET 
> 200 RET' the screen looks like this:
> 
> \u0100\u0200
> 
> If I then type 'C-x 8 RET 300 RET', the '\u0200' magically changes to 
> '?' and another '?' is appended, so that the screen then looks like this:
> 
> \u0100??

Yes, I see that, too.

> Presumably this is some sort of combining-character thing.

Yes.  Try "C-u C-x =" on the first '?', and you will see.  Or type
"M-x auto-composition-mode RET" to disable composition and get your
original characters back.

> However, if the intent is to present a combined character, shouldn't
> the character be displayed as a single '?', to better mimic the
> single glyph you'd see on an X display?

It probably should (if we want to allow compositions on text terminals
at all, which is questionable on non-UTF-8 TTYs).

> By the way, the '?'s look like ordinary question marks; they are not 
> highlighted, as the \u0100 is. Shouldn't they be highlighted somehow? 

AFAIU, the '?' should not appear at all, as glyphless-char-display
specifies hex codes for those codepoints.  This is one of the
manifestations of the fact that glyphless-char-display doesn't work
correctly on TTY frames.  This code from term.c:

  else
    {
      Lisp_Object charset_list = FRAME_TERMINAL (it->f)->charset_list;

      if (char_charset (it->char_to_display, charset_list, NULL))
        {
          it->pixel_width = CHARACTER_WIDTH (it->char_to_display);
          it->nglyphs = it->pixel_width;
          if (it->glyph_row)
            append_glyph (it);
        }
      else
        {
          Lisp_Object acronym = lookup_glyphless_char_display (-1, it);

          eassert (it->what == IT_GLYPHLESS);
          produce_glyphless_glyph (it, acronym);
        }
    }

is weird, because the test in char_charset, which controls how such
characters will be displayed, makes little sense to me.  The idea was
to see if the character belongs to one of the charsets supported by
the terminal, but in practice this doesn't work.

> And while I have your ear, why is U+0700 SYRIAC END OF PARAGRAPH 
> displayed as an ordinary '?' while U+0500 CYRILLIC CAPITAL LETTER KOMI 
> DE is displayed as a highlighted '\u0500'?

AFAIU, that's a direct consequence of the above weird test.
Characters which fail the char_charset test are displayed via
produce_glyphless_glyph, which on a TTY produces \uNNNN, whereas
characters which pass the test are just appended verbatim to the
buffer that is then encoded by terminal-coding-system, and that
produces the question marks for unsupported characters, bypassing
glyphless-char-display.  That's the bug I'd like to fix.
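
To make the two paths concrete, here is a toy model in C.  It is only a
sketch, not the term.c code: toy_in_terminal_charsets,
toy_encode_latin1 and toy_display are hypothetical names, and the
predicate that stands in for char_charset is an assumption chosen to
reproduce the behavior reported above (U+0700 passes the test, U+0500
does not).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for the char_charset test.  Assumption: the
   terminal's charset list claims Latin-1 and also the Syriac block,
   even though a Latin-1 tty cannot show Syriac.  */
bool toy_in_terminal_charsets(unsigned c)
{
    return c < 0x100 || (c >= 0x700 && c <= 0x74F);
}

/* Hypothetical stand-in for encoding with a Latin-1
   terminal-coding-system: anything it cannot encode becomes '?'.  */
char toy_encode_latin1(unsigned c)
{
    return c < 0x100 ? (char) c : '?';
}

/* Write into OUT what the toy tty would show for character C.  */
void toy_display(unsigned c, char *out, size_t out_size)
{
    assert(out_size >= 7);      /* room for "\uNNNN" plus the NUL */
    if (toy_in_terminal_charsets(c)) {
        /* Passed the test: sent verbatim to the encoder, so an
           unencodable character silently turns into '?'.  */
        out[0] = toy_encode_latin1(c);
        out[1] = '\0';
    } else {
        /* Failed the test: glyphless path, shown as \uNNNN.  */
        snprintf(out, out_size, "\\u%04X", c);
    }
}
```

In this model, calling toy_display with 0x500 takes the glyphless path,
while 0x700 goes through the encoder and comes out as a plain question
mark, which is the asymmetry described above.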
