bug-ncurses

Re: extended ASCII characters do not show up


From: amores perros
Subject: Re: extended ASCII characters do not show up
Date: Sat, 01 Oct 2005 18:16:31 +0000




From: Thomas Dickey <address@hidden>
Subject: Re: extended ASCII characters do not show up
Date: Sat, 1 Oct 2005 12:34:19 -0400 (EDT)

<snip>

#2)
I don't understand how line drawing characters (such as ACS_VLINE, I
think) work with UTF-8? That is, I don't know what they could
expand to that would work, unless they (macros I assume) expand
to characters between 0 and 0x20 which are not otherwise used.
I've looked in the ncurses faq for UTF-8, but if the answer is there
I overlooked it :(

yes (it did occur to me that I should add this to my faq - on my to-do list).

The ACS_xxx symbols are a character (which corresponds to the vt100 line-drawing), with A_ALTCHARSET added. ncurses keeps track of the A_ALTCHARSET, and when it is time to write the data to the screen, checks to see if the encoding is UTF-8. If so, it checks some special cases (such as Linux console) to see if it should not try to use the terminfo string to transform its internal character to the terminal's equivalent.

It's a little tough to follow from the ncurses sources, as I think
these .in files are only expanded by the autotools, but
A_ALTCHARSET probably expands to NCURSES_BITS(@cf_cv_1UL@,14),
and NCURSES_BITS probably expands via

#define NCURSES_BITS(mask,shift) ((mask) << ((shift) + NCURSES_ATTR_SHIFT))

so I think A_ALTCHARSET is a single flag bit, shifted up by 14 plus
NCURSES_ATTR_SHIFT positions (and WA_ALTCHARSET is the same thing as
A_ALTCHARSET).
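
To check my reading of the macro, here is a minimal sketch of the arithmetic. It is not the real curses.h: I am assuming NCURSES_ATTR_SHIFT is 8 and that @cf_cv_1UL@ becomes 1UL, and the SKETCH_/sketch_ names are made up for illustration.

```c
#include <assert.h>

/* Assumption: NCURSES_ATTR_SHIFT is 8; check your generated curses.h. */
#define NCURSES_ATTR_SHIFT 8
#define NCURSES_BITS(mask, shift) ((mask) << ((shift) + NCURSES_ATTR_SHIFT))

/* Hypothetical stand-in for A_ALTCHARSET, assuming @cf_cv_1UL@ is 1UL. */
#define SKETCH_ALTCHARSET NCURSES_BITS(1UL, 14)

/* Combine a 7-bit character with the alternate-charset flag bit. */
static unsigned long sketch_acs_cell(char vt100_char)
{
    assert((unsigned char)vt100_char < 0x80);
    return (unsigned long)(unsigned char)vt100_char | SKETCH_ALTCHARSET;
}

/* Recover the character stored in the low 8 bits of a cell. */
static char sketch_cell_char(unsigned long cell)
{
    return (char)(cell & 0xFFUL);
}

/* Report whether the alternate-charset flag is set on a cell. */
static int sketch_cell_is_acs(unsigned long cell)
{
    return (cell & SKETCH_ALTCHARSET) != 0;
}
```

Under those assumptions the flag is one bit at position 22, well above anything a char can hold, so sketch_acs_cell('x') still carries 'x' in its low byte while being distinguishable from a plain 'x'.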

So my assumption that, e.g., ACS_VLINE is a char was mistaken:
ACS_VLINE is an integer wider than a char, and A_ALTCHARSET is probably a
high bit set to flag this integer as not being a character in a UTF-8 encoding.

So now I understand how you can overload a "char" with information
outside of the UTF-8 range: it's not a char but an integer, much like
EOF, which uses a value outside of char range to mean the end of file
flag, I think.

At least, I think I understand it now.
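
The EOF comparison can be sketched in a few lines of C. The helper names here are hypothetical; the only facts relied on are that EOF is a negative int and that a character read by getc() arrives as an unsigned char value widened to int.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>

/* True when v is a value that an unsigned char can represent. */
static int sketch_fits_in_char(int v)
{
    return v >= 0 && v <= UCHAR_MAX;
}

/* Narrow an int to a character only after checking that it is one. */
static unsigned char sketch_to_char(int v)
{
    assert(sketch_fits_in_char(v));
    return (unsigned char)v;
}
```

EOF fails the range check while every real character passes it, which is the same trick as carrying a flag bit above the character bits.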





For Linux console in UTF-8 mode, the line-drawing characters are all represented as 3 bytes in UTF-8 encoding. That isn't compatible with the terminfo acsc string (which always does 1 byte mapped to another 1 byte).
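
As a rough illustration of the "1 byte mapped to another 1 byte" idea: an acsc string is a sequence of byte pairs, the vt100 line-drawing character followed by the byte this terminal wants instead. The sketch below is not ncurses code; the function name and the sample string "q-x|" are hypothetical, and leaving unlisted characters mapped to themselves is a simplification.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Fill a 256-entry table from an acsc-style string of byte pairs
   (vt100 character, terminal equivalent). Unlisted characters map
   to themselves in this sketch. */
static void sketch_build_acs_map(const char *acsc, unsigned char map[256])
{
    assert(acsc != NULL && map != NULL);
    for (int i = 0; i < 256; i++)
        map[i] = (unsigned char)i;

    size_t n = strlen(acsc);
    for (size_t i = 0; i + 1 < n; i += 2)
        map[(unsigned char)acsc[i]] = (unsigned char)acsc[i + 1];
}
```

With the sample string, 'q' (horizontal line) would be sent as '-' and 'x' (vertical line) as '|'. A table like this has no room for a 3-byte UTF-8 sequence, which is the incompatibility described above.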

The table that ncurses uses for the UTF-8 line-drawing is in
        ncurses/widechar/lib_wacs.c
That lists the Unicode values such as
        { 'q',  { '-',  0x2500 }},      /* horizontal line */
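
To see why 0x2500 needs 3 bytes, here is a small sketch of UTF-8 encoding. The function is hypothetical (not from ncurses), covers only code points up to U+FFFF for brevity, and does not reject surrogates.

```c
#include <assert.h>
#include <stddef.h>

/* Encode one code point in the range U+0000..U+FFFF as UTF-8.
   Writes up to 3 bytes to out and returns how many were written. */
static size_t sketch_utf8_encode(unsigned int cp, unsigned char *out)
{
    assert(out != NULL);
    assert(cp <= 0xFFFF);

    if (cp < 0x80) {                 /* 1 byte: 0xxxxxxx */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                /* 2 bytes: 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    /* 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx */
    out[0] = (unsigned char)(0xE0 | (cp >> 12));
    out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[2] = (unsigned char)(0x80 | (cp & 0x3F));
    return 3;
}
```

For 0x2500 this yields the bytes E2 94 80, while the ASCII fallback '-' from the same table entry stays a single byte.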


I don't know what a "terminfo acsc string" is, but I think I'm
content with my limited level of understanding for now, and
thanks for pointing out where the table is, so I can see which Unicode
characters are being used for drawing.



Thank you.

Cordially,

Perry





