bug-ncurses

Re: I've made a simple program, showing UTF-8 lower but not uppercase wo


From: amores perros
Subject: Re: I've made a simple program, showing UTF-8 lower but not uppercase working
Date: Sat, 24 Sep 2005 16:28:01 +0000




From: Thomas Dickey To: amores perros
CC: mailing list
Subject: Re: I've made a simple program, showing UTF-8 lower but not uppercase working
Date: Sat, 24 Sep 2005 05:14:28 -0400 (EDT)

On Sat, 24 Sep 2005, amores perros wrote:




Myself asking another probably very simple question.

I installed libncursesw5-dev, and changed my test program
by including ncursesw
#include <ncursesw/curses.h>

and linking to ncursesw
gcc ct.c -lncursesw

and it worked this time (that is, both lowercase and uppercase
characters in my simple test program came out correctly).

I did not add code to convert my string to wchar_t,
nor did I change my calls to the wide-character (cchar_t) versions.

I did not define _XOPEN_CURSES.

hmm - I think you meant _XOPEN_SOURCE_EXTENDED


Undoubtedly you are correct.

If your application is simple enough, it works without that.
There are a few places where the difference between cchar_t and chtype
is exposed in curses.h (the WINDOW struct for example).  As long as
you use things like addstr() and printw(), it does not matter.


FWIW, I got curious and ran my entire application (which does use
WINDOWs, but no panels or forms). I added an AC_CHECK_LIB test for
ncursesw to configure.in and used the resulting HAVE_LIBNCURSESW
to switch the include to <ncursesw/curses.h>. That code kicked in
and the app worked: capital non-ASCII letters now display correctly,
and some garbage that used to appear on the screen many characters
to the right of them is gone.

I still did not define _XOPEN_SOURCE_EXTENDED, because I was
curious to see what would happen with just the link and header
change.

Perhaps my program now believes a wrong size for the WINDOW
structure. But I don't think I allocate WINDOWs myself; I get them
from newwin, so it may not matter that my program believes
something wrong about the structure. I mostly use WINDOWs by
passing them back to ncurses, so they are mostly opaque to me.
I'm not sure, though, whether I use any macros defined in the
headers, in which case the correct header definitions (which
might depend on that symbol) could matter.

Obviously I can simply add that define to my code, unless you're
interested in any more datapoints from me about using
ncurses without defining that (apparently successfully -- but
with very little testing).


This is a difference between libncurses and libncursesw: the functions
that accept a char* string expect a multibyte character string (for example UTF-8) rather than 8-bit characters.


I hypothesize that the libncursesw code receives the strings from
me, notices that my ambient locale is UTF-8 (via LC_CTYPE, or LC_ALL,
or whichever is the crucial one), and immediately converts the
string to wchar_t (or cchar_t). Thereafter it behaves much like
libncurses. The crucial differences are that it converts its input
as soon as it gets it, and that it uses larger internal structures
because they are sized for wchar_t.

Whereas libncurses assumes that the string of chars I gave it
is a string with each byte representing one char (which is not
true with UTF-8 of course).
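
To make the byte-versus-character distinction concrete, here is a
minimal sketch in C. The function name utf8_count_chars is hypothetical
(it is not part of ncurses), and the code assumes its input is valid
UTF-8. It only illustrates why a library that treats each byte as one
character miscounts UTF-8 text; it does not show how libncursesw
actually converts its input.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>   /* strlen(), used in the comparison described below */

/*
 * Hypothetical helper: count the characters (code points) in a UTF-8
 * string. Continuation bytes have the bit pattern 10xxxxxx, so every
 * byte that does NOT match that pattern starts a new character.
 * Assumes the input is valid UTF-8; no validation is performed.
 */
size_t utf8_count_chars(const char *s)
{
    size_t count = 0;

    assert(s != NULL);
    for (; *s != '\0'; s++) {
        if (((unsigned char)*s & 0xC0) != 0x80)
            count++;
    }
    return count;
}
```

For the two-byte sequence "\xC3\xA9" (the letter e with acute accent),
strlen() reports 2 while utf8_count_chars() reports 1. An 8-bit
library sees the former; a multibyte-aware one needs the latter.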

But, because my terminal is in UTF-8, libncurses wound up
passing through some of the UTF-8 bytes unchanged to the
terminal, so I saw the lowercase non-ASCII characters come
out successfully, but for some reason the bytes corresponding
to uppercase non-ASCII got scrambled. (I'm curious why, although
it is not important that I know.)
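
One plausible explanation for the uppercase scrambling, which this
thread does not confirm: in UTF-8, the characters U+00C0..U+00DF
(where the accented Latin-1 capitals live) encode as 0xC3 followed by
a byte in 0x80..0x9F, which is the C1 control range in 8-bit character
sets. The characters U+00E0..U+00FF (the accented lowercase letters)
encode as 0xC3 followed by a byte in 0xA0..0xBF, which are printable
in ISO-8859-1. A library that handles each byte separately could
therefore pass the lowercase bytes through while treating the second
byte of each capital as a control character. The sketch below only
demonstrates the byte ranges. The helper names are hypothetical, and
it makes no claim about what libncurses does internally.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: nonzero if b lies in the C1 control range. */
int is_c1_control(unsigned char b)
{
    return b >= 0x80 && b <= 0x9F;
}

/*
 * Hypothetical helper: count the bytes of s that an 8-bit reader
 * would see as C1 control characters.
 */
size_t count_c1_bytes(const char *s)
{
    size_t count = 0;

    assert(s != NULL);
    for (; *s != '\0'; s++) {
        if (is_c1_control((unsigned char)*s))
            count++;
    }
    return count;
}
```

With these helpers, "\xC3\x89" (capital E with acute, U+00C9) contains
one byte in the C1 range, while "\xC3\xA9" (lowercase e with acute,
U+00E9) contains none.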


<snip>

I do believe that I have come to understand this.

I think there is an ncurses FAQ.

I suggest an addition like so (or in whatever way you deem best):

* Does UTF-8 work with ncurses?

UTF-8 is not supported by the 8-bit (normal) configuration of libncurses.

However, UTF-8 is supported by the widechar configuration of ncurses, which
is called libncursesw.

The developer needs a widechar version of the headers and libraries, which
the developer may either build using the --enable-widec configure switch, or
fetch from a package repository (e.g., libncursesw5-dev in Debian).

If the developer links against the shared library (libncursesw.so.5),
the end user needs a widechar version of the shared library (e.g., libncursesw
in Debian or Fink).

Naturally, if the developer links against the static library (libncursesw.a),
there is no requirement imposed on the end user.

Note that the widechar variation, libncursesw, is functionally
a complete superset of the 8-bit variation, and supports
8-bit locales just as well as the older libncurses does, so there is a
gain in flexibility, but no loss, in migrating to the widechar variation.

****


The existing FAQ, at least in the form at
 http://dickey.his.com/ncurses/ncurses.faq.html
does allude to this, but in a question about line
drawing. My suggestion is to make the requirements
painfully explicit for the ignorant :)





