Re: LC_CTYPE implementation help
From: Bruno Haible
Subject: Re: LC_CTYPE implementation help
Date: Thu, 28 Aug 2008 01:31:56 +0200
User-agent: KMail/1.5.4

Aragon Gouveia wrote:
> So I take it this means that if one were writing a locale aware application,
> the application's ability to function predictably is very much up to the
> platform and system on which it runs? i.e. one can't rely on just ensuring
> gettext is installed correctly...
Yes. gettext does not replace the system's locales. If you are on a system
with broken locales, then either you have a localedef command (like on
glibc or Solaris systems), or you are hosed (that's the case on most
other systems, including *BSD, Cygwin, mingw).
> I use FreeBSD primarily
You might want to try GNU/kFreeBSD instead: a glibc system with FreeBSD
kernel - and so it supports 'localedef'.
> > And be aware that the <ctype.h> functions are meaningless in multibyte
> > locales
>
> Does this apply to all systems? I use FreeBSD primarily, and their locales
> are named, for example, "ja_JP.UTF-8" - this makes me think the FreeBSD
> ctype functions will be multibyte aware...
The FreeBSD <ctype.h> functions are certainly multibyte aware. But isalnum()
is not sufficient for testing whether 'ü' is a lower-case or upper-case
letter, because it looks at a single byte and often strlen("ü") == 2.
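
To illustrate the point about strlen(), here is a minimal sketch. It assumes
the string is UTF-8 encoded, where 'ü' is the two bytes 0xC3 0xBC. The helper
name count_alnum_bytes is made up for this example and is not part of any
library.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: counts the bytes of s for which isalnum() is true.
   isalnum() only ever sees one byte at a time, so a character that is
   encoded as several bytes is never presented to it as a whole. */
size_t count_alnum_bytes(const char *s)
{
    size_t count = 0;

    for (; *s != '\0'; s++) {
        /* The cast avoids undefined behavior for bytes >= 0x80 when
           plain char is signed. */
        if (isalnum((unsigned char) *s))
            count++;
    }
    return count;
}
```

With "\xc3\xbc" (the UTF-8 encoding of 'ü') strlen() reports 2, and in the
"C" locale neither of the two bytes is classified as alphanumeric.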
> edit: just noticed FreeBSD has ctype functions like iswalnum() for handling
> "wide characters" and are declared in wctype.h. Cool! :)
Yes, mbtowc() + iswalnum() together are a working replacement for isalnum().
But I would not recommend using functions which work on wide character
*strings* (wchar_t*) - doing so causes more problems than it solves. The
preferred representations for strings continue to be char* strings,
either in locale encoding (the default) or in UTF-8 encoding (see also
the unistr/u8* functions in gnulib).
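
The mbtowc() + iswalnum() combination mentioned above could look roughly like
this. It is only a sketch: the helper name mb_isalnum and its return
convention are invented for this example, and the result depends on the
LC_CTYPE locale that the program has selected with setlocale().

```c
#include <locale.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical helper: tests whether the multibyte character at the start
   of the char* string s is alphanumeric in the current LC_CTYPE locale.
   Stores the length of that character in bytes in *len.
   Returns 1 if alphanumeric, 0 if not, and -1 if s is empty or starts
   with an invalid or incomplete multibyte sequence. */
int mb_isalnum(const char *s, int *len)
{
    wchar_t wc;
    int n;

    mbtowc(NULL, NULL, 0);             /* reset the conversion state */
    n = mbtowc(&wc, s, strlen(s));     /* decode one multibyte character */
    if (n <= 0) {
        *len = 0;
        return -1;
    }
    *len = n;
    return iswalnum((wint_t) wc) != 0;
}
```

A caller would first run setlocale(LC_ALL, "") and then walk a char* string by
advancing the pointer by *len after each call, so the string itself stays in
the locale encoding and only one character at a time is converted to wchar_t.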
Bruno