bug-gnu-utils

Re: supporting obscure languages


From: Bruno Haible
Subject: Re: supporting obscure languages
Date: Sat, 28 Nov 2009 16:49:03 +0100
User-agent: KMail/1.9.9

Hello Albert,

> >> All I care about: LC_MESSAGES for "zam", LC_CTYPE not lobotomized
> >
> > Then your workaround of doing
> >  LANGUAGE=zam LC_ALL=fr_FR.UTF-8
> > is just fine.
> 
> Don't you think that is terribly gross? (French with
> different words!)

It is similar to LC_MESSAGES=zam_MX.UTF-8 LANG=fr_FR.UTF-8, which would
be a perfectly reasonable choice for a user with French preferences but
Zapotec language. POSIX allows users to combine different aspects of
locales in this way.
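
To see how the categories are resolved independently, a minimal sketch
along these lines can help. The function name report_locale_categories
is made up for illustration; only setlocale() and the LC_* constants come
from the C library (LC_MESSAGES being a POSIX addition to ISO C).

```c
#include <assert.h>
#include <locale.h>
#include <stdio.h>

/* Adopt the user's environment settings, then print the locale in
   effect for a few categories.  Returns the number of categories
   printed.  The function name is hypothetical. */
int report_locale_categories(FILE *out)
{
    static const struct { int category; const char *name; } cats[] = {
        { LC_CTYPE,    "LC_CTYPE"    },
        { LC_MESSAGES, "LC_MESSAGES" },  /* POSIX, not ISO C */
        { LC_TIME,     "LC_TIME"     },
    };
    int printed = 0;
    size_t i;

    assert(out != NULL);

    /* "" means: resolve each category from LC_ALL, then LC_<category>,
       then LANG, in that order of precedence.  On failure (for example
       a locale that is not installed) the current locale is kept. */
    if (setlocale(LC_ALL, "") == NULL)
        fprintf(out, "warning: requested locale not available\n");

    for (i = 0; i < sizeof cats / sizeof cats[0]; i++) {
        /* A NULL second argument queries without changing anything. */
        const char *current = setlocale(cats[i].category, NULL);
        if (current != NULL) {
            fprintf(out, "%-12s %s\n", cats[i].name, current);
            printed++;
        }
    }
    return printed;
}
```

Calling report_locale_categories(stdout) from a program started with
LC_MESSAGES=zam_MX.UTF-8 LANG=fr_FR.UTF-8 should show different values
per category, provided both locales are installed on the system.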

> Don't you think it's doubly gross to have a program
> calling setenv() to control a library via environment
> variables intended for users instead of a proper API?
> ... setenv as an API
> is really disturbing. I greatly prefer to treat the environment
> as read-only.

It is gross, but it is a consequence of your desire to use a language
for which the locale does not exist or is not installed, and therefore
to do in your program what users normally do in their system. This is
not typical. The normal case is that users set their preferences in a
central location and these preferences get transmitted to the programs via
environment variables.
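
For reference, the workaround under discussion can be sketched as
follows. The helper name select_message_language and its return codes
are assumptions made for this example. LANGUAGE is the GNU gettext
priority list for message languages; GNU gettext ignores it while the
locale is "C", which is why a real locale has to be set as well.

```c
#define _POSIX_C_SOURCE 200112L  /* for setenv() under strict -std= modes */
#include <assert.h>
#include <locale.h>
#include <stdlib.h>

/* Do in the program what the user would normally do in the shell:
   LANGUAGE=<language> LC_ALL=<base_locale>.
   Returns 0 on success, -1 if the environment could not be changed,
   1 if base_locale is not available.  Hypothetical helper. */
int select_message_language(const char *language, const char *base_locale)
{
    assert(language != NULL && base_locale != NULL);

    /* Overwrite any LANGUAGE value inherited from the user. */
    if (setenv("LANGUAGE", language, 1) != 0)
        return -1;

    /* The base locale supplies LC_CTYPE and the other categories. */
    if (setlocale(LC_ALL, base_locale) == NULL)
        return 1;

    return 0;
}
```

A call such as select_message_language("zam", "fr_FR.UTF-8") would have
to happen before the first gettext() lookup, since the library gets no
notification when the environment changes later.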

> The library doesn't even get immediate notice that there
> has been a change unless you have evil hooks into the
> setenv and getenv functions.

You don't have such hooks in the setlocale function either. Sadly.

> I'm depending on some random unrelated locale
> just to get normal UTF-8 behavior.

Yes, this is worrying. But nowadays, on most desktop systems, at least
one user locale is installed, it uses UTF-8 encoding, and you can
obtain it through setlocale(LC_ALL,"").

The systems with only the "C" locale are small-memory devices like
routers.
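
Whether the user's locale really uses UTF-8 can be checked at run time.
The sketch below assumes a POSIX system with nl_langinfo(); the function
names are hypothetical, and the set of accepted spellings ("UTF-8",
"utf8") is an assumption, since the codeset name is not standardized
across systems.

```c
#define _POSIX_C_SOURCE 200112L  /* for nl_langinfo() under strict modes */
#include <assert.h>
#include <langinfo.h>
#include <locale.h>
#include <string.h>

/* Returns 1 if the codeset name denotes UTF-8, 0 otherwise.
   Only two common spellings are recognized (assumption). */
int codeset_is_utf8(const char *codeset)
{
    assert(codeset != NULL);
    return strcmp(codeset, "UTF-8") == 0 || strcmp(codeset, "utf8") == 0;
}

/* Adopt the user's locale and report whether its LC_CTYPE encoding
   is UTF-8.  Returns 0 if the locale cannot be set at all. */
int user_locale_is_utf8(void)
{
    if (setlocale(LC_ALL, "") == NULL)
        return 0;
    return codeset_is_utf8(nl_langinfo(CODESET));
}
```

On a system with only the "C" locale, user_locale_is_utf8() is expected
to return 0, and the program can then decide how to degrade.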

> > No, when you call setlocale(LC_ALL,"") it uses the locale that the
> > user has set, not "C".
> 
> I mean when the user has done nothing either. The "" doesn't
> get filled in by some environment variable. You make it all the
> way to the lowest-priority environment variable ("LANG") and
> still have "". At that point, the implementation-specific locale
> is chosen... and it is "C".

If you are in this case, you are either on a misconfigured desktop
system, or on a small-memory system on which your program is likely
not meant to run.
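
A program that wants to detect this situation can look at the locale
name after the call. This is only a sketch: the function names are
invented here, and it checks LC_CTYPE alone because the name returned
for LC_ALL may be a composite string when categories differ.

```c
#include <assert.h>
#include <locale.h>
#include <string.h>

/* Returns 1 if name is one of the two standard names of the
   POSIX locale, 0 otherwise. */
int is_c_locale_name(const char *name)
{
    assert(name != NULL);
    return strcmp(name, "C") == 0 || strcmp(name, "POSIX") == 0;
}

/* Returns 1 if, after adopting the environment, LC_CTYPE is still
   the "C" locale.  This covers both an empty environment and a
   requested locale that is not installed, because a failed
   setlocale() call leaves the startup "C" locale in place. */
int user_ctype_is_c(void)
{
    const char *name;

    setlocale(LC_ALL, "");
    name = setlocale(LC_CTYPE, NULL);
    return name == NULL || is_c_locale_name(name);
}
```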

> Basically: use what is there, and assume something close
> to "C.UTF-8" for anything missing/broken. Maybe you could
> find choices that are more generic than "C", like 24-hour time
> and PA4 paper size. Maybe round-trip the case for U+1E9E,
> avoiding expansion troubles. You could call it "default.UTF-8".
> 
> The details aren't terribly critical; the main thing is to let a
> random loose UTF-8 *.mo file work without hacks or fuss,
> along with the wchar_t functions working beyond ASCII.

Internationalization of a program consists of three parts:
  1) Make use of the Unicode character set.
  2) Provide translations for messages.
  3) Do the following in a locale-dependent way: display of time,
     display of currency, computations with calendar, display of
     Hanzi ideographs (Chinese vs. Japanese - same Unicode code
     point, different glyphs), form for entering a postal address,
     arrangement of GUI components (right-to-left), etc.

With a "C" locale in UTF-8 encoding, you would get part 1). You would
not get part 2), because gettext() must not use the translation message
catalogs in the "C" locale. You would also not get part 3), because
strftime etc. also must not use localized values in the "C" locale.
That's because in POSIX, the "C" locale is the locale to be set when you
want to know ahead of time the output format of "ls", "df", "date" etc.
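
The predictability of the "C" locale is easy to demonstrate with
strftime. In the sketch below the helper name is made up; it switches
LC_TIME to "C" and formats a weekday name, which in that locale is
always the English one.

```c
#include <assert.h>
#include <locale.h>
#include <time.h>

/* Write the full weekday name for wday (0 = Sunday .. 6 = Saturday)
   into buf, using the "C" locale for LC_TIME.  Returns the number of
   characters written, or 0 on failure.  Hypothetical helper; note
   that it changes the process-wide LC_TIME setting. */
size_t weekday_name_in_c_locale(char *buf, size_t size, int wday)
{
    struct tm t = {0};

    assert(buf != NULL && wday >= 0 && wday <= 6);

    if (setlocale(LC_TIME, "C") == NULL)
        return 0;

    t.tm_wday = wday;  /* %A looks only at tm_wday */
    return strftime(buf, size, "%A", &t);
}
```

With wday set to 6 the buffer holds "Saturday" regardless of what the
user has configured, which is exactly why localized values must not
appear in this locale.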

Conclusion: In general, a program cannot be internationalized if it
relies on the "C" locale.

Therefore only a few programs would profit from a "C" locale in UTF-8
encoding.

But I agree with you that it would be useful if more Linux distributors
always installed an en_US.UTF-8 locale.

Bruno



