Re: supporting obscure languages
From: Albert Cahalan
Subject: Re: supporting obscure languages
Date: Sat, 28 Nov 2009 14:53:26 -0500
On Sat, Nov 28, 2009 at 10:49 AM, Bruno Haible <address@hidden> wrote:
> It is similar to LC_MESSAGES=zam_MX.UTF-8 LANG=fr_FR.UTF-8, which would
> be a perfectly reasonable choice for a user with French preferences but
> Zapotec language. POSIX allows users to combine different aspects of
> locales in this way.
POSIX does, but the library does not. If the library followed POSIX
then I could combine LC_MESSAGES=zam with LANG=C.
In other words, this looks like a POSIX violation to me.
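To make the POSIX rule concrete, here is a minimal sketch of the lookup order for a single category: LC_ALL first, then the category's own variable, then LANG, with empty values treated as unset. The function name effective_locale is hypothetical, and the final "C" fallback is an assumption (POSIX leaves that default implementation-defined). It only resolves the name from the environment; it does not check whether that locale is installed.

```c
#include <stdlib.h>

/* Return the locale name that governs one category, given the name of
 * that category's environment variable (for example "LC_MESSAGES").
 * Precedence: LC_ALL, then the category variable, then LANG.
 * Empty values count as unset. The "C" fallback is an assumption. */
const char *effective_locale(const char *category_var)
{
    const char *vars[] = { "LC_ALL", category_var, "LANG" };

    for (size_t i = 0; i < sizeof vars / sizeof vars[0]; i++) {
        const char *value = getenv(vars[i]);
        if (value != NULL && value[0] != '\0')
            return value;
    }
    return "C";
}
```

With LC_MESSAGES=zam and LANG=C in the environment and LC_ALL unset, effective_locale("LC_MESSAGES") yields "zam" while effective_locale("LC_TIME") yields "C". The sketch ignores LANGUAGE, which is a GNU gettext extension consulted for message selection rather than part of POSIX.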
> It is gross, but it is a consequence of your desire to use a language
> for which the locale is nonexistent or not installed, and therefore
> to do in your program what normally the users do in their system. This is
> not typical. The normal case is that users set their preferences in a
> central location and these preferences get transmitted to the programs via
> environment variables.
The only part I need is installed: zam.mo
Since I never try to format time, the library shouldn't even try
to load the data for that. The missing pieces shouldn't affect
anything, since I'm not attempting to use them. If I did try to
format time, the library could fall back to some typical default.
Basically this isn't fail-safe. Some chunk of locale data goes
missing, and suddenly the whole thing dies.
>> I'm depending on some random unrelated locale
>> just to get normal UTF-8 behavior.
>
> Yes, this is worrying. But nowadays, on most desktop systems, at least
> one user locale is installed, it uses UTF-8 encoding, and you can
> enquire it through setlocale(LC_ALL,"").
>
> The systems with only the "C" locale are small-memory devices like
> routers.
That was my system until I started debugging this problem,
and in fact an apt-get hook wipes out locales every time I
install packages.
This is because en_US.UTF-8 has defective collation order,
and because I don't normally need translations. If I were to
set either LANGUAGE or LC_MESSAGES alone though,
that ought to get me translations despite anything else.
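For the "enquire it through setlocale" suggestion quoted above, a minimal sketch of what a program might do: try a list of candidate LC_CTYPE locales and keep the first one that both loads and reports a UTF-8 codeset through nl_langinfo(CODESET). The function names codeset_is_utf8 and pick_utf8_ctype are hypothetical, and which candidates exist on a given system is an assumption the caller has to make.

```c
#include <langinfo.h>
#include <locale.h>
#include <stddef.h>
#include <strings.h>

/* True if a codeset name, as returned by nl_langinfo(CODESET), means
 * UTF-8. Spellings vary between systems, so compare case-insensitively
 * and accept the form without the hyphen. */
int codeset_is_utf8(const char *codeset)
{
    return codeset != NULL &&
           (strcasecmp(codeset, "UTF-8") == 0 ||
            strcasecmp(codeset, "UTF8") == 0);
}

/* Try each candidate LC_CTYPE locale in order and keep the first one
 * that loads and reports a UTF-8 codeset. Returns the name that worked.
 * If none works, resets LC_CTYPE to "C" and returns NULL. */
const char *pick_utf8_ctype(const char *const *candidates, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        if (setlocale(LC_CTYPE, candidates[i]) != NULL &&
            codeset_is_utf8(nl_langinfo(CODESET)))
            return candidates[i];
    }
    setlocale(LC_CTYPE, "C");
    return NULL;
}
```

A caller might pass something like { "", "C.UTF-8", "en_US.UTF-8" }, where "" means "whatever the environment requests". On a system with only the "C" locale installed, every candidate can fail and the function returns NULL.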
> Internationalization of a program consists of three parts:
> 1) Make use of the Unicode character set.
> 2) Provide translations for messages.
> 3) Do the following in a locale dependent way: display of time,
> display of currency, computations with calendar, display of
> Hanzi ideographs (Chinese vs. Japanese - same Unicode code
> point, different glyphs), form for entering a postal address,
> arrangement of GUI components (right-to-left), etc.
Well no, not unless the program needs it. OTOH, Tux Paint
localizes things you don't even handle: audio clips, fonts,
font size, font vertical position, and right-to-left text rendering.
In any case, part of a locale is better than none. Right now
you're essentially saying that incomplete localization isn't
allowed; it's all or nothing.
> With a "C" locale in UTF-8 encoding, you would get part 1). You would
> not get part 2), because gettext() must not use the translation message
> catalogs in the "C" locale. You would also not get part 3), because
> strftime etc. also must not use localized values in the "C" locale.
> That's because in POSIX, the "C" locale is the locale to be set when you
> want to know ahead of time the output format of "ls", "df", "date" etc.
Ah, but I asked for a different locale.
LANGUAGE: not set to "C"
LC_ALL: not set to "C"
LC_MESSAGES: not set to "C"
LANG: not set to "C"
setlocale's 2nd parameter: not set to "C"
That right there means I didn't want the "C" locale. Additionally,
at least one of those things is not blank/empty/missing, so you
certainly know which locale I want. I expect best-effort.
I even called bind_textdomain_codeset, so UTF-8 is explicit.
Had I set nothing, I still wouldn't be asking for "C". You could
give me a "generic.UTF-8" or "NULL.UTF-8" locale that works.
BTW, even the strings being passed to gettext() are UTF-8.
I have things like the ellipsis, so it's still UTF-8 even when the
translation is dumped on the floor.
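For reference, the call sequence being described looks roughly like the sketch below. It assumes a libintl that provides bind_textdomain_codeset (glibc or GNU gettext); the wrapper name init_messages is hypothetical, as are the domain and directory the caller passes in. A failed setlocale is deliberately not treated as fatal here, which is a design choice of this sketch and not a statement about what the library does afterwards.

```c
#include <libintl.h>
#include <locale.h>
#include <stddef.h>

/* Set up message translation for one text domain.
 * Returns 0 on success, -1 if a libintl call reports failure. */
int init_messages(const char *domain, const char *localedir)
{
    /* Adopt whatever the environment asks for. A NULL return means the
     * requested locale could not be loaded and the program stays in
     * "C"; this sketch carries on regardless. */
    (void)setlocale(LC_ALL, "");

    /* Tell libintl where the .mo catalogs for this domain live. */
    if (bindtextdomain(domain, localedir) == NULL)
        return -1;

    /* Request that translated strings be returned encoded as UTF-8. */
    if (bind_textdomain_codeset(domain, "UTF-8") == NULL)
        return -1;

    /* Make this the default domain for plain gettext() calls. */
    if (textdomain(domain) == NULL)
        return -1;

    return 0;
}
```

After init_messages succeeds, gettext("Open…") returns the translation if a catalog is found. When no translation is found, gettext returns its argument unchanged, so a UTF-8 msgid comes back as the same UTF-8 bytes.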
> But I agree with you that it would be useful if more Linux distributors
> would install an en_US.UTF-8 locale always.
Debian seems to have chosen to add C.UTF-8. From my reading of
the code, it looks like that will fail. They'll patch it I'm sure.