bug-gnu-utils

Re: gettext 0.10.37 msgfmt charset naming compatibility on Solaris 8


From: Bruno Haible
Subject: Re: gettext 0.10.37 msgfmt charset naming compatibility on Solaris 8
Date: Wed, 16 May 2001 18:00:59 +0200 (CEST)

Paul Eggert writes:
> After installing gettext 0.10.37 on Solaris 8 I ran into problems like this:
> 
> $ msgfmt es.po
> es.po: warning: Charset "ISO-8859-1" is not supported. msgfmt relies on iconv(),
>               and iconv() does not support "ISO-8859-1".
>               Installing GNU libiconv and then reinstalling GNU gettext
>               would fix this problem.
>               Continuing anyway.
> 
> ...  Here's a patch.
> It's a bit of a hack, but it requires few changes to existing code.
> If you prefer it written some other way I can rewrite it.  It is handy
> to not have to worry about installing the extra package and getting
> everything linked up correctly.
>
> This patch does not affect behavior in the cases where GNU gettext
> currently succeeds; it is a pure extension.

Your patch works for you as long as you only use ISO-8859-1 encoded PO
files, and because Solaris has ISO-8859-1 locales (so that ISO-8859-1
occurs in the charset.alias file), and because the Solaris ISO-8859-1
converter is correct.

But I cannot take this patch for the following reasons:

  1) Why doesn't Solaris iconv() accept "ISO-8859-1" as a
     conversion name? It's a name registered by ISO and IANA for many
     years now.

     Vendors whose iconv does not accept standard names are also
     likely not to put enough effort into the converters themselves.
     For example, the Solaris 2.7 converters for ISO-8859-6,
     ISO-8859-7, ISO-8859-8, ISO-8859-10, ISO-8859-15, EUC-JP are
     incorrect. Only the ISO-8859-1, ISO-8859-2, ISO-8859-3,
     ISO-8859-4, ISO-8859-5, ISO-8859-9, KOI8-R, TIS-620 converters
     in Solaris 2.7 are correct. And Solaris is relatively good
     here; other iconvs are much worse.

     Soon gettext will use iconv for more than PO file parsing and
     line breaking. It is essential that the iconv() in use meets
     some quality standards.

  2) The charset.alias file is not sufficient for iconv. It only
     lists those charsets which are available as a locale's encoding.
     But GNU gettext is used by maintainers on systems which don't
     have all possible locales. (This is the primary reason why it
     uses iconv() and not mbrtowc() for the parsing of PO files.)
     Thus you would need a file listing the correspondence between
     GNU canonical charset name and vendor's iconv charset name,
     for each platform.
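
Both concerns above, an unrecognized charset name and a converter that
may be incorrect, can in principle be probed on the build machine. The
following is a minimal sketch under stated assumptions, not code from
gettext: the function names find_accepted_name and check_byte are
hypothetical, and a real check would compare many more bytes against the
reference tables. It uses only the standard iconv_open(), iconv() and
iconv_close() calls. Note that some systems declare the second argument
of iconv() as 'const char **', which would need a cast there.

```c
#include <assert.h>
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/* Return the first name in the NULL-terminated list CANDIDATES that
   iconv_open() accepts as a source encoding for conversion to TARGET,
   or NULL if none is accepted.  (Hypothetical helper.)  */
const char *
find_accepted_name (const char *const *candidates, const char *target)
{
  assert (candidates != NULL && target != NULL);
  for (; *candidates != NULL; candidates++)
    {
      iconv_t cd = iconv_open (target, *candidates);
      if (cd != (iconv_t) -1)
        {
          iconv_close (cd);
          return *candidates;
        }
    }
  return NULL;
}

/* Convert the single byte BYTE from encoding FROM to encoding TO and
   compare the result with EXPECTED (EXPECTED_LEN bytes).
   Return 1 on a match, 0 on a mismatch or conversion error,
   -1 if the conversion is not available at all.  (Hypothetical helper.)  */
int
check_byte (const char *from, const char *to, unsigned char byte,
            const char *expected, size_t expected_len)
{
  char inbuf[1];
  char outbuf[16];
  char *inptr = inbuf;
  char *outptr = outbuf;
  size_t inleft = sizeof inbuf;
  size_t outleft = sizeof outbuf;
  size_t res;
  iconv_t cd = iconv_open (to, from);

  if (cd == (iconv_t) -1)
    return -1;
  inbuf[0] = (char) byte;
  res = iconv (cd, &inptr, &inleft, &outptr, &outleft);
  iconv_close (cd);
  if (res == (size_t) -1 || inleft != 0)
    return 0;
  return (size_t) (outptr - outbuf) == expected_len
         && memcmp (outbuf, expected, expected_len) == 0;
}
```

A caller would pass the GNU canonical name together with whatever vendor
spellings it wants to try, record the first accepted one, and then run
check_byte over a sample of the charset's table; for instance, byte 0xE9
in ISO-8859-1 should come out as the two UTF-8 bytes 0xC3 0xA9.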

So if you want to use Solaris iconv with GNU gettext and without
warnings, a patch that I would accept would consist of the following:

1) An autoconf test which checks for each of the charsets listed in
   po.c
     a. under which name this charset is available,
     b. whether it converts according to the standard tables used
        by glibc and libiconv.
   Such a check shouldn't make the "configure" file three megabytes
   large, of course. I have more ideas on this step.

2) A wrapper function, 'iconv_open_wrapper', similar to your iconvariant
   function, which uses the autoconf test's results. In particular
   it should convert GNU canonical charset names to vendor names,
   and reject encodings for which the autoconf test has determined
   that the vendor's iconv is broken.

Note that this iconv wrapper would have to be used by intl/ as well,
not only by the lib/ and src/ part of GNU gettext.
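
A minimal sketch of what such a wrapper could look like, assuming the
configure-time results are available as a static table. Everything
except the name 'iconv_open_wrapper' is hypothetical: the struct, the
helper vendor_charset_name, and in particular the vendor names in the
table, which are placeholders and not verified Solaris spellings. The
two entries marked broken are taken from the list of incorrect
Solaris 2.7 converters given above.

```c
#include <assert.h>
#include <errno.h>
#include <iconv.h>
#include <stddef.h>
#include <string.h>

struct charset_entry
{
  const char *canonical;  /* GNU canonical charset name */
  const char *vendor;     /* name accepted by the vendor's iconv */
  int broken;             /* nonzero if the build-time test failed */
};

/* Stand-in for the autoconf test's results.  The vendor names are
   illustrative placeholders only.  */
static const struct charset_entry charset_table[] =
{
  { "ISO-8859-1", "8859-1", 0 },
  { "ISO-8859-2", "8859-2", 0 },
  { "ISO-8859-7", "8859-7", 1 },
  { "EUC-JP",     "eucJP",  1 },
};

/* Map a GNU canonical name to the vendor's name.  Names not in the
   table are passed through unchanged.  Return NULL if the converter
   for this charset was found to be broken.  */
const char *
vendor_charset_name (const char *canonical)
{
  size_t i;

  assert (canonical != NULL);
  for (i = 0; i < sizeof charset_table / sizeof charset_table[0]; i++)
    if (strcmp (charset_table[i].canonical, canonical) == 0)
      return charset_table[i].broken ? NULL : charset_table[i].vendor;
  return canonical;
}

/* Like iconv_open(), but takes GNU canonical names and refuses
   encodings whose vendor converter is known to be broken.  */
iconv_t
iconv_open_wrapper (const char *tocode, const char *fromcode)
{
  const char *to = vendor_charset_name (tocode);
  const char *from = vendor_charset_name (fromcode);

  if (to == NULL || from == NULL)
    {
      errno = EINVAL;
      return (iconv_t) -1;
    }
  return iconv_open (to, from);
}
```

Rejecting a broken encoding with EINVAL mirrors what iconv_open() itself
does for an unsupported conversion, so existing callers that already
handle (iconv_t) -1 would not need a separate error path.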

Bruno


