bug-gettext

Re: GNU gettext 0.22 broke non-Unicode msgids


From: Bruno Haible
Subject: Re: GNU gettext 0.22 broke non-Unicode msgids
Date: Tue, 28 Nov 2023 07:44:47 +0100

Robert Clausecker wrote:
> > I made the change in gettext 0.22 because ISO-8859-1 is incompatible with
> > modern systems such as musl libc.
> 
> So because musl refuses to support non-Unicode (a reasonable design choice for
> them), you have to break it for anybody else.  Makes sense I guess.

The mitigation of shipping two .mo files for every .po file, one in the
original encoding and one in UTF-8 for musl, would have caused significant
trouble for the maintainers of hundreds of packages.

> You could have avoided the issue by e.g. keeping msgids untranscoded.  I don't
> see any case where transcoding the msgid is the right call, as these need to
> exactly match what the source code has.  And the source code is not transcoded
> by anything.

The "anybody else" is probably only your package.

* The recommendation to use only ASCII msgids has been in place for 22 years.

* If someone uses a non-ASCII msgid, xgettext produces an error:

  $ LC_ALL=C xgettext -o - -k_ calc/calc.c
  xgettext: Non-ASCII string at calc/calc.c:121.
            Please specify the source encoding through --from-code.

* When this error is avoided by using --from-code, the result is a UTF-8 encoded
  PO file:

  $ LC_ALL=C xgettext -o - -k_ --from-code=ISO-8859-1 calc/calc.c
  # SOME DESCRIPTIVE TITLE.
  # Copyright (C) YEAR THE PACKAGE'S COPYRIGHT HOLDER
  # This file is distributed under the same license as the PACKAGE package.
  # FIRST AUTHOR <EMAIL@ADDRESS>, YEAR.
  #
  #, fuzzy
  msgid ""
  msgstr ""
  "Project-Id-Version: PACKAGE VERSION\n"
  "Report-Msgid-Bugs-To: \n"
  "POT-Creation-Date: 2023-11-28 07:36+0100\n"
  "PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
  "Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
  "Language-Team: LANGUAGE <LL@li.org>\n"
  "Language: \n"
  "MIME-Version: 1.0\n"
  "Content-Type: text/plain; charset=UTF-8\n"
  "Content-Transfer-Encoding: 8bit\n"

  #: calc/calc.c:121
  msgid "Jörg Schilling"
  msgstr ""

It has been like this at least since version 0.18 (from 2010).

You may call the latter behaviour a bug: the source code passes the ISO-8859-1
bytes of "Jörg Schilling" to gettext(), while the catalog contains the UTF-8
encoded msgid, so the direct gettext() call will not find the translation. But
no one has reported it since 2010. Apparently no one else uses non-ASCII,
non-UTF-8 msgids in the wild, and no one else uses hand-made PO files either.
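The mismatch can be made concrete by comparing bytes. This is a minimal sketch, assuming a POSIX shell with printf, od, tr, and iconv available; it does not call gettext() itself. It only shows that the ISO-8859-1 bytes of "Jörg" (what a Latin-1 source file hands to gettext()) differ from the UTF-8 bytes that xgettext --from-code=ISO-8859-1 writes into the PO file. Since catalog lookup matches msgids by their bytes, the two cannot match.

```shell
#!/bin/sh
# Sketch: compare the byte sequences of "Jörg" in ISO-8859-1 and UTF-8.
# \366 is the octal escape for 0xF6, which is 'ö' in ISO-8859-1; using the
# escape keeps this script itself pure ASCII.

# Bytes as they appear in an ISO-8859-1 encoded C source file.
latin1=$(printf 'J\366rg' | od -An -tx1 | tr -d ' \n')

# The same text transcoded to UTF-8, as in the PO file produced by xgettext.
utf8=$(printf 'J\366rg' | iconv -f ISO-8859-1 -t UTF-8 | od -An -tx1 | tr -d ' \n')

echo "ISO-8859-1 bytes: $latin1"   # prints: ISO-8859-1 bytes: 4af67267
echo "UTF-8 bytes:      $utf8"     # prints: UTF-8 bytes:      4ac3b67267

# A byte-wise lookup key built from one encoding never equals the other.
if [ "$latin1" = "$utf8" ]; then
  echo "same msgid bytes"
else
  echo "different msgid bytes: lookup misses"
fi
```

The single Latin-1 byte 0xF6 becomes the two-byte UTF-8 sequence 0xC3 0xB6, so the string passed by the source code and the msgid stored in the catalog differ in both length and content.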

Bruno





