Re: GNU gettext 0.22 broke non-Unicode msgids
From: Bruno Haible
Subject: Re: GNU gettext 0.22 broke non-Unicode msgids
Date: Tue, 28 Nov 2023 07:44:47 +0100
Robert Clausecker wrote:
> > I made the change in gettext 0.22 because ISO-8859-1 is incompatible with
> > modern systems such as musl libc.
>
> So because musl refuses to support non-Unicode (a reasonable design choice for
> them), you have to break it for anybody else. Makes sense I guess.
The mitigation of shipping two .mo files for every .po file, one in the
original encoding and one in UTF-8 for musl, would have caused significant
trouble for the maintainers of hundreds of packages.
> You could have avoided the issue by e.g. keeping msgids untranscoded. I don't
> see any case where transcoding the msgid is the right call, as these need to
> exactly match what the source code has. And the source code is not transcoded
> by anything.
The "anybody else" is probably only your package.
* The recommendation to use only ASCII msgids has been in place for 22 years.
* If someone uses a non-ASCII msgid, xgettext produces an error:
$ LC_ALL=C xgettext -o - -k_ calc/calc.c
xgettext: Non-ASCII string at calc/calc.c:121.
Please specify the source encoding through --from-code.
* When this error is avoided by using --from-code, the result is a UTF-8 encoded
PO file:
$ LC_ALL=C xgettext -o - -k_ --from-code=ISO-8859-1 calc/calc.c
# SOME DESCRIPTIVE TITLE.
# Copyright (C) YEAR THE PACKAGE'S COPYRIGHT HOLDER
# This file is distributed under the same license as the PACKAGE package.
# FIRST AUTHOR <EMAIL@ADDRESS>, YEAR.
#
#, fuzzy
msgid ""
msgstr ""
"Project-Id-Version: PACKAGE VERSION\n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2023-11-28 07:36+0100\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language-Team: LANGUAGE <LL@li.org>\n"
"Language: \n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=UTF-8\n"
"Content-Transfer-Encoding: 8bit\n"
#: calc/calc.c:121
msgid "Jörg Schilling"
msgstr ""
It has been like this since at least version 0.18 (from 2010).
You may call the latter behaviour a bug (since the direct gettext() call will
not find the translation). But it hasn't been reported since 2010. Apparently
no one else uses non-ASCII non-UTF-8 msgids in the wild, and no one else uses
hand-made PO files either.
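
To make the lookup failure concrete, here is a minimal sketch in Python. It is
not gettext's actual lookup code: the dictionary standing in for a .mo file and
the helper `lookup` are hypothetical, and it assumes the catalog ends up with
the UTF-8 form of the msgid, as in the PO file above. It only shows that the
bytes compiled from an ISO-8859-1 source file differ from the UTF-8 key, so a
byte-for-byte lookup falls back to the untranslated msgid.

```python
# Hypothetical illustration only; gettext itself is not involved here.
MSGID = "J\u00f6rg Schilling"  # "Jörg Schilling"

# Bytes as they sit in an ISO-8859-1 encoded C source file, which is what
# the program hands to gettext() at run time.
source_bytes = MSGID.encode("iso-8859-1")

# Bytes of the same msgid once it has been converted to UTF-8
# (assumption: this is the form stored in the catalog).
catalog_key = MSGID.encode("utf-8")

# Toy catalog keyed by raw bytes, standing in for a .mo file.
catalog = {catalog_key: b"(translation)"}


def lookup(msgid_bytes):
    """Return the translation, or the msgid itself when no key matches."""
    return catalog.get(msgid_bytes, msgid_bytes)


print(source_bytes.hex())   # the 'ö' is the single byte f6
print(catalog_key.hex())    # the 'ö' is the two bytes c3 b6
print(lookup(source_bytes) == source_bytes)  # lookup fell back to the msgid
```

With an ASCII-only msgid both encodings yield identical bytes, which is why
the long-standing recommendation avoids the problem altogether.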
Bruno