Re: msggrep problems when priming bison-runtime's pump

From: Bruno Haible
Subject: Re: msggrep problems when priming bison-runtime's pump
Date: Mon, 25 Jul 2005 15:21:18 +0200
User-agent: KMail/1.5

Paul Eggert wrote:
> I wanted to extract from (say) po/et.po the subset of msgids that are
> mentioned in runtime-po/bison-runtime.pot.

You can do this through

   $ msgcat --more-than=1 po/et.po runtime-po/bison-runtime.pot

or through

   $ msgcomm po/et.po runtime-po/bison-runtime.pot

The two commands do not produce the same result. The first one is usually
better from a translator's point of view; the second one may be better
for a maintainer.
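
To see how the two results differ on a given pair of files, one option is
to run both commands and diff the outputs. The following is only a sketch:
the two tiny catalogs, their file names and the "translation A/B" strings
are made up for illustration, and the commands are skipped if the gettext
tools are not installed.

```shell
# Throw-away catalogs; names and contents are hypothetical.
tmp=$(mktemp -d)

cat > "$tmp/et.po" <<'EOF'
msgid ""
msgstr ""
"Content-Type: text/plain; charset=UTF-8\n"

#: src/main.c:10
msgid "memory exhausted"
msgstr "translation A"

#: src/main.c:20
msgid "only in et.po"
msgstr "translation B"
EOF

cat > "$tmp/runtime.pot" <<'EOF'
msgid ""
msgstr ""
"Content-Type: text/plain; charset=UTF-8\n"

#: data/yacc.c:30
msgid "memory exhausted"
msgstr ""
EOF

if command -v msgcat >/dev/null 2>&1 && command -v msgcomm >/dev/null 2>&1; then
  # Both commands keep only msgids that occur in more than one input file.
  msgcat --more-than=1 "$tmp/et.po" "$tmp/runtime.pot" > "$tmp/cat.po"
  msgcomm "$tmp/et.po" "$tmp/runtime.pot" > "$tmp/comm.po"
  # Show whatever differs between the two results (diff exits 1 on differences).
  diff "$tmp/cat.po" "$tmp/comm.po" || true
fi
```

In both outputs the msgid that occurs only in et.po should be gone; the
diff shows what else each command keeps or drops.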

> The Gettext manual gives this as an example:
>   msggrep --location src/getopt.c -o compendium.po file.po
> So I tried this command:
>   msggrep --location runtime-po/bison-runtime.pot po/et.po

Err, the --location flag searches the "#:" part of the messages.
But the messages you are looking at have line number info of the form
   #: data/yacc.c:NN
and not
   #: runtime-po/bison-runtime.pot:NN
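
A small sketch of this behaviour, on a made-up catalog (the file name
demo.po and its two entries are hypothetical; the commands are skipped if
msggrep is not installed):

```shell
tmp=$(mktemp -d)

cat > "$tmp/demo.po" <<'EOF'
msgid ""
msgstr ""
"Content-Type: text/plain; charset=UTF-8\n"

#: data/yacc.c:10
msgid "memory exhausted"
msgstr "translation A"

#: src/getopt.c:20
msgid "unrecognized option"
msgstr "translation B"
EOF

if command -v msggrep >/dev/null 2>&1; then
  # Selects the first entry: its "#:" line names data/yacc.c.
  by_source=$(msggrep --location='data/yacc.c' "$tmp/demo.po")
  # Selects no entry: no "#:" line names the .pot file.
  by_pot=$(msggrep --location='runtime-po/bison-runtime.pot' "$tmp/demo.po")
  printf '%s\n' "$by_source"
fi
```

The argument of --location is compared with the source file names recorded
in the "#:" lines, not with the name of another catalog.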

> so I then generated the msgids by hand (there are only a few) and
> tried this:
>    msggrep -K 'memory exhausted' -K 'syntax error' po/et.po
> but this isn't the correct usage for msggrep.

Yup, it is hard for a command-line program to accept both basic and
extended regexps for 5 different roles. Here we stumble on the limitations
of what can reasonably be done with command-line options.

> This worked, except that I didn't want one of the 'syntax error'
> messages.  That is, of the following msgids extracted by that
> msggrep:
>    msgid "memory exhausted"
>    msgid "syntax error"
>    msgid "syntax error, unexpected %s"
>    msgid "syntax error, unexpected %s, expecting %s or %s or %s or %s"
>    msgid "syntax error, unexpected %s, expecting %s or %s or %s"
>    msgid "syntax error, unexpected %s, expecting %s or %s"
>    msgid "syntax error, unexpected %s, expecting %s"
>    msgid "syntax error: cannot back up"
>    msgid "syntax error; also memory exhausted"
> I didn't want the last one (since it's no longer in bison-runtime).
> However, I couldn't come up with a pattern to do that.  For example,
>    msggrep -K -E -e 'memory exhausted' -e '^syntax error($|[^;])' po/et.po
> still outputs that last msgid.

Yes, it does this because the last msgid matches the first pattern. The
different patterns are implicitly "or"ed together.
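
The "or" semantics can be reproduced with plain grep, which treats
multiple -e patterns the same way. The sketch below uses grep on the bare
msgid strings as a stand-in, not msggrep on a PO file. It also shows one
way to express "and not": a second, inverted pass. Whether the installed
msggrep has an inverting option for a similar second pass is something to
check in its --help output; that is not assumed here.

```shell
# The msgid strings from the quoted list, one per line.
msgids='memory exhausted
syntax error
syntax error, unexpected %s
syntax error, unexpected %s, expecting %s or %s or %s or %s
syntax error, unexpected %s, expecting %s or %s or %s
syntax error, unexpected %s, expecting %s or %s
syntax error, unexpected %s, expecting %s
syntax error: cannot back up
syntax error; also memory exhausted'

# Same two patterns as in the quoted command; a line matching either is kept,
# so the last msgid survives through the 'memory exhausted' pattern.
ored=$(printf '%s\n' "$msgids" |
  grep -E -e 'memory exhausted' -e '^syntax error($|[^;])')

# Second pass, inverted: drop the "syntax error;" variant.
filtered=$(printf '%s\n' "$ored" | grep -v -e '^syntax error;')

printf '%s\n' "$filtered"
```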

> Finally, msggrep outputs lots of messages like this:
> msggrep: warning: Locale charset "UTF-8" is different from
>                   input file charset "ISO-8859-15".
>                   Output of 'msggrep' might be incorrect.
>                   Possible workarounds are:
>                   - Set LC_ALL to a locale with encoding ISO-8859-15.
>                   - Convert the translation catalog to UTF-8 using 'msgconv',
>                     then apply 'msggrep',
>                     then convert back to ISO-8859-15 using 'msgconv'.
> These messages are alarming, and I don't think they apply here.

The warning indeed is not useful here, because the regexp that you
provided would yield the same results in ISO-8859-15 encoding as in
UTF-8 encoding. But other regexps like 'foo \(.\)\1' (as a basic regexp)
do not have this property.

The problems I have here with msggrep are:
  1) We don't have code that executes a regexp in an arbitrary encoding,
     if no locale for this encoding is present on the system.
  2) We don't have code that detects whether a regexp's result will be
     encoding dependent or not.
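
Until then, the second workaround named in the warning can be scripted as
a pipeline. This is only a sketch: it assumes the shell's locale is a
UTF-8 one (otherwise the warning remains), the demo catalog and its
placeholder translation with one non-ASCII byte are made up, and the
commands are skipped if the gettext tools are not installed.

```shell
tmp=$(mktemp -d)

# Hypothetical ISO-8859-15 catalog; \344 is the single byte for a-umlaut.
{
  cat <<'EOF'
msgid ""
msgstr ""
"Content-Type: text/plain; charset=ISO-8859-15\n"
"Content-Transfer-Encoding: 8bit\n"

msgid "memory exhausted"
EOF
  printf 'msgstr "m\344lu otsas"\n'
} > "$tmp/et.po"

if command -v msgconv >/dev/null 2>&1 && command -v msggrep >/dev/null 2>&1; then
  # Convert to UTF-8, filter, then convert back to the original encoding.
  msgconv --to-code=UTF-8 "$tmp/et.po" |
    msggrep -K -e 'memory exhausted' |
    msgconv --to-code=ISO-8859-15 > "$tmp/subset.po"
fi
```

The target encoding of the last step still has to be read from each
catalog's header, so this does not remove the catalog-dependent part.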

> At any rate, there should be a reliable way to do this little task
> without getting the warning, and without having to set LC_ALL to a
> different value for each catalog, in a catalog-dependent way.

Yes, I agree with you.

Ideas or code to fix this are welcome.

