bug-gettext


From: Terence Honles
Subject: Re: [PATCH] python-brace-format: support conversion specifier and unnamed arguments
Date: Wed, 19 Apr 2023 17:12:14 +0200

Thanks for the review @Bruno,

>   So, IMO, what GNU gettext should do is:
>     - Translators should never see a format string with "{}".
>     - When a format string with "{}" occurs in the source code,
>       the xgettext program should NOT extract it but instead give a
>       warning to the programmer, telling him to insert numeric
>       or symbolic names for the arguments.
>   (Maybe a format string with a single "{}" can be accepted? Not
>   sure whether that makes sense.)

I understand that unnamed arguments are not really appropriate when using more
than one of them, but I am coming from Django and I don't believe the tooling
makes a warning like this obvious enough. We are aware we should not be using
more than one unnamed argument as that's the same for % formatted strings, but
we weren't really sure why the python-brace-format flag wasn't being put on the
messages. I had initially thought that there was no brace detection, and didn't
realize it was just a corner case that wasn't consistent with python-format.

I did update the code to print the warning as you suggested (or at least that's
what I believe the code does), but I believe it makes sense to still extract
the string and warn afterwards, so that a conscientious developer can fix it.
If the developer is not going to fix it, the translator will see a "{}" that
needs to be copied over either way. By updating the code to recognize unnamed
arguments, gettext can at least warn or error when generating the mo file if a
directive has been omitted, or if text has been inserted that adds unexpected
directives and would then crash the program.
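
To illustrate the kind of check I mean, here is a minimal Python sketch. It is
not what gettext does internally; the function names are made up for this
example, and it assumes a strict rule where the translation must use exactly
the same replacement fields as the source string.

```python
from string import Formatter

def placeholders(fmt):
    """Sorted (field_name, conversion) pairs found in a brace format string."""
    return sorted(
        (name, conv or "")
        for _literal, name, _spec, conv in Formatter().parse(fmt)
        if name is not None  # None means literal text with no field after it
    )

def translation_is_safe(msgid, msgstr):
    """True if msgstr uses exactly the same replacement fields as msgid."""
    try:
        return placeholders(msgid) == placeholders(msgstr)
    except ValueError:
        # Unbalanced braces, e.g. a stray "{" typed by the translator.
        return False

print(translation_is_safe("my message {} is here", "mon message {} est ici"))
print(translation_is_safe("my message {} is here", "mon message est ici"))
```

A check like this is only possible if the tool knows the message is a brace
format string in the first place, which is what the flag communicates.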

Not marking the string as python-brace-format would lose information compared
to the source. Without the flag, it is not possible to tell from the po file
alone whether a translation is missing braces; with it, other tooling can check
that the formatting characters are preserved. Letting those tools reject a
string that would be confusing seems to make more sense to me.

> * Problems with the conversion specifiers for translators:
>
>   As I understand it,
>     "{0!s}" with argument x
>   is equivalent to
>     "{0}" with argument x.str()
>   and similarly for the other conversion specifiers.
>
>   So, these conversion specifiers are actually a piece of code that the
>   programmer has chosen to put into the format string.
>
>   In the context of GNU gettext, this is not good because:
>     - The translator sees complexity in the msgids that are not his
>       business.
>     - When the programmer refactors the code and decides to remove the '!s'
>       part or introduce a different conversion specifier, the translators
>       will see a changed msgid and be prompted to translate it again.
>       Which is unnecessary work for the translator.
>
>   So, here too, GNU gettext should better not present this to the translator.
>   Instead, xgettext should give a warning and tell the programmer to move
>   the conversion outside of the format string.

While this is true, my argument for accepting this is the same. If xgettext
does not recognize conversion specifiers, then the whole string is not
flagged as python-brace-format, and unless the developer sees the warning,
the result may be a translated string whose directives are missing or
altered, which can crash the program. I agree a developer *could* move the
formatting character out of the string, but unless their editor alerts them
to this issue or they review the message files, they will not realize that
their string may be translated incorrectly.
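
As a small Python sketch of both points: the conversion specifier is
equivalent to converting the argument before formatting, and a translation
that changes the directives fails at runtime. The helper `render` and the
translated strings are made up for this example.

```python
def render(fmt, *args, **kwargs):
    """Format the string, or return the exception class name if it fails."""
    try:
        return fmt.format(*args, **kwargs)
    except (KeyError, IndexError, ValueError) as exc:
        return type(exc).__name__

# A conversion specifier is shorthand for calling str()/repr() on the argument.
assert render("{0!s}", 3.5) == render("{0}", str(3.5))
assert render("{val!r}", val="a") == render("{val}", val=repr("a"))

# Hypothetical translations in which the directives were changed:
print(render("mon message {valeur!r} est ici", val="a"))  # renamed field
print(render("mon message {} {} est ici", "a"))           # extra unnamed field
print(render("mon message {val!x} est ici", val="a"))     # mangled conversion
```

Each of the three translated strings raises an exception when formatted, which
is the failure a flag-aware check could catch before the mo file ships.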

I understand the format is pretty simple, but I had, possibly incorrectly,
assumed that most people probably translated the message files with a tool.
Tools like Poedit (IIRC) and Transifex will use the flags python-format
and python-brace-format and make sure the format string is copied verbatim
into the translated text.

I understand that this is "complicating" the type of strings that a
translator is seeing, but regardless of whether the string is any of:
- "my message {} is here"
- "my message {val} is here"
- "my message {val!r} is here"

The consistent thing is to look for the "{" and then the closing "}" and
ignore everything else. I understand that there are different formats for
each language, and translators have to get used to them to some degree,
but unless the po format is changed to normalize the formatting characters
between languages, it's going to be a somewhat technical task for the
translators.
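
A rough Python sketch of that "find the braces, ignore the contents" rule,
applied to the three strings above. The function name is made up, and this
deliberately ignores nested fields inside a format spec such as "{:{width}}".

```python
import re

def brace_fields(msgid):
    """Return every "{...}" chunk that must be copied verbatim."""
    # Drop escaped literal braces first so "{{" and "}}" are not seen as fields.
    unescaped = msgid.replace("{{", "").replace("}}", "")
    return re.findall(r"\{[^{}]*\}", unescaped)

for msgid in (
    "my message {} is here",
    "my message {val} is here",
    "my message {val!r} is here",
):
    print(brace_fields(msgid))
```

From the translator's side, the rule is the same in all three cases: copy the
chunk unchanged, whatever is between the braces.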

I find it harder to justify not supporting both of these features and merely
hoping that the developer realizes that the flags are missing from the po
files or that they see a warning.

> Do you agree with this? Does it make sense? (You appear to be more of a
> Python expert than I am.)

I agree, it makes sense to make the translator's job as easy as possible,
but it needs to be easily discoverable and ideally it shouldn't appear
"broken". Emitting warnings during extraction also seems fine, but I'm
not sure if that's currently done for the other extractors, and I'm not
sure I agree it should omit the flag. Warnings should only warn, and the
extraction should continue to happen despite the warning. If the string
needs to be attended to, then it should be an error and it should be obvious
that there is an issue. I believe this makes sense regardless of the
language, but I'm not completely sure what the precedent is and I
understand you are going to be more aware of it than I.

-Terence


