Re: Grammatical forms in translatable texts

bug-gettext



From:	Akim Demaille
Subject:	Re: Grammatical forms in translatable texts
Date:	Mon, 20 Apr 2020 08:47:47 +0200

Hi Bruno,

Thanks for your answers!

> On 19 Apr 2020, at 16:26, Bruno Haible <address@hidden> wrote:
> 
> Hi Akim,
> 
>> I hate reading
>> 
>>> $ cat foo.y
>>> %%
>>> $ bison foo.y
>>> foo.y:2.1: erreur: erreur de syntaxe, end of file inattendu
> 
> That's because the message is too terse. Not even quotes around
> 'end of file'.

No, what I truly dislike is the mixture of languages.  And I'm very
happy that there aren't forced quotes here.

> $ cat foo.y
> %token 
> %%
> $ LC_ALL=C /opt/local/bin/bison foo.y
> foo.y:2.1-2: error: syntax error, unexpected %%, expecting character literal or identifier or <tag>
>     2 | %%
>       | ^~

(That was bison 3.5.)  That's what I would expect, not

> syntax error, unexpected "%%", expecting "character literal" or "identifier" or "<tag>"


In French, we used to have (bison 3.5)

> $ /opt/local/bin/bison foo.y
> foo.y:2.1-2: erreur: erreur de syntaxe, %% inattendu, attendait character literal ou identifier ou <tag>
>     2 | %%
>       | ^~

Now, with this beta, I have

> $ bison foo.y
> foo.y:2.1-2: erreur: attendait caractère littéral ou identificateur ou <tag> avant %%
>     2 | %%
>       | ^~

which I prefer.  Note that I dislike quotes, but I'm somewhat
cheating: the components of the error message are styled.
Unfortunately we can't send images on these MLs, so, "in text",
this is how it is styled:

> foo.y:2.1-2: <error>erreur:</error> attendait <expected>caractère littéral</expected> ou <expected>identificateur</expected> ou <expected><tag></expected> avant <unexpected>%%</unexpected>
>     2 | <error>%%</error>
>       | <error>^~</error>


Side note.
Yes, I have changed the format used by bison itself, *ending* with
the culprit rather than starting with it.  But I'm not sure I want
to keep it this way; this is really an experiment.  In particular,
this format gives the impression that the unexpected token could be
valid provided there is more stuff before it, which of course is not
always the case.

> $ LC_ALL=C bison foo.y
> foo.y:2.1-2: error: expected character literal or identifier or <tag> before %%
>     2 | %%
>       | ^~





>> I prefer incorrect French than Frenglish:
>> 
>>> $ bison foo.y
>>> foo.y:2.1: erreur: fin de fichier inattendu
> 
> But you would agree with me that
> 
>    foo.y:2.1: erreur: frontière de fichier inattendu
> 
> would be grammatically incorrect and thereby give an entirely
> wrong meaning to the sentence.

Well, "frontière" and "fin" are both feminine, so I think we are
talking about the same thing.

Yes, I think it's wrong.

But it's less wrong than before.




>>> It is not nitpicking. A msgid "syntax error, unexpected %s", where
>>> a translatable string is plugged in for %s, violates the i18n principle
>>> "Entire sentences", documented at
>>> https://www.gnu.org/software/gettext/manual/html_node/Preparing-Strings.html

>> And it would be non acceptable to ask translators to address
>> all the possible cases
>> 
>> YYCASE_(0, YY_("syntax error"));
>> YYCASE_(1, YY_("syntax error, unexpected %s"));
>> YYCASE_(2, YY_("syntax error, unexpected %s, expecting %s"));
>> YYCASE_(3, YY_("syntax error, unexpected %s, expecting %s or %s"));
>> YYCASE_(4, YY_("syntax error, unexpected %s, expecting %s or %s or %s"));
>> YYCASE_(5, YY_("syntax error, unexpected %s, expecting %s or %s or %s or %s"));
> 
> Why would this be unacceptable to translators?
> YYCASE_(0, YY_("syntax error."));
> YYCASE_(1, YY_("syntax error.\nunexpected token: '%s'"));

Sorry, that was meant to be read in the context of your previous
answer, which I restored right above.  I was merely stating that
"yes, the i18n principle 'Entire sentences' is violated here, but
one would not want to address all the possible cases".

The 'Entire sentences' principle says you can't have both a sentence
and pieces assembled together: one of them must go.  You throw away
the sentence, and so far that still feels too costly to me.  I like
having a sentence here.
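To make the trade-off concrete, here is a minimal C sketch of the enumerate-every-case approach from the YYCASE_ list quoted above. It is not Bison's actual `yysyntax_error`: the helper `format_syntax_error` is hypothetical, and the gettext-style `_()` macro is stubbed out so the sketch builds without libintl. Word order and the "or" conjunctions live inside whole msgids, but the token names substituted for `%s` are still translated separately, so agreement such as French "inattendu"/"inattendue" remains out of the translator's reach.

```c
#include <stdio.h>

/* Stand-in for gettext's _() so this sketch builds without libintl;
   a real program would use gettext (s) from <libintl.h>.  */
#define _(s) (s)

/* Hypothetical helper: write a syntax-error message into BUF.
   Every arity has its own msgid, so translators see an entire sentence
   structure, at the cost of one msgid per case.  UNEXPECTED may be
   NULL; like the YYCASE_ list, at most 4 expected tokens are listed.  */
static int
format_syntax_error (char *buf, size_t size, const char *unexpected,
                     const char *const *expected, int nexpected)
{
  if (!unexpected)
    return snprintf (buf, size, "%s", _("syntax error"));
  switch (nexpected)
    {
    case 1:
      return snprintf (buf, size,
                       _("syntax error, unexpected %s, expecting %s"),
                       unexpected, expected[0]);
    case 2:
      return snprintf (buf, size,
                       _("syntax error, unexpected %s, expecting %s or %s"),
                       unexpected, expected[0], expected[1]);
    case 3:
      return snprintf (buf, size,
                       _("syntax error, unexpected %s, expecting %s or %s"
                         " or %s"),
                       unexpected, expected[0], expected[1], expected[2]);
    case 4:
      return snprintf (buf, size,
                       _("syntax error, unexpected %s, expecting %s or %s"
                         " or %s or %s"),
                       unexpected, expected[0], expected[1], expected[2],
                       expected[3]);
    default:
      /* No expected tokens, or too many to list: name only the culprit.  */
      return snprintf (buf, size, _("syntax error, unexpected %s"),
                       unexpected);
    }
}
```

Calling it with the unexpected token `%%` and the three expected tokens from the example above yields the bison 3.5 wording, "syntax error, unexpected %%, expecting character literal or identifier or <tag>".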



> YYCASE_(2, YY_("syntax error.\nunexpected token: '%s'\nExpected token: '%s'"));
> YYCASE_(3, YY_("syntax error.\nunexpected token: '%s'\nExpected token: '%s' or '%s'"));
> YYCASE_(4, YY_("syntax error.\nunexpected token: '%s'\nExpected token: '%s' or '%s' or '%s'"));
> YYCASE_(5, YY_("syntax error.\nunexpected token: '%s'\nExpected token: '%s' or '%s' or '%s' or '%s'"));

This is sooo different from what compilers do!  Not to mention
that, for lack of a specific prefix, tools such as Emacs will
not be able to highlight this part as belonging to a diagnostic.

> $ gcc-mp-9 /tmp/foo.c
> /tmp/foo.c:1:5: error: expected identifier or '(' before '-' token
>     1 | int -;
>       |     ^
> $ clang-mp-9.0 /tmp/foo.c
> /tmp/foo.c:1:5: error: expected identifier or '('
> int -;
>     ^
> 1 error generated.

(Both are wrong, btw: plenty of other things could appear there,
such as '*', 'unsigned', 'static', etc.)



>>> The general solution, that works for any language, is to relax on the
>>> requirement that the error message should be a sentence. It can look
>>> like a form. For example:
>>> 
>>>  Syntax error.
>>>  Unexpected token: %s
>>>  Expected one of the following tokens: %s, ...
>>> 
>>> This way it doesn't matter whether the string substituted for %s,
>>> "kyrillischer Buchstabe", is a masculinum or neutrum, and how it would
>>> be declensed in a sentence.
>> 
>> But I'm not sure I'd do that in Bison itself.

(That was ambiguous: there is bison the program, with its own
diagnostics, and the default diagnostics it generates in user parsers.
I was referring to the latter, which appears to be what you understood.)

> Why not? Unlike the other half-"solutions", this one works for all
> languages.
> 
> You already put additional information about the errors in 'note:' lines;
> why would you insist that the "unexpected token" info and the "expected
> tokens" info would be on the same line, in the same sentence?

I feel uneasy changing the default format now.  It was inherited
from the early days of Bison, so why change it right now, precisely
when users are being given the means to build error messages
the way they want?


In the case of bison, which has caret diagnostics, the name of the
culprit seems even less important; it could simply not be displayed
(as clang did above).
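With caret diagnostics the offending text is already underlined, so the message itself can name only what was expected. A minimal sketch of that layout follows, modeled on the bison output quoted earlier (a `file:line.column` location, a numbered source line, a `^~` underline). The function `print_caret_diagnostic` is hypothetical, and it assumes the range fits on a single line of ASCII text without tabs.

```c
#include <stdio.h>

/* Hypothetical helper: print a caret diagnostic for columns
   [COL, COL + LEN - 1] (1-based) of line LINENO, in the style of the
   bison output quoted above.  Assumes LEN >= 1, ASCII text, no tabs.  */
static void
print_caret_diagnostic (FILE *out, const char *file, int lineno, int col,
                        int len, const char *message, const char *line)
{
  /* Location: "file:line.col" or "file:line.first-last".  */
  if (len == 1)
    fprintf (out, "%s:%d.%d: error: %s\n", file, lineno, col, message);
  else
    fprintf (out, "%s:%d.%d-%d: error: %s\n", file, lineno, col,
             col + len - 1, message);
  /* Quoted source line, its number right-aligned in 5 columns.  */
  fprintf (out, "%5d | %s\n", lineno, line);
  /* Underline: spaces up to COL, then '^' and '~' for the rest.  */
  fprintf (out, "%5s | ", "");
  for (int i = 1; i < col; i++)
    fputc (' ', out);
  fputc ('^', out);
  for (int i = 1; i < len; i++)
    fputc ('~', out);
  fputc ('\n', out);
}
```

For example, `print_caret_diagnostic (stderr, "foo.y", 2, 1, 2, "expected character literal or identifier or <tag>", "%%")` reproduces the three-line layout of the bison output above, minus the "before %%" part.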



Actually, there's one way to have both: %define parse.error form,
to request a different prebaked output format.   But I am still
reluctant to go multiline.
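For comparison, here is a minimal sketch of the multiline layout Bruno proposed, i.e. roughly what a hypothetical `%define parse.error form` might print. Each line is its own msgid, so substituted token names never have to agree with the surrounding words. `print_form_error` and the stubbed `_()` macro are assumptions for illustration; the ", " list separator is hardcoded here and might itself need to be translatable for some languages.

```c
#include <stdio.h>

/* Stand-in for gettext's _() so the sketch builds without libintl.  */
#define _(s) (s)

/* Hypothetical "form"-style report:
     Syntax error.
     Unexpected token: <token>
     Expected one of the following tokens: <t1>, <t2>, ...
   UNEXPECTED may be NULL; NEXPECTED may be 0.  */
static void
print_form_error (FILE *out, const char *unexpected,
                  const char *const *expected, int nexpected)
{
  fputs (_("Syntax error."), out);
  fputc ('\n', out);
  if (unexpected)
    {
      fprintf (out, _("Unexpected token: %s"), unexpected);
      fputc ('\n', out);
    }
  if (nexpected > 0)
    {
      fputs (_("Expected one of the following tokens:"), out);
      /* Hardcoded ", " separator between token names.  */
      for (int i = 0; i < nexpected; i++)
        fprintf (out, "%s %s", i > 0 ? "," : "", expected[i]);
      fputc ('\n', out);
    }
}
```

The cost is exactly the one discussed above: the message is no longer a sentence, and it spans several lines that tools matching a single `file:line: error:` prefix would not associate with the diagnostic.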

Cheers!

