Re: Internationalize Emacs's messages (swahili)

emacs-devel



From:	Daniel Brooks
Subject:	Re: Internationalize Emacs's messages (swahili)
Date:	Sun, 27 Dec 2020 00:48:19 -0800
User-agent:	Gnus/5.13 (Gnus v5.13) Emacs/27.1 (gnu/linux)

Richard Stallman <rms@gnu.org> writes:

> I think the idea of integrating Fluent with gettext is interesting.
> Would you like to study that possibility?

Yes, I have been considering it.

The biggest disconnect between Fluent and gettext is that Fluent allows
recursive substitutions and multiple substitutions per message. Even
the fact that Fluent performs the substitutions itself, instead of
having the caller delegate them to printf, is a big difference. Each
message needs to give names to the values that will be substituted
into it (essentially argument names), and the PO files would have to
specify how to use those arguments: whether to substitute them into
the message directly, or to switch on their values.
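To make the contrast with printf concrete, here is a minimal C sketch of
named substitution. C is used only because it is one of the
implementation languages discussed further down; `format_named` and
`struct named_arg` are hypothetical names, and the sketch handles only
integer arguments and the plain `{ $name }` placeable, not selectors or
message references. The caller supplies values by name, so a
translation is free to use them in any order, or not at all.

```c
#include <stdio.h>
#include <string.h>

/* A named argument as a caller would pass it (hypothetical type;
   only integer values are supported in this sketch). */
struct named_arg { const char *name; long value; };

/* Expand PATTERN into OUT (CAP bytes), replacing each "{ $name }" or
   "{$name}" placeable with the value of the argument called NAME.
   Returns 0 on success, -1 on an unknown name, bad syntax, or overflow. */
int format_named (const char *pattern, const struct named_arg *args,
                  size_t nargs, char *out, size_t cap)
{
  size_t used = 0;
  if (cap == 0)
    return -1;
  for (const char *p = pattern; *p != '\0';)
    {
      if (*p != '{')
        {
          /* Ordinary character: copy it, leaving room for the NUL. */
          if (used + 1 >= cap)
            return -1;
          out[used++] = *p++;
          continue;
        }
      /* Parse "{", optional spaces, "$name", optional spaces, "}". */
      const char *q = p + 1;
      while (*q == ' ')
        q++;
      if (*q != '$')
        return -1;
      const char *name = ++q;
      while (*q != '\0' && *q != ' ' && *q != '}')
        q++;
      size_t len = (size_t) (q - name);
      while (*q == ' ')
        q++;
      if (*q != '}')
        return -1;
      /* Find the argument by name; its position in ARGS is irrelevant. */
      size_t i = 0;
      while (i < nargs && !(strlen (args[i].name) == len
                            && strncmp (args[i].name, name, len) == 0))
        i++;
      if (i == nargs)
        return -1;
      int n = snprintf (out + used, cap - used, "%ld", args[i].value);
      if (n < 0 || (size_t) n >= cap - used)
        return -1;
      used += (size_t) n;
      p = q + 1;                /* continue after the closing brace */
    }
  out[used] = '\0';
  return 0;
}
```

For example, with tabCount set to 42 the pattern
"Close { $tabCount } tabs" expands to "Close 42 tabs", and a pattern
such as "{$b} then {$a}" reorders its arguments without the caller
changing anything.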

The message catalog files (MO files) just have flat strings with no
notion of substitutions or function calls. This is why my first
inclination was to generate elisp code from a Fluent file.

The easiest way to mush gettext and Fluent together is to put some
syntax into the messages that is post-processed before being returned to
the caller, which effectively turns the lookup function into an
interpreter. Something like this in a PO file:

msgid "-sync-brand-name"
msgstr "Firefox Account"

msgid "sync-signedout-title"
msgstr "Connect with your {-sync-brand-name}"

A hypothetical igettext function could look for the curly braces,
recurse to find the value of the -sync-brand-name message, perform the
substitution (which allocates a new string), and then return the
result. Or the value of sync-signedout-title could be precomputed before
it was stored in the MO file. For this simple example, either would
work.
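The first option, resolving references at lookup time, might look
roughly like this in C. It is a minimal illustration under stated
assumptions, not a Fluent implementation: the two-entry table stands in
for a loaded MO file, `igettext_expand` is a hypothetical name, and only
`{-term}` references are handled. The depth limit keeps a cyclic
reference from recursing forever.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory catalog standing in for an MO file:
   each entry maps a message id to its translated text. */
struct entry { const char *id; const char *text; };

static const struct entry catalog[] = {
  { "-sync-brand-name",     "Firefox Account" },
  { "sync-signedout-title", "Connect with your {-sync-brand-name}" },
};

/* Find the message whose id is the LEN bytes starting at ID. */
static const char *lookup (const char *id, size_t len)
{
  for (size_t i = 0; i < sizeof catalog / sizeof catalog[0]; i++)
    if (strlen (catalog[i].id) == len && strncmp (catalog[i].id, id, len) == 0)
      return catalog[i].text;
  return NULL;
}

/* Append the expansion of TEXT to OUT, which already holds *USED bytes
   of a CAP-byte buffer.  Each "{-name}" is replaced by the recursively
   expanded text of message "-name".  Returns 0 on success, -1 on an
   unknown reference, unterminated brace, overflow, or deep nesting. */
static int expand_text (const char *text, char *out, size_t cap,
                        size_t *used, int depth)
{
  if (depth > 8)
    return -1;                  /* guard against reference cycles */
  for (const char *p = text; *p != '\0'; p++)
    {
      if (p[0] == '{' && p[1] == '-')
        {
          const char *close = strchr (p, '}');
          if (close == NULL)
            return -1;
          /* The id runs from the '-' up to, not including, the '}'. */
          const char *ref = lookup (p + 1, (size_t) (close - p - 1));
          if (ref == NULL || expand_text (ref, out, cap, used, depth + 1) != 0)
            return -1;
          p = close;            /* the loop increment skips the '}' */
          continue;
        }
      if (*used + 1 >= cap)
        return -1;              /* keep room for the terminating NUL */
      out[(*used)++] = *p;
    }
  out[*used] = '\0';
  return 0;
}

/* Look up message ID and write its fully expanded text into OUT. */
int igettext_expand (const char *id, char *out, size_t cap)
{
  const char *text = lookup (id, strlen (id));
  size_t used = 0;
  if (text == NULL || cap == 0)
    return -1;
  out[0] = '\0';
  return expand_text (text, out, cap, &used, 0);
}
```

With that table, expanding "sync-signedout-title" yields "Connect with
your Firefox Account". Precomputing the value instead would amount to
running the same expansion once, when the MO file is built.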

But consider a more complicated scenario:

msgid "tabs-close-tooltip"
msgstr "{$tabCount ->
    [one] Zamknij kartę
    [few] Zamknij {$tabCount} karty
   *[many] Zamknij { $tabCount } kart
}"

Again igettext can process this after retrieving it from the MO file,
but again that just turns it into an interpreter for a slightly lispy
language. The selector could be partially unrolled into the MO file by
storing each variant as a separate string, but igettext would still
need to interpret the result to substitute in the tabCount value. Good
translations frequently need even more complexity than this.

I'd rather it were compiled to elisp that can be byte-compiled and
hopefully jit-compiled before too long.
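As a rough illustration of what compiling the Polish message might
produce, here it is translated by hand into C. The proposal is to emit
Elisp; C is used here only to keep the examples in one language, and
the function names are hypothetical. The plural categories follow the
CLDR rule for Polish integers: "one" for exactly 1, "few" when the last
digit is 2 to 4 but the last two digits are not 12 to 14, and "many"
otherwise. Fractional counts (CLDR's "other") are ignored.

```c
#include <stdio.h>

enum plural_category { PLURAL_ONE, PLURAL_FEW, PLURAL_MANY };

/* CLDR plural category for a non-negative integer in Polish.
   Fractional values (CLDR "other") are not handled in this sketch. */
enum plural_category polish_plural (unsigned long n)
{
  unsigned long mod10 = n % 10, mod100 = n % 100;
  if (n == 1)
    return PLURAL_ONE;
  if (mod10 >= 2 && mod10 <= 4 && (mod100 < 12 || mod100 > 14))
    return PLURAL_FEW;
  return PLURAL_MANY;
}

/* Hand-compiled form of the Polish "tabs-close-tooltip" message: the
   selector becomes an ordinary switch and each { $tabCount } placeable
   becomes a formatted argument.  Returns snprintf's result. */
int tabs_close_tooltip_pl (char *out, size_t cap, unsigned long tabCount)
{
  switch (polish_plural (tabCount))
    {
    case PLURAL_ONE:
      return snprintf (out, cap, "Zamknij kartę");
    case PLURAL_FEW:
      return snprintf (out, cap, "Zamknij %lu karty", tabCount);
    default:                    /* *[many] is the default variant */
      return snprintf (out, cap, "Zamknij %lu kart", tabCount);
    }
}
```

Once compiled, choosing a variant costs a few integer comparisons with
nothing to parse at run time: a count of 42 gives "Zamknij 42 karty",
5 gives "Zamknij 5 kart", and 1 gives "Zamknij kartę".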

Also note that the original-language text needs the same substitution
capabilities as the translations. To my mind it would be really weird
to embed those in the source of the program that calls igettext.
Consider the following hypothetical example:

(message (igettext "{$tabCount ->
    [one] Zamknij kartę
    [few] Zamknij {$tabCount} karty
   *[many] Zamknij { $tabCount } kart
}" 42))

The PO file would look something like this:

msgid "{$tabCount ->
    [one] Zamknij kartę
    [few] Zamknij {$tabCount} karty
   *[many] Zamknij { $tabCount } kart
}"
msgstr "{$tabCount ->
    [one] Close { $tabCount } tab
   *[many] Close { $tabCount } tabs
}"

This may be really weird for the translator, because it seems to imply
that the degree and type of abstraction used in the translation are
supposed to match those of the original text, which is not necessary
at all. Thus I have kept the Fluent convention of using simple textual
identifiers in the source code, which is a departure from the way
gettext is normally used.

My thoughts have gotten more organized as I wrote this up, so I
apologize if I've skipped any important deductive steps or otherwise
left anything unclear.

> It seems that Fluent is not self-contained but rather depends
> on the presence of an interpreter for JS, Python, or Rust.
> Is that correct?  That would be very undesirable in C programs
> that don't contain any interpreter (and don't need one).

Not quite. The intent is that a program either integrates an existing
implementation or, where that isn't convenient, implements the Fluent
spec in its own language.

For example, a C program can link against the fluent-rs static library,
and thus avoid writing that code itself. Of course, depending on two
compilers (one for C and another for Rust, since fluent-rs is written in
Rust) is sometimes a deal-breaker.

I think we would ignore these existing implementations, except possibly
as a source of inspiration, and write our own.

> Is it feasible to write a small Fluent interpreter in C for this
> purpose?

Absolutely, but my personal preference is to write it in Elisp.

My second choice is actually to link against fluent-rs; Rust is a
great language and certainly better than C for implementing such
things. Of course, that is its own can of worms.

My third choice is to write an implementation in C. We could
reasonably make it a separate library to share with anyone else who
wants to use Fluent in a C program. But then we would have to write a
lot of C.

db48x
