Re: On Translation Issues
From: Jean-Christophe Helary
Subject: Re: On Translation Issues
Date: Wed, 28 Feb 2024 23:43:15 +0000
Thank you for bringing me into your conversation.
To give some background, I am Jean-Christophe Helary, a French
professional translator who has been working on pushing free software
into professional translation for two decades, mostly by promoting and
contributing to OmegaT (https://omegat.org) for which I currently act
as project coordinator, but also, to a much lesser extent, the Okapi
Framework, Maxprograms, etc. I also try to work the other way round: by
pushing and promoting free professional computer-aided translation
tools into free software translation processes (with much less success,
for practical reasons: most free software translation contributors do
that on the side and few are willing to invest time in learning tools
that professionals use).
I’ve been the maintainer of the French Emacs manuals translation
project for a while now, and I rebooted it during Covid. I’m
also active in the po4a translation project, etc.
> On Feb 29, 2024, at 7:15, Pádraig Brady <P@draigBrady.com> wrote:
> I see emacs recently discussed translating their texinfo manuals at:
> https://lists.gnu.org/archive/html/help-texinfo/2024-01/msg00057.html
> As it stands I only see one file for one language at:
> https://git.savannah.gnu.org/cgit/emacs.git/tree/doc/translations
The point here is having humans contribute to a project. You will have
noticed that Emacs is used around the world even without
having translations or being localized (beyond its tutorial).
Also, there is only one file for a French manual there, not because
there are no existing translations of the manuals, but rather because it
is precisely that French manual that triggered the discussion of “how
do we handle translations and how do we publish them?” which was kind
of the roadblock here. There are or have been translation efforts in at
least French, Japanese and Chinese and I have no doubt that once we
have modified the build process to install the various manuals, we’ll
have more projects slowly organizing.
The Emacs manuals are about 2 million words. Five hundred words a day
with a team of 10 committed people is about a year of work. Which is
nothing. Make it 20 people and you have that in 6 months. Doing that is
more than providing a translation, it is increasing people’s skills and
understanding of complex processes. It is creating a community of
people who understand issues of free software and are not mere consumers.
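
As a rough check of that arithmetic, here is a minimal sketch. The function
name is made up for illustration, and it assumes that every contributor
translates at the same rate every calendar day, which real volunteer teams
will not do:

```python
import math

def days_to_translate(total_words, words_per_day, translators):
    """Whole days needed for a team to translate total_words.

    Hypothetical helper: assumes a constant per-person daily output
    and no review or coordination overhead.
    """
    return math.ceil(total_words / (words_per_day * translators))

# Figures from the paragraph above: about 2 million words, 500 words a day.
print(days_to_translate(2_000_000, 500, 10))  # 400 days, a bit over a year
print(days_to_translate(2_000_000, 500, 20))  # 200 days, a bit over 6 months
```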
In one of the threads to which the discussion that you quoted belonged,
somebody noted that LibreOffice has a 6-million-word manual that is
translated into a dozen languages. Of course, that’s because StarOffice
started early, then Sun Microsystems pushed the effort further by
spending money on LSPs (already including some kind of reviewed machine
translation), etc., and then LibreOffice took over the existing
volunteer teams and they now have highly experienced people who handle
that. And because translation/localization is a very low friction entry
point into contributing to the community, that actually generates code
contributions (my own “main” contribution to Emacs is a fix to
packages.el because its output was full of singular/plural errors in
corner cases).
> Now stepping back a bit, perhaps at this stage rather than
> persisting specific translations of specific snapshots of the docs,
> perhaps we should leverage the increasingly sophisticated translations
> provided by LLMs, to provide more up to date and varied translations.
This is a process/promotion/human issue. If the FSF decided to invest
money in translation management and promotion, I am sure lots of the
issues that we have would go away.
Also, I’m sure that the GNU project would object to the massive use of
LLM output, considering that current LLMs are basically huge copyright
infringers and that they are just playing on the enormity of what they
did to get a free international pass. LLMs do not create communities.
They feed on communities and they are not accountable for the huge
externalities that they produce.
LLM output costs "nothing". Which means that individual users
already have access to that. In fact, I argued exactly that to the Linux
Foundation JA office yesterday. Providing LLM-based translation is not
doing a service to users. It is also dangerous because LLM output is
strangely false in weird and unexpected places, and short of a human
review service that I doubt the GNU project would be willing to provide,
there is nothing that would keep those errors from spreading in the wild,
at a real cost that you can’t imagine.
LLMs do *not* provide “more up to date and varied translations”. They
provide “probable strings that they do not understand, but it looks
human enough that a human can be tricked into thinking that a human who
understands the subject matter actually wrote that”.
It would be nice to put down on paper what LLMs actually stand for and
discuss that before suggesting their wholesale use in the GNU project.
> It would be cool to integrate that seamlessly into the GNU info reader
> and/or online versions of the manual.
Based on which non-copyright-infringing LLM?
> For illustration, ChatGPT gave this for the start of the ls manual:
I don't read Chinese but I can reverse translate that with some LLM
system. I guess it is good enough to understand what ls is about.
- Now do that to the whole page and send that to a professional computer
user who is a native Chinese speaker for comments.
- While you're at it, if you consider that the LLM output is good enough
for Chinese users, try to reverse translate that to English and ask
yourself if you'd find that acceptable in an official GNU manual.
- And also, why not use LLMs to actually produce manuals in English?
Would you support that?
What was the effort required for you to produce that output? What
additional benefit would such a “service” provide to users who already
have access to such services for free? I think those are valid questions.
If LLMs came at such a high cost that only institutions could access
their output, it would make some sense in some cases to provide such a
service. Also, LLM providers, and especially the one behind ChatGPT,
are currently engaged in a global environmental destruction project at
a time when we need to stop burning fossil fuels. So let’s please not
promote their use in places such as the GNU Project, which has
diametrically opposed objectives in terms of human liberation.
--
Jean-Christophe Helary
@jchelary@emacs.ch
https://sr.ht/~brandelune/