Re: Website translations with Haunt

From: pelzflorian (Florian Pelz)
Subject: Re: Website translations with Haunt
Date: Sat, 16 Dec 2017 20:30:41 +0100
User-agent: NeoMutt/20171208

On Sat, Dec 16, 2017 at 10:26:12AM -0500, sirgazil wrote:
> I'm very interested on this subject because I help with Guile and Guix
> websites, and I usually work with multilingual websites. I have no idea of
> what would be the right way to do i18n of websites written in Scheme,
> though. So I will just join this conversation as a potential user of your
> solutions :)


> > I did not want to use the ordinary gettext functions in order to not
> > call setlocale very often to switch languages.  It seems the Gettext
> > system is not designed for rapidly changing locales, but maybe I am
> > wrong about this and very many setlocale calls would not be that bad.
> For what is worth, I use ordinary gettext and `setlocale` in my website,
> which is not Haunt-based, but it is Guile Scheme and statically generated
> too. So far, it works ok.

Performance is what motivated me to avoid repeated setlocale calls.
I have now measured the impact of my approach: for my website,
repeatedly calling setlocale and gettext is actually slightly faster
than transforming a po file into an association list and assoc-ref’ing
the list.  Only when the same msgid is used very many times does
transforming the po file become faster.

So it is probably best *not* to add ffi-helper to Haunt after all and
to just use Gettext: while repeated setlocale is bad in theory, it is
faster in practice for normal websites, and the difference does not
really matter much anyway.  Then again, for long-running applications,
not using setlocale is better.

If you want detailed timings, read on, otherwise feel free to skip to
the end of this e-mail.

For my German and English website with the code at this is the result
of timing my current approach, which avoids repeated setlocale and
standard gettext calls but instead uses libgettextpo to create an
association list of msgids and msgstrs from the respective po files
(i.e. not from compiled mo files).
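
To make the lookup side of this approach concrete, here is a minimal
sketch.  It is not the code from my website: the catalogs are written
by hand (the real ones are read from the po files via libgettextpo),
and the names `catalogs` and `lookup-msg` and the German string are
made up for illustration.

```scheme
;; Hypothetical hand-written catalogs, one association list per lingua.
;; In the real setup these pairs would come from the po files.
(define catalogs
  '(("de" . (("Home page" . "Startseite")))
    ("en" . (("Home page" . "Home page")))))

;; Look up MSGID in the alist for LINGUA.  Fall back to MSGID itself
;; when the lingua or the msgid is unknown, like gettext does.
(define (lookup-msg msgid lingua)
  (let ((catalog (or (assoc-ref catalogs lingua) '())))
    (or (assoc-ref catalog msgid) msgid)))
```

Each lookup walks the list from the front, so the cost grows with the
number of msgids in the po file, while no locale is ever switched.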

I put

 haunt build

inside a file called  I then ran

$ time ./
./  2.43s user 0.33s system 83% cpu 3.317 total
./  2.47s user 0.33s system 103% cpu 2.703 total
./  2.43s user 0.36s system 103% cpu 2.700 total
./  2.56s user 0.33s system 103% cpu 2.783 total

When I instead do not load gettext-po, ffi-help-rt and Guile’s system
foreign modules, but run msgfmt to compile the po files to mo files,
move them to ./de/LC_MESSAGES/ and just use standard Gettext and
setlocale with

(bindtextdomain "pelzfloriande" 
(bind-textdomain-codeset "pelzfloriande" "UTF-8")
(textdomain "pelzfloriande")

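;; Map a lingua such as "de" to the locale name passed to setlocale.
(define (locale-for-lingua lingua)
  (assoc-ref '(("de" . "de_DE.UTF-8")
               ("en" . "en_US.UTF-8"))
             lingua))

;; Switch the locale on every call, then look up the translation.
(define (translated-msg msgid lingua)
  (setlocale LC_ALL (locale-for-lingua lingua))
  (gettext msgid))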

I got the following measurements and verified that the translation is
still working:

$ time ./       
building pages in 'site'...
copying asset 'css/common.css' → '/css/common.css'
./  2.01s user 0.29s system 102% cpu 2.241 total

For multiple runs:
./  2.01s user 0.29s system 102% cpu 2.241 total
./  2.06s user 0.31s system 102% cpu 2.302 total
./  2.15s user 0.33s system 104% cpu 2.387 total
./  1.99s user 0.32s system 102% cpu 2.246 total

When using setlocale but only when the lingua has changed from the
last call to _:

(define old-lingua "")

;; Call setlocale only when the lingua differs from the previous call.
(define (translated-msg msgid lingua)
  (if (not (equal? old-lingua lingua))
      (begin
        (setlocale LC_ALL (locale-for-lingua lingua))
        (set! old-lingua lingua)))
  (gettext msgid))

./  2.10s user 0.32s system 103% cpu 2.332 total
./  2.03s user 0.31s system 102% cpu 2.283 total
./  2.11s user 0.36s system 102% cpu 2.408 total
./  2.05s user 0.30s system 102% cpu 2.296 total

When adding the following in a div:

,@(let loop ((i 0))
    (if (< i 10000)
        (cons (_ "Home page")
              (loop (1+ i)))
        '()))

and verifying it is correctly translated,
this is the result for my implementation:

./  4.48s user 0.33s system 89% cpu 5.356 total
./  4.52s user 0.36s system 102% cpu 4.737 total
./  4.44s user 0.38s system 104% cpu 4.619 total
./  4.49s user 0.40s system 103% cpu 4.735 total

With a setlocale call for each _:

./  4.46s user 0.36s system 101% cpu 4.736 total
./  4.64s user 0.39s system 103% cpu 4.875 total
./  4.65s user 0.33s system 103% cpu 4.838 total
./  4.66s user 0.33s system 103% cpu 4.833 total

This is the result for a cached setlocale call:

./  4.39s user 0.37s system 102% cpu 4.624 total
./  4.17s user 0.32s system 88% cpu 5.086 total
./  4.09s user 0.32s system 102% cpu 4.276 total
./  4.16s user 0.35s system 103% cpu 4.345 total

When adding the following in the div instead

,@(let loop ((i 0))
    (if (< i 10000)
        (cons* (let ((current-lingua "de"))
                 (_ "Home page"))
               (let ((current-lingua "en"))
                 (_ "Home page"))
               (loop (1+ i)))
        '()))

this is my current implementation

./  6.36s user 0.36s system 99% cpu 6.733 total
./  6.34s user 0.34s system 103% cpu 6.470 total
./  6.00s user 0.39s system 103% cpu 6.195 total

this is without caching setlocale

./  8.74s user 0.38s system 101% cpu 8.986 total
./  8.70s user 0.36s system 102% cpu 8.872 total
./  8.86s user 0.40s system 99% cpu 9.300 total

this is with caching setlocale

./  8.95s user 0.37s system 93% cpu 9.979 total
./  8.60s user 0.39s system 103% cpu 8.712 total
./  8.81s user 0.34s system 95% cpu 9.581 total

In this contrived example, my implementation is faster.  Note that my
implementation may or may not be slower when not using the same
translation very often but instead using a longer PO file.

> For internationalization, I know the convention is to use _, but I don't
> like that, so I use the alias l10n instead.

We should definitely let the user define the syntax like in the Guile
manual.  If you want l10n, then use l10n, which is less confusing when
using _ for pattern matching.  But I will stick to _ for my website.
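
A minimal sketch of such a user-defined name, assuming the text domain
from my timings above has been bound elsewhere with bindtextdomain.
Nothing here is Haunt API; it is only Guile’s own gettext procedure
under another name.

```scheme
;; Assumption: "pelzfloriande" is the text domain set up elsewhere.
;; When no matching catalog is found, gettext returns MSGID unchanged.
(define (l10n msgid)
  (gettext msgid "pelzfloriande"))
```

With this, `(l10n "Home page")` can be used wherever `(_ "Home page")`
would be, and _ stays free for pattern matching.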

> For internationalizing complex blocks that should not be translated in
> fragments, like:
> `(p "Hi! I play "
>     (a (@ (href ,sport-url)) ,(l10n "futsal"))
>     " in "
>     (a (@ (href ,place-url)) ,(l10n "Tokyo")))
> I had to write a procedure I call `interleave` that I use like this:
> `(p
>   ,@(interleave (l10n "Hi! I play ~SPORT~ in ~PLACE~.")
>               `(a (@ (href ,sport-url)) ,(l10n "futsal"))
>               `(a (@ (href ,place-url)) ,(l10n "Tokyo"))))
> So, in the translation catalogs, translators will see the strings:
> "Hi! I play ~SPORT~ in ~PLACE~."
> "futsal"
> "Tokyo"

This interleaving is like a format string and is common in
applications, but it separates the value of ~SPORT~ from the context
in which it should be translated.  I prefer my approach with
multi-part translations with

   ,@(__ "This is a ||em_|multi-part translation||."
         `(("em_" .
            ,(lambda (text)
               `(em ,text)))))
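
To illustrate the idea, here is a rough sketch of how such a marked-up
string could be split into SXML parts.  This is not the implementation
of __ from my website: the procedure name `multi-part` is hypothetical,
and the gettext lookup that would translate the string first is left
out.

```scheme
(use-modules (ice-9 regex))

;; Hypothetical helper: split MSG at ||name|text|| markers and pass each
;; marked TEXT to the handler registered for NAME in the alist HANDLERS.
;; Plain text between markers is kept as strings.  A marker without a
;; handler keeps its text unchanged.
(define (multi-part msg handlers)
  (let loop ((start 0) (acc '()))
    (let ((m (string-match "\\|\\|([^|]+)\\|([^|]*)\\|\\|" msg start)))
      (if m
          (let* ((before (substring msg start (match:start m)))
                 (handler (or (assoc-ref handlers (match:substring m 1))
                              identity))
                 (part (handler (match:substring m 2))))
            (loop (match:end m)
                  (cons part
                        (if (string-null? before)
                            acc
                            (cons before acc)))))
          (let ((rest (substring msg start)))
            (reverse (if (string-null? rest)
                         acc
                         (cons rest acc))))))))
```

Because the translator sees the whole sentence including the marked
part, the emphasized words are translated in context, and the handler
only decides how the marked text is wrapped.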

> Currently, I use xgettext manually and Poedit for working with translation
> catalogs, but I'd like to manage translations in the future like this
> (replace `site` with `haunt`):
> # Create new translation catalogs for Finnish and Japanese.
> $ site catalog-new fi ja
> # Update translation catalogs with new translation strings.
> $ site catalog-update
> # Compile translation catalogs (generate .mo files)
> $ site catalog-compile

Yes.  This is a good user interface.  Maybe this should be part of the
haunt command and not require a build system after all…

> To be fully localized, I also have to pass IETF Language Tags around in the
> website code, so that I get the right content when rendering the templates
> in a given language.
> My 2¢

Yes, me too.  I wonder if this should be wrapped into custom syntax
maybe like the Guix store in G-expressions, but I’m not sure.

