groff and multilingual documents

From: G. Branden Robinson
Subject: groff and multilingual documents
Date: Sat, 10 Jul 2021 18:37:26 +1000
User-agent: NeoMutt/20180716

Hi folks,

I've now reverted the inspection of the LC_ALL and LANG environment
variables by the troffrc for determination of a default input document
language.  Thanks to Ingo, Dave, and James for their feedback.

I also have some happy news regarding multilingual input documents.

At 2021-07-06T12:10:17-0400, James K. Lowden wrote:
> Branden, this sounds like you're inspecting these variables directly.
> It would be better to use setlocale(3) for consistency with other
> applications.

Yes, but the groff language does not expose this C standard library
function, so that was difficult.  Using a shell command like locale(1)
to get at equivalent functionality would have meant using unsafe mode.
> To Ingo's point, though, he's right: the environment is not something
> we normally change on a per-application basis.  It defines what can't
> be defined otherwise, namely the encoding used by the terminal (or,
> more generally, by the UI).

My experience is slightly different; while setting LC_CTYPE to an
incompatible encoding is likely to have unhappy results, I've fruitfully
played with LC_COLLATE in the environment of individual shell commands.
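
As a minimal sketch of that kind of per-command override (the sample
input is made up, and LC_ALL is unset in a subshell because it would
otherwise take precedence over LC_COLLATE):

```shell
# Sort four lines with the C (byte-order) collation for this one command
# only; the calling shell's own locale settings are not modified.
sorted=$(
    unset LC_ALL   # LC_ALL, when set, overrides LC_COLLATE
    printf 'b\nA\na\nB\n' | LC_COLLATE=C sort
)
printf '%s\n' "$sorted"
```

In the C locale, uppercase ASCII letters collate before lowercase ones,
so the lines come out as A, B, a, b.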

> As for what the right default is, IMO it's debatable.  It seems to me
> that it's reasonable to assume a user working in an Italian
> environment is normally working with Italian text, and it would be a
> service to him to assume as much unless told otherwise.

It may be that Italian users of *roff systems are pretty accustomed to
its English-oriented defaults.

> If that same user wants to change that default -- because he normally
> works with groff inputs in English (or Greek) -- he should be able to
> control the default (using troffrc, I guess) to set his own default
> options.

Yes; if we disregard the POSIX locale, troffrc is exactly the right
place to do this.  A side benefit of the reversion is that the shipping
troffrc is simpler, and it should be pretty easy for users of other
supported languages (cs, de, fr, it, ja, sv, zh) to locate the "en" in
"en.tmac" and change it.
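
A rough sketch of that edit, run against a stand-in file rather than the
real troffrc: the single line written to troffrc.sample is only an
assumption about what the shipped file contains, and both file names are
hypothetical.

```shell
# Stand-in for the relevant line of the shipped troffrc (assumed content).
printf '.do mso en.tmac\n' > troffrc.sample

# Swap the English localization file for the Italian one.
sed 's/en\.tmac/it.tmac/' troffrc.sample > troffrc.local

cat troffrc.local
```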

At 2021-07-05T08:36:56-0500, Dave Kemper wrote:
> I agree with Ingo's point that the document author is in the best
> position to know which language-specific macro package is required to
> format the document correctly.
> However, this argues against requiring the end user to either have a
> specific locale setting or need to supply specific command-line
> switches, and in favor of this information being encoded into the
> document itself.
> To that end, it seems we ought to be steering authors toward including
> appropriate .mso requests within their documents.  This would allow
> the output to be correct regardless of the end user's environment or
> command invocation.

I had thought this wouldn't work well; on the contrary, it does, and
even in groff 1.22.4.  It appears our documentation was stale.  With a
recent change I've made, we can even load localization files in
compatibility mode[1].
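
To illustrate the in-document approach, here is a small sketch (not the
attached file).  The file name and sample sentences are invented, and it
assumes the stock troffrc has already set up English before the German
localization file is loaded.

```shell
# Write a two-language document that selects its own localization.
cat > bilingual.roff <<'EOF'
.\" The first paragraph relies on the English defaults.
This paragraph is hyphenated using the English patterns.
.sp
.\" Load the German localization file from within the document.
.mso de.tmac
Dieser Absatz wird nach deutschen Regeln getrennt.
EOF

# Format it with all warnings enabled, but only if groff is installed.
if command -v groff >/dev/null 2>&1; then
    groff -Tutf8 -ww bilingual.roff || echo "groff reported a problem" >&2
fi
```

Because the .mso request travels with the document, no locale setting or
extra command-line option is needed on the reader's side.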

At 2021-07-05T16:18:17+0200, Ingo Schwarze wrote:
> Dave Kemper wrote on Mon, Jul 05, 2021 at 08:36:56AM -0500:
> > To that end, it seems we ought to be steering authors toward
> > including appropriate .mso requests within their documents.  This
> > would allow the output to be correct regardless of the end user's
> > environment or command invocation.
> This is likely a useful recommendation - except for manual pages,
> of course, which should not use low-level roff(7) requests and where
> some tools (for example mandoc) will deny .mso for security reasons.

I agree that .mso requests should not be necessary in localized man
pages; man(7) exposes no strings that require localization, and other
localization settings are of relatively low importance.  Plenty of
people hate and disable hyphenation in man pages, and the remaining
localization question is the matter of additional inter-sentence space,
which is a hair-splitting issue.

I hope it is not too much to ask man(1) implementors to add a
`-m$LOCALIZATION_PACKAGE` to the constructed groff command when a
localized man page is being rendered, so that automatic hyphenation
breaks are correct.  A man(1) command already must know when a localized
page is being rendered, and man-db man(1) already uses the system locale
to localize them.  There is a feature gap when man(1) is asked to render
an arbitrary file as a man page--the `man -l $FILE` feature.  Since
there's no way AFAICS to tell man(1) to load an arbitrary macro package,
this method of previewing a non-English page will not work correctly.
I'm not sure such a feature is worth adding when man page translators
can be taught about `groff -man | less -R`.
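
A sketch of what such a constructed command might look like follows.
The helper name preview_cmd and the page name ./foo.1 are hypothetical,
and the real command assembled by any given man(1) implementation will
differ.

```shell
# Hypothetical helper: print a groff command line that formats a man page
# with the localization package for the given language loaded after man(7).
preview_cmd() {
    lang=$1
    file=$2
    printf 'groff -Tutf8 -man -m%s %s\n' "$lang" "$file"
}

cmd=$(preview_cmd de ./foo.1)
printf '%s\n' "$cmd"
```

A translator could run the printed command and pipe its output to
`less -R` to preview the page.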

Back to the subject of general *roff documents, I'm attaching an example
of a multilingual input file.  It works just fine and provokes no
diagnostics, even with `-ww`.  It uses the .mso approach.

I fear the example is a bit long to include in our Texinfo manual.  I'm
open to suggestions of a better place for it.



Attachment: multilang.groff
Description: Text document

Description: PostScript document

Attachment: signature.asc
Description: PGP signature
