Re: groff supports Italian input documents now

From: Ingo Schwarze
Subject: Re: groff supports Italian input documents now
Date: Sat, 3 Jul 2021 16:01:34 +0200
User-agent: Mutt/1.12.2 (2019-09-21)

Hi Branden,

G. Branden Robinson wrote on Sat, Jul 03, 2021 at 12:50:07PM +1000:

[ autodetection ]
> Important to note here--it doesn't.  groff doesn't detect this--it has
> to be told.

Which is a good thing.  Even when a document contains only a single
language, detecting it automatically may not be reliable.  Even if
a document contains mostly text in one language, that doesn't imply
the author designed it for use with that language's macro set.

Relying on a specific macro set is a choice by the author of a
document and has to be treated as such.

> I revamped groff input localization a few months ago.  It occurred to me
> that the mechanism groff had innovated for this purpose (specify options
> like -mfr for French) was duplicative of an existing and much more
> widely understood infrastructure for tackling such issues: locale(7).

That's not duplication at all but a totally different topic which
has almost nothing to do with what we are talking about.

The locale(7) system is a system for users to specify user
preferences, for example which character set and encoding they
want to use *when interacting with programs* and which language
they want programs to use when displaying messages and when
parsing user input.

That is not at all related to which macro set a document author
decided to use for a document that the user wishes to process.

For example, i almost always work with an en_US.UTF-8 locale with
some exceptions for low-level work where i use the POSIX locale
instead.  But that doesn't mean that i never want to process French
or German documents.  Yes, setting a fake locale when calling a
program is possible, so a *workaround* does exist, even though it
certainly feels awkward.

Besides, this is a bad trap.  Why should any user expect that whatever
locale they may have set according to their personal preferences
silently cripples formatting of documents they process, and that
they have to go the extra mile of modifying the locale in the
environment of their formatting commands?
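
The workaround mentioned above amounts to a one-off environment
override.  A minimal sketch in POSIX sh, assuming en_US.UTF-8 as the
session locale and using a plain "sh -c" as a hypothetical stand-in
for the formatter, since the only point is which LANG value the child
process receives:

```sh
# Assumed session setting, as in the example above.
LANG=en_US.UTF-8
export LANG

# Hypothetical stand-in for the formatter: print the LANG value it received.
formatter='printf "%s" "$LANG"'

# The "fake locale" workaround: override LANG for this one invocation only.
seen_by_formatter=$(LANG=fr_FR.UTF-8 sh -c "$formatter")

# The session locale itself is unchanged afterwards.
seen_by_session=$LANG

printf 'formatter: %s\nsession:   %s\n' "$seen_by_formatter" "$seen_by_session"
```

The override has to be repeated on every formatting command that
needs it; nothing in the document itself records the choice.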

> I have anticipated, but not yet heard, a protest

The reason you didn't is trivial: i missed your change...  :-(

> along the lines that just because a (for instance) French document
> is being typeset, the user might not want to change their locale
> to begin with "fr".

You have this argument backwards.

I don't think "let's allow users to be lazy" is a good argument.
Instead, my point would be that you are abusing the locale system
for the wrong purpose.

> C. Instead of saying something like "groff -mit", we can use a standard
>    environment variable to assert the locale.  For groff's purposes,
>    simply "LANG=it" will suffice.

How is "LANG=it groff" better than "groff -mit"?

It is neither shorter nor clearer.

I can easily tell you how it is worse.

 - There is a risk that it inadvertently creeps in from the user's
   environment even if the user never intended to set it.
 - The roff ecosystem is famous for using pipelines, and making
   sure that in a pipeline, the right programs run with the right
   environment variables can be tricky and error-prone, whereas
   setting command line options on programs in a pipeline is easy
   and reliable.
 - There is a risk that the environment variables have undesirable
   and unintended side effects on some programs in the pipeline
   because not all programs run in a roff pipeline must necessarily
   be programs distributed with the respective core roff package.
 - The LC_ variables are unreasonably powerful for this purpose
   because they have never been designed for it.  The only decision
   needed here is whether to run a macro package, and which one,
   whereas the LC_ variables carry much more information.
   Accepting and parsing irrelevant information and requiring
   needlessly complicated syntax both cause complexity, which in
   general increases the risk of both user confusion and program
   misbehaviour and bugs.
 - The LANG variable is considered a legacy feature, and advertising
   legacy features is usually not a good idea.  Advertising a
   more modern syntax like "LC_ALL=it_IT.UTF-8 groff" exacerbates
   the previous problem, making the user wonder whether the "_IT"
   part matters and what effect it might have, and whether ".UTF-8"
   is the right choice and if so, whether ".UTF-8" here is sufficient
   to assure correct processing of the character encoding in the
   file - which it likely isn't.  The user might also wonder which
   effect, if any, the LC_TIME and LC_NUMERIC features contained
   in LC_ALL might have, and if those effects, if any, are beneficial
   or detrimental, and whether it might be better to set one of the
   other LC_* variables instead, and if so, which one.  It's not
   readily apparent which of the variables to set because none of
   them are designed for the purpose.
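
The first two points can be illustrated without groff at all.  The
following is a minimal sketch in POSIX sh; the "stage" command is a
hypothetical stand-in for any program in a roff pipeline and merely
reports which LANG value reached it.

```sh
# Hypothetical stand-in for a pipeline stage: report the LANG value received.
stage='printf "%s:LANG=%s\n" "$0" "${LANG-unset}"'

# Start from a known state for the demonstration.
unset LANG

# A prefix assignment reaches only the one command it precedes,
# so the first stage of this pipeline never sees it.
prefix_out=$(sh -c "$stage" first | { cat; LANG=it sh -c "$stage" second; })

# An exported variable reaches every program in the pipeline,
# whether or not that was intended for each of them.
export_out=$(export LANG=it; sh -c "$stage" first | { cat; sh -c "$stage" second; })

printf '%s\n' "$prefix_out" "$export_out"
```

With the prefix form, each stage that needs the setting must be given
it individually; with the exported form, stages that should not be
affected receive it as well.  A command line option such as -mit, by
contrast, is visible only to the program it is passed to.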

This is not an outright request for a revert, but an invitation
to reconsider whether this is really a useful and desirable change.

