Re: groff supports Italian input documents now


From: G. Branden Robinson
Subject: Re: groff supports Italian input documents now
Date: Sun, 4 Jul 2021 00:44:24 +1000
User-agent: NeoMutt/20180716

Hi, Ingo!

At 2021-07-03T16:01:34+0200, Ingo Schwarze wrote:
> Hi Branden,
>
[ a rare point of agreement snipped ;-) ]

> > I revamped groff input localization a few months ago.  It occurred
> > to me that the mechanism groff had innovated for this purpose
> > (specify options like -mfr for French) was duplicative of an
> > existing and much more widely understood infrastructure for tackling
> > such issues: locale(7).
> 
> That's not duplication at all but a totally different topic which
> has almost nothing to do with what we are talking about.
> 
> The locale(7) system is a system for users to specify user
> preferences, for example which character set and encoding they
> want to use *when interacting with programs* and which language
> they want programs to use when displaying messages and when
> parsing user input.
> 
> That is not at all related to which macro set a document author
> decided to use for a document that the user wishes to process.
[snip]
> The reason you didn't is trivial: I missed your change...  :-(

It's possible some other people did, too, so thank you for raising it
for further discussion.

> > along the lines that just because a (for instance) French document
> > is being typeset, the user might not want to change their locale
> > to begin with "fr".
> 
> You have this argument backwards.

I don't think so.  I sometimes need to have a look at meintro_fr.me in
the groff distribution, so I am precisely this sort of user.  I
emphatically do not want Unix commands to start talking to me in French.

> I don't think "let's allow users to be lazy" is a good argument.

That's contextual.

> Instead, my point would be that you are abusing the locale system for
> the wrong purpose.

That may be so.

> > C. Instead of saying something like "groff -mit", we can use a standard
> >    environment variable to assert the locale.  For groff's purposes,
> >    simply "LANG=it" will suffice.
> 
> How is "LANG=it groff" better than "groff -mit"?
> 
> It is not shorter nor clearer.

One of my motivations was that the groff 1.22.4 documentation told
people that they had to be sure to specify the localization macro
package as the last -m argument on the command line.  This felt fragile
to me.

It is not clear to me now why that ordering would be required.  I've
been doing all kinds of janitorial robustification over the past few
years, and either that work or changes made before I started working on
groff have rendered the claim inaccurate.

In any event, this is the sort of thing that can be sussed out with, you
guessed it, more automated tests.  (IIRC, groff 1.22.4 shipped with 4
tests; we're up to 74 now.  They've saved me from embarrassment many
times.)

> I can easily tell you how it is worse.
> 
>  - There is a risk that it inadvertently creeps in from the user's
>    environment even if the user never intended to set it.
>  - The roff ecosystem is famous for using pipelines, and making
>    sure that in a pipeline, the right programs run with the right
>    environment variables can be tricky and error-prone, whereas
>    setting command line options on programs in a pipeline is easy
>    and reliable.
>  - There is a risk that the environment variables [have] undesirable
>    and unintended side effects on some programs in the pipeline
>    because not all programs run in a roff pipeline must necessarily
>    be programs distributed with the respective core roff package.
>  - The LC_ variables are unreasonably powerful for this purpose
>    because they have never been designed for it.  The only decision
>    needed here is whether to run a macro package, and which one,
>    whereas the LC_ variables carry much more information.
>    Accepting and parsing irrelevant information and requiring
>    needlessly complicated syntax both cause complexity, which in
>    general increases the risk of both user confusion and program
>    misbehaviour and bugs.
>  - The LANG variable is considered a legacy feature, and advertising
>    legacy features is usually not a good idea.   Advertising a
>    more modern syntax like "LC_ALL=it_IT.UTF-8 groff" exacerbates
>    the previous problem, making the user wonder whether the "_IT"
>    part matters and what effect it might have, and whether ".UTF-8"
>    is the right choice and if so, whether ".UTF-8" here is sufficient
>    to assure correct processing of the character encoding in the
>    file - which it likely isn't.  The user might also wonder which
>    effect, if any, the LC_TIME and LC_NUMERIC features contained
>    in LC_ALL might have, and if those effects, if any, are beneficial
>    or detrimental, and whether it might be better to set one of the
>    other LC_* variables instead, and if so, which one.  It's not
>    readily apparent which of the variables to set because none of
>    them are designed for the purpose.

These are all fair points and I will chew on them, and would like to
solicit the views of others on this as well.
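
To make the pipeline and "creeps in from the environment" concerns
concrete, here is a minimal shell sketch.  It uses only sh, cat, and
printf, not groff itself; the "report" string and both function names
are made up for illustration.  It shows that a one-off assignment
prefix reaches only the command it is attached to, whereas an exported
variable reaches every stage.

```shell
# Hypothetical snippet run by each stage: print the LANG value it sees.
# "$0" is the label passed after the script text to "sh -c".
report='printf "%s stage sees LANG=%s\n" "$0" "${LANG:-unset}"'

# Case 1: LANG is given as a prefix on the first command only.
demo_prefix() {
    (
        unset LANG LC_ALL
        LANG=it sh -c "$report" first | { cat; sh -c "$report" second; }
    )
}

# Case 2: LANG is exported, so every stage of the pipeline inherits it.
demo_exported() {
    (
        unset LC_ALL
        LANG=it
        export LANG
        sh -c "$report" first | { cat; sh -c "$report" second; }
    )
}

demo_prefix
# first stage sees LANG=it
# second stage sees LANG=unset

demo_exported
# first stage sees LANG=it
# second stage sees LANG=it
```

In the first case, a later stage that needed the setting silently does
not get it; in the second, a stage that should not have it gets it
anyway.  A command-line option has neither problem.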

The LANG point is the weakest; I highlighted it in my mail only because
it was shorter and easier to type--laziness again.  I am aware of the
prescribed precedence of the POSIX locale-related environment variables.
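
For anyone following along, that precedence is LC_ALL first, then the
category's own LC_* variable, then LANG, with empty values treated as
unset.  A rough sketch in shell follows; "effective_locale" is a
hypothetical helper, not something groff or POSIX provides, and the
final fallback to "POSIX" is an assumption, since the default locale is
implementation-defined.

```shell
# effective_locale CATEGORY
# Print the value a conforming program would consult for CATEGORY
# (for example LC_MESSAGES): LC_ALL, else CATEGORY itself, else LANG.
effective_locale() {
    category=$1
    # Indirectly read the variable named by $category; empty if unset.
    eval "category_value=\${$category:-}"
    if [ -n "${LC_ALL:-}" ]; then
        printf '%s\n' "$LC_ALL"
    elif [ -n "$category_value" ]; then
        printf '%s\n' "$category_value"
    elif [ -n "${LANG:-}" ]; then
        printf '%s\n' "$LANG"
    else
        # Assumption: the real default is implementation-defined.
        printf '%s\n' "POSIX"
    fi
}

# Only LANG is set, so it decides.
( unset LC_ALL LC_MESSAGES; LANG=it; effective_locale LC_MESSAGES )
# it

# A category variable overrides LANG.
( unset LC_ALL; LC_MESSAGES=C; LANG=it; effective_locale LC_MESSAGES )
# C
```

So "LANG=it" takes effect only when nothing of higher precedence is
already set in the user's environment, which is part of why it is the
weakest of the available knobs.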

> This is not an outright request for a revert, but an invitation
> to reconsider whether this is really a useful and desirable change.

I'm willing to back out this component of my localization changes; in
fact, I could do a straight revert of 12434c13[1] without causing any
auxiliary damage--I'd just need to add back in a
        .do mso en.tmac
line afterwards.

The change would be more disruptive to tests and documentation, but not
drastically so, and I trust that people have confidence in me to see to
those.  :)

By the way, I have pending a commit of a regression test for Savannah
#60874 (I'm composing this email as "make distcheck" runs), and it also
uses the $LANG mechanism.  Please don't interpret that as a
retrenchment.

Regards,
Branden

[1] https://git.savannah.gnu.org/cgit/groff.git/commit/?id=12434c13bec939a68e543e6aae758bc7e92c3fb0
