Re: groff supports Italian input documents now

From: John Ankarström
Subject: Re: groff supports Italian input documents now
Date: Sun, 4 Jul 2021 00:04:05 +0200

On 2021-07-03 at 04:50, G. Branden Robinson wrote:
> It seems that the EU has standardized on "no additional
> inter-sentence space" in its typography, so our Czech, German,
> French, Italian, and Swedish localization files all say .ss 12 0

I've always wondered about this. Does anyone know to what extent
"additional inter-sentence space" was used in Europe before this?
I'm mainly thinking of Swedish, but a closer look at any of the
European languages would be interesting.

On 2021-07-03 at 16:44, G. Branden Robinson wrote:
>>  - The LANG variable is considered a legacy feature, and advertising
>>    legacy features is usually not a good idea.   Advertising a
>>    more modern syntax like "LC_ALL=it_IT.UTF-8 groff" exacerbates
>>    the previous problem, making the user wonder whether the "_IT"
>>    part matters and what effect it might have, and whether ".UTF-8"
>>    is the right choice and if so, whether ".UTF-8" here is sufficient
>>    to assure correct processing of the character encoding in the
>>    file - which it likely isn't.  The user might also wonder which
>>    effect, if any, the LC_TIME and LC_NUMERIC features contained
>>    in LC_ALL might have, and if those effects, if any, are beneficial
>>    or detrimental, and whether it might be better to set one of the
>>    other LC_* variables instead, and if so, which one.  It's not
>>    readily apparent which of the variables to set because none of
>>    them are designed for the purpose.
> These are all fair points and I will chew on them, and would like
> to solicit the views of others on this as well.
> The LANG point is the weakest; I highlighted it in my mail only
> because it was shorter and easier to type--laziness again.  I am
> aware of the prescribed precedence of the POSIX locale-related
> environment variables.

I'd like to chip in my agreement with Ingo on this point, generally.
To me, -mfr feels less opaque, less surprising and less fragile than
LC_CTYPE=fr_FR.UTF-8.

LC_CTYPE=fr_FR.UTF-8 also seems, as Ingo says, to imply that groff
will treat input as UTF-8. That's what Heirloom troff does. On the
one hand, this lends some credibility to the idea of using the LC_*
variables for this purpose. On the other hand, it makes groff's
disregard for the UTF-8 part of LC_CTYPE all the more surprising.

If groff should continue to use LC_CTYPE to determine input language,
should it not also use it to determine input encoding?
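
To make the question concrete, here is a minimal Python sketch of what
a tool has available once it consults the locale environment. It is
not how groff is implemented. The function names effective_ctype and
split_locale are hypothetical, and the fallback to "C" when nothing is
set is an assumption (POSIX leaves that default to the implementation).
The sketch applies the POSIX precedence (LC_ALL, then LC_CTYPE, then
LANG) and splits the result into its parts:

```python
import re

# POSIX-style locale names look like language[_territory][.codeset][@modifier].
_LOCALE_RE = re.compile(
    r"^(?P<language>[A-Za-z]+)"
    r"(?:_(?P<territory>[A-Za-z0-9]+))?"
    r"(?:\.(?P<codeset>[^@]+))?"
    r"(?:@(?P<modifier>.+))?$"
)


def effective_ctype(env):
    """Return the locale name governing LC_CTYPE (hypothetical helper).

    Precedence follows POSIX: LC_ALL overrides LC_CTYPE, which
    overrides LANG. Unset or empty variables are skipped.
    """
    for name in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(name)
        if value:
            return value
    return "C"  # assumed default when nothing is set


def split_locale(name):
    """Split a locale name into language, territory, codeset, modifier."""
    match = _LOCALE_RE.match(name)
    if match is None:
        # Unrecognized shape: report every part as missing.
        return dict.fromkeys(
            ("language", "territory", "codeset", "modifier"))
    return match.groupdict()
```

With only LC_CTYPE=fr_FR.UTF-8 in the environment,
split_locale(effective_ctype(env)) yields language "fr" and codeset
"UTF-8". Both come out of the same string, which is why reading one
and ignoring the other is surprising.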

