Re: [groff] 03/09: tmac/an-old.tmac: Stop remapping ` and '.

From: Ingo Schwarze
Subject: Re: [groff] 03/09: tmac/an-old.tmac: Stop remapping ` and '.
Date: Sun, 1 Nov 2020 03:39:10 +0100
User-agent: Mutt/1.12.2 (2019-09-21)

Hi Branden,

G. Branden Robinson wrote on Sun, Nov 01, 2020 at 12:27:36AM +1100:
> At 2020-10-31T13:55:08+0100, Ingo Schwarze wrote:

>> Please take man.local out of the equation.  Such a thing simply
>> doesn't exist on FreeBSD, OpenBSD, NetBSD, or Dragonfly, and it
>> won't be created on OpenBSD.

> groff ships it.  What do you do with it?

Install it where the default groff install puts it, which happens to
be /usr/local/share/groff/site-tmac/, on a filesystem where even the
system administrator is not supposed to edit files.  A file that is
empty and not supposed to be edited is equivalent to not existing
for all practical purposes.

>>> 3. Restore the remappings to tmac/an-old.tmac, but make them
>>>    conditional on a register that defaults off.
>>> 4. Restore the remappings to tmac/an-old.tmac, but make them
>>>    conditional on a register that defaults on.

>> OS-local changes solve little in the first place.

> What's OS-local about
> .if r SOMEREG \
> .   if \n[SOMEREG] \{\
> .     char ... ...
> .     char ... ...
> .   \}
> ?

Surely you will not expect users to set or unset SOMEREG,
and not even system administrators.  Apart from that not being
a realistic expectation, it would also make no sense, because which
value of SOMEREG is needed depends on which manual page is being
rendered, not on the preferences of users or system administrators.

So SOMEREG will be whatever the distribution or packaging
makes it.  Which means that at the end of the day, options 3 and 4
are exactly the same as options 1 and 2; there is no practical
difference whatsoever.

Once that situation exists, some authors will write manual pages
for one value of SOMEREG, some for another, and we end up with
large numbers of manual pages being non-portable.

>> Manual pages are supposed to be as operating-system independent as
>> possible, such that pages from one system can also be read on another,
>> and such that authors of portable software know what to do.  You seem
>> to be advocating ecosystem fragmentation, making manual pages
>> non-portable, which astonishes me.

> That argument is precisely equally true of supporting a configurable LL
> register.  The world has shattered into N pieces, where N is every
> terminal width in use.  Do you yearn for the days of 110 baud Teletypes
> with nroff grinding out the reference pages at 65n width?

Most real-world manual pages look best at the default of LL=78n,
but rendering doesn't usually become incorrect for values like
LL=60n or LL=132n.  So no, it's not the same.  LL is merely
unimportant.  By contrast, using a value of SOMEREG that mismatches
the page being formatted causes misrendering.

>> I have no preference among options 1 to 4.
>> They are all equally bad.

> This is an absurd claim.  I don't think you have understood what
> options 3 and 4 entail.

Not sure what you mean; what is wrong with my above reasoning?

Isn't the above what will happen as a consequence of options 3 or 4?

>> By the way, the problem is not only changing thousands of existing
>> manual pages - which can't be automated; every single instance of '
>> and ` would have to be checked manually.  Here is a list of *a few
>> examples* of affected manual pages from OpenBSD base alone.  The
>> list is definitely far from comprehensive, these are just some
>> examples:
>>   section 1: bc, csplit, expr, find, flex, getopt, grep, ksh, ldap,
>>     less, mandoc, more, paste, pax, shar, ssh, su, tar, tmux, vi,
>>     xargs
>>   section 3: BIO_f_ssl, EVP_PKEY_keygen, RMD160Init, SHA256Init,
>>     SSL_CTX_set_alpn_select_cb, SSL_CTX_set_default_passwd_cb,
>>     cgetent, fgetln, fgets, getopt, getopt_long, malloc, stpcpy,
>>     strchr, strcspn, strncat, strncpy, strrchr, strsep, strtol,
>>     strtoul, va_start, wcslcpy, wcsrchr, wprintf
>>   section 5: ifstated.conf, nsd.conf, pf.conf, pf.os, relayd.conf
>>   section 7: ascii, ports, roff

> You haven't shared your search method,

Oh, that's simple: I grepped for ' and ` and quickly looked for
candidates that would obviously come out wrong, without wasting
much time on it.

> so I can't infer very much from
> this.  Backticks and apostrophes used as quote characters tend to abut
> whitespace; backticks are unknown in prose,

Indeed, changing backticks to \(ga could probably be achieved by
grepping for them and checking all instances.  Still, it would be
a bit tedious: they occur in about 2850 base system manual pages
installed on OpenBSD.
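
To illustrate the kind of grepping meant here, a minimal sketch in
Python follows.  The function name backtick_lines is made up for this
example, and skipping lines that start with .\" is a simplifying
assumption about which lines are comments.  It only lists candidate
lines; it cannot tell whether a given backtick has to become \(ga.

```python
def backtick_lines(text):
    """Return (line_number, line) pairs for lines containing a literal backtick.

    Hypothetical helper for illustration only: it finds candidates,
    it does not decide which ones need escaping.
    """
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        # Assumption: lines starting with .\" are roff comments and
        # never reach the formatted output, so they are skipped.
        if line.startswith('.\\"'):
            continue
        if "`" in line:
            hits.append((number, line))
    return hits
```

Every line it reports would still have to be looked at by a human.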

> and apostrophes are invariably within word boundaries in English
> except in dialect registers that are unheard of in man pages.
>       Don' go messin' 'roun' wit' the -r and -f flag on rm if'n ya
>       knows what's good fer ya.
> This sort of thing is not seen.
> I don't doubt that there are a lot of wrongly-encoded "quotes" in
> OpenBSD man pages, however.

Apostrophes are harder to filter than backticks; they appear in
a large number of different ways.  Even in code samples, they tend
to abut whitespace.  I see them in about 3250 manual page files,
on about 62,000 lines, i.e. 20 lines on average per manual page,
in OpenBSD base.  Which of these need to be changed?  Some filtering
is likely possible, but many will need to be checked by hand.  How
long did it take you to fix groff's 60 manual pages?  Care to
multiply that by 50?  And then by 4 or 5 for the number of BSD
operating systems?  And then you haven't even started with portable
software, or GNU/Linux.  And developers will come along and commit
new manual page content all the time.
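
As a rough sketch of what "some filtering" could look like, here is a
Python heuristic.  The name split_apostrophes and the rule "a letter
on both sides means ordinary prose" are assumptions made up for this
example, not a tested method.

```python
import re

# Assumption: an apostrophe with an ASCII letter on both sides is
# most likely a contraction or possessive in running prose.
IN_WORD = re.compile(r"(?<=[A-Za-z])'(?=[A-Za-z])")


def split_apostrophes(line):
    """Return (in_word, needs_review) apostrophe counts for one line."""
    total = line.count("'")
    in_word = len(IN_WORD.findall(line))
    # Everything that is not clearly inside a word is left for a human.
    return in_word, total - in_word
```

Even this simple rule leaves plural possessives like "users' files"
and C character constants like 'a' in the needs_review pile, so the
manual checking does not go away.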

>> The problem is that manual pages are written by software developers,
>> not by typesetters, who are used to typing programming languages
>> and who are used to the fact, from the past, that these five
>> characters do not need escaping.

> They did in 2008, and every year before that, for people using -Tutf8.

I know that UTF-8 as an invention is significantly older, but the
likely reason the fix was only committed around 2008 is that around
that time, people started caring more about UTF-8 output from manual
pages.  The change made an output mode of increasing popularity
work better for the input that was usual in manual pages.  It doesn't
mean that programmers were used to escaping these characters before
that, as you can see from how many pages still don't escape them.
Quite to the contrary, if escaping had been common before that, the
change would not have made much sense.  I still consider it very
likely that most programmers are used to not escaping these
characters, today just like between 1990 and 2008, no matter whether
they wrote their first manual page before or after 2008, certainly
with the exception of some who had previous exposure to typography
or ventured to study documentation with unusual

