Re: [groff] 03/09: tmac/an-old.tmac: Stop remapping ` and '.

From: G. Branden Robinson
Subject: Re: [groff] 03/09: tmac/an-old.tmac: Stop remapping ` and '.
Date: Sat, 31 Oct 2020 14:18:25 +1100
User-agent: NeoMutt/20180716

Hi Anthony!  Good to hear from you.

At 2020-10-30T16:44:22-0600, Anthony J. Bentley wrote:
> Hi Branden,
> On Thu, Oct 29, 2020 at 5:06 AM G. Branden Robinson
> <> wrote:
> > The escaped versions of these characters are actually needed less
> > often than one might think.  That is a quantitative observation of
> > significant qualitative impact.
> I agree. If it were merely a matter of paying attention while
> authoring new manuals, it wouldn't be a serious problem. But...
> > The apostrophe is admittedly more frequent.  But not ultra-common.
> Literal ASCII apostrophe is incredibly common in existing manuals,
> whether shell examples or configuration files or source code snippets.
> As is ~. If ` and ' get transformed to quotes in terminal output,
> transforming ^ and ~ to U+02C6 and U+02DC can't be far behind.

Good idea! 3;-)

> Changing the rendering without fixing manuals is not free. There is a
> cost in user frustration.

That's why I think distributors should absorb the remappings into their
man.local files, which reside in /etc (or an equivalent place) and
announce their availability for configuration to the user, instead of
living in /usr/<whatever>/groff which usually means "if you touch this,
stuff might break".
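
As a rough sketch of what such a site-local remapping could look like
(an assumption for illustration, not text taken from any distributor's
actual man.local), the fragment below uses groff's `char` request to map
the two input characters back to their neutral ASCII glyphs on the utf8
output device only:

```roff
.\" Sketch of a site man.local fragment (illustrative assumption).
.\" Only the utf8 terminal device is affected; other devices keep
.\" groff's default glyphs for ' and `.
.if '\*[.T]'utf8' \{\
.  char ' \[aq]
.  char ` \[ga]
.\}
```

Here \[aq] is the neutral apostrophe and \[ga] the grave accent; a site
that also wanted mdoc pages covered would presumably put the same lines
in mdoc.local.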

> You argue that manpage authors should be aware of troff's
> idiosyncrasies, but surely you don't think *readers* have the same
> obligation. (Well, reading further into the mail, perhaps you do!)

I think it's possible for naïve readers to notice the difference between
' ` ´ ‘ and ’ with anything but a minimal level of attention.

These glyphs _are_ confusable which is why I propose to ensure that we
get the right ones into our man pages _at the source_.

And to do that, we need mechanisms of detecting when they're wrong.
That's why I made this commit; I needed it.  And some of those who share
my concerns will, too.  It's _far_ easier for me to run my little script
that shows me diffs in rendered pages between commit A and commit B than
to catch such problems by diffing roff source.

> I fear a loss of mindshare. These days documentation is often an
> afterthought. Quality man pages even more so. I've spent considerable
> time and effort convincing authors to consider a typesetter they
> consider to be archaic. I think it likely that triggering unexpected,
> frustrating rendering changes like this will drive software developers
> even further in the direction of HTML and Markdown.

I sympathize, and I don't want that outcome either.  What to do?  I have
some ideas below.

> > As you've pointed out in the frequency studies I did a few years
> > ago, raw counts get thrown off by the high volume of man pages that
> > are actually composed in something else altogether, like DocBook or
> > POD.  Fix those tools, and many pages correct these defects as soon
> > as they are generated again.
> In OpenBSD alone, uses of ', `, ~ and ^ that will need escaping number
> easily in the thousands—and I'm only including uses within
> human-authored -mdoc pages in that number.

I would not curse anyone with performing such changes one by one in a
text editor even with global replacement operations.  I imagine, based
on my experiences with groff's mere 60 pages, that it can be done in
fits with sed scripts that recognize certain tropes.  You need not rip
out the remappings until the work is done, or so close to done that the
remaining stragglers in perverse cases are thought to be so obscure that
they won't contribute measurably to the frustrated-reader problem.

> I don't share your optimism that roff-generating tools will be fixed.
> DocBook's generated manpages have been truly awful for many years;
> when has that ever improved?

I noted on this list some years ago that docbook-to-man seems to poison
everyone who touches it.  By that reasoning, all someone has to do to
get rid of me and my crazy schemes is encourage me to fix it.  ;-)

> Does POD even provide a capability to semantically separate prose from
> code literals and escape characters accordingly?

I know Russ Allbery from Debian and he's a reasonable guy.  I do so
little Perl programming that I'm not even sure what pod2man[1] actually
gets _wrong_ in this department.  But I'm confident Russ can be
approached with a well-motivated change request if the problem is
articulated clearly.

> Similarly, I don't share your optimism that manuals themselves will be
> fixed, slowly or quickly. Especially since you suggest distributions
> turn this off in man.local or render manuals in ASCII!

It is a little difficult to argue against a position which holds
simultaneously that my commit was both far too disruptive and will have
negligible impact.

Granted, I haven't modified groff's sample man.local (it has nothing
in it but a comment header) to shift the remappings over there, but I'm
happy to do so if people think the _major_ redistributors of groff are
so inattentive that they wouldn't do so themselves.  I was hoping to
stimulate consideration, on the part of groff packagers, as to whether
they'd like to help move this ball forward with their respective

I also acknowledge that this change is worthy of an item in the NEWS

> > It may thus perhaps be a mortifying realization on your part that I
> > have plans to fix all our hyphens, too, and remove _that_ part of
> > our an-old.tmac, too.
> I'm not familiar with this. Are you proposing translating - to U+2010 as
> well?

Yes.  But only after I fix all groff's own pages to do the right thing.

That might take some time.  I haven't measured the problem yet.

> Will you argue that literal ASCII hyphens are "not ultra-common"
> in manpages too? Be serious.

Oh, no, they _are_ ultra-common.  There's one right there.  The good
news is that man page writers have much higher awareness of the
hyphen-minus problem already, and correct practice is already
widespread.  So hyphens that should be pastable dashes are already '\-'
in many cases.
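
For illustration, a code specimen written so that it pastes correctly
without any remapping might look like the sketch below.  The commands
are made-up examples; the escapes are the standard groff ones (\- for a
pastable hyphen-minus, \[aq] for a neutral apostrophe, \[ga] for a grave
accent), and .EX/.EE are the example macros from groff's man extensions:

```roff
.\" Hypothetical EXAMPLES section; the commands are placeholders.
.SH EXAMPLES
.EX
ls \-l
echo \[aq]literal text\[aq]
echo \[ga]date\[ga]
.EE
```

Written this way, the terminal output carries U+002D, U+0027, and U+0060
regardless of whether the macro package remaps the unescaped characters.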

> They trusted the user, in an environment where a higher proportion of
> readers were familiar with the formatting language in question, where
> there were few alternative means of producing documentation, and
> perhaps most importantly, where copy & paste did not exist.

The copy and paste point is the best of these.  I want to get the
world's man pages--or, at the very least, groff's--to the point where
they copy and paste code specimens correctly _without_ the crutch of
this remapping.

> This change visibly and obviously affects tens of thousands of troff
> documents in the output format in which they are most often read.
> Whatever groff does in the end, I just feel like something with such
> an impact deserves some discussion first.

Certainly.  No release has been made and several courses of action are

1. Advise distributors and direct consumers of groff releases to apply
   the remappings in their site man.local (and mdoc.local[2]) files
   if they don't want to see the buggy man pages and (presumably)
   participate in an effort to get them fixed.
2. Restore the remappings, but in our tmac/man.local.  Distributors and
   direct consumers will have to perform a merge with their existing
3. Restore the remappings to man.local, but make them conditional on a
   register that defaults off.
4. Restore the remappings to man.local, but make them conditional on a
   register that defaults on.
5. Revert the change[3] entirely.
6. Revert the change and un-fix the misuses of ` and ' in code specimens
   that I've been repairing for the past few years.

I posit (6) not because I think anyone is willing to admit to holding
the position, but to establish an endpoint for the conservative
continuum.  By symmetry I suppose there is a Molotov-hurling radical
position (0), which is to make a parallel change in tmac/doc.tmac-u and
say nothing about it in any form of release notes.  This is not my
position but I think Ingo feared that it was.  He's accustomed to being
alarmed by me.  :D

I'd be happy with any of (1) through (4), with a mild grumbling
crankiness increasing with the integers.  My biggest problem with (3)
and (4) is thinking of a good register name (this is groff, so we need
not limit ourselves to two characters).  I think any of the first
four avenues merits some sort of mention in NEWS.
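
A minimal sketch of how (3) might look in man.local follows.  The
register name REMAPQ is purely hypothetical (no name has been chosen),
and the remappings shown are an assumption about what would be restored:

```roff
.\" Sketch of option (3): remappings guarded by a register that
.\" defaults off.  REMAPQ is a hypothetical name.
.if !r REMAPQ .nr REMAPQ 0
.if \n[REMAPQ] \{\
.  if '\*[.T]'utf8' \{\
.    char ' \[aq]
.    char ` \[ga]
.  \}
.\}
```

Under this sketch a reader could opt in by setting the register on the
command line (for example with groff's -r option), and option (4) would
differ only in initializing the register to 1 instead of 0.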

What do you think?


[2] But I haven't removed the remappings from mdoc yet, as Ingo noted.
[3] 697e6db7fcacd403f5dde682002d02caa52e48df
