Re: [groff] 03/09: tmac/an-old.tmac: Stop remapping ` and '.


From: G. Branden Robinson
Subject: Re: [groff] 03/09: tmac/an-old.tmac: Stop remapping ` and '.
Date: Thu, 29 Oct 2020 22:06:02 +1100
User-agent: NeoMutt/20180716

Hi, Ingo!

At 2020-10-28T15:30:35+0100, Ingo Schwarze wrote:
> Hi Branden,
> 
> G. Branden Robinson wrote on Wed, Oct 21, 2020 at 02:44:44AM -0400:
> 
> > commit 697e6db7fcacd403f5dde682002d02caa52e48df
> > Author: G. Branden Robinson <g.branden.robinson@gmail.com>
> > AuthorDate: Mon Oct 19 04:40:31 2020 +1100
> >
> >     tmac/an-old.tmac: Stop remapping ` and '.
> >
> >     Our own pages now appear to be clear of wrong-quote problems, so
> >     let's make them visible if they recur.  Those who don't want to
> >     fix bad man pages (distributors, site admins) can restore the
> >     mappings in their man.local files.
> 
> I think this commit ought to be reverted.  Sorry for not finding the
> time earlier to inspect its consequences and to discuss it with
> fellow documentation maintainers.
> 
> To start explaining why i consider this change bad, i should like
> to quote Anthony J. Bentley, with his permission:

A shrewd choice!  I've never worked closely enough with him to speak
intelligently about his pursuits, but just reading his name I feel a
sense of familiarity and credibility.  I get that with people I never
have the privilege to meet but who make good sense repeatedly over the
years.  So Mr. Bentley has my respect even if I can't concretely say
why.

NeoMutt is too dumb to recognize this '::' quoting convention so I won't
be reflowing his remarks beyond the first, however.

> :: As a rule I correctly escape ` as \` and ' as \(aq even in
> :: manpages.  They're actually needed less often than one might think.

I want to put a lot of emphasis on this point.  I think you glossed over
it, and I think it's important.  It is the first of two major
oversights that I perceive in your argument.

(Personally, I always spell \` as \(ga or \[ga], but that makes no
difference for this discussion.)

The escaped versions of these characters are actually needed less often
than one might think.  That is a quantitative observation of significant
qualitative impact.

> :: But most
> :: developers (even many manpage authors!) are unfamiliar with this
> :: historical oddity in troff. Expecting them to remember this rule
> :: and get it right is too much.

I don't agree.  The use of ` and ' as syntactical elements is unusual
even in man pages.  Particularly the grave accent: its primary
occurrence is as the old Bourne shell backtick operator, and as Tom Duff
pointed out many years ago, and many others have since rediscovered, a
syntax that doesn't require quoting to grow exponentially with nesting
depth is superior.

People shouldn't rewrite ` as \` or \(ga, they should rewrite it as
$(...).

The apostrophe is admittedly more frequent.  But not ultra-common.

Our own pages show some of its heaviest uses, thanks to device escapes
and the _convention_ of using it as the delimiter in escape sequences
that take a delimited argument.  Have a look at grops(1) and gropdf(1)
when you have a few minutes.  You'll find them using bare ', and getting
the syntax of the (device) escapes wrong anyway.  I have commits pending
to fix them both.  (I tend to use the git stash as a stack.)

I submit that \[aq] is no serious barrier to comprehension, and once one
devotes a certain level of focus to writing one's man page, there is
little difficulty.
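
To make the point concrete, here is a minimal sketch of a man page
excerpt that uses \[aq] for shell quoting and $(...) in place of
backticks.  The page name, version string, and example text are
hypothetical, and .EX/.EE are extensions that not every man(7)
implementation provides.

```roff
.\" Hypothetical page; every name below is made up for illustration.
.TH example 1 2020-10-29 "example 1.0"
.SH EXAMPLES
Print the string without expanding
.IR $bar :
.PP
.RS
.EX
$ echo \[aq]foo$bar\[aq]
.EE
.RE
.PP
Prefer
.B $(...)
to backticks for command substitution, so no grave accent is needed:
.PP
.RS
.EX
$ files=$(ls)
.EE
.RE
```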

> :: Searching the corpus of manuals to
> :: correct all missing escapes is also too much.

Yes, but no one has proposed a mission for doing so; there's no deadline
and no flag day.  As you've pointed out in the frequency studies I did a
few years ago, raw counts get thrown off by the high volume of man pages
that are actually composed in something else altogether, like DocBook or
POD.  Fix those tools, and many pages correct these defects as soon as
they are generated again.

> :: In the proposed fantasy world, everyone would be intimately familiar
> :: with the rules of elegant typography. Groff could make its change,
> :: and no manpages would suffer.

I propose no fantasy world, only incremental improvement.

> :: In the current world, ` and ' and such are always displayed literally.
> :: It means I give up my curly quotes in favor of a slight typographical
> :: downgrade, but important content (command/config examples, source
> :: code, etc) is never wrongly transformed.

No, but it's often wrongly written in the first place, even in areas
other than special character escapes.

> :: This is what we have in
> :: the www repository, and with the current mandoc rendering, basically
> :: what we have in base also.
> ::
> :: Groff making its change will take us to a third place, where because
> :: manpages are often poorly written and the distinction between escapes
> :: is subtle and poorly understood, manpages will often be poorly
> :: rendered and correcting the changes will be a lot of unpopular,
> :: frustrating busywork.

Anyone who is discomfited by any aspect of this is welcome to keep the
existing remappings, which distributors will likely provide, in their
/etc/groff/man.local or equivalent.
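
As a rough sketch of what such a site-local fragment could look like
(an assumption for illustration, not a copy of any distributor's file
or of the code removed by the commit), the following maps the two
characters back to their ASCII forms only when formatting for a UTF-8
terminal:

```roff
.\" Hypothetical man.local fragment: on UTF-8 terminals only, render
.\" bare ` and ' as the ASCII grave accent and neutral apostrophe.
.if '\*[.T]'utf8' \{\
.  char ` \[ga]
.  char ' \[aq]
.\}
```

Because the definitions sit inside the device test, typeset output
such as PostScript or PDF is left alone.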

Debian's had this for 20-something years, and AFAIK it is preserved by
all its derivative distros, including Ubuntu.

.  \" Debian: Strictly, "-" is a hyphen while "\-" is a minus sign, and the
.  \" former may not always be rendered in the form expected for things like
.  \" command-line options.  Uncomment this if you want to make sure that
.  \" manual pages you're writing are clear of this problem.
.  \" if '\*[.T]'utf8' \
.  \" char - \[hy]

It may thus be a mortifying realization on your part that I plan to fix
all our hyphens, too, and remove _that_ part of our an-old.tmac as
well.

In practice, it should alarm you _less_ for a reason I'll get to below.

> :: As an example, I point you to this commit, where I fixed multiple
> :: pages whose authors decided to turn apostrophes into *acute accents*
> :: and never noticed. It's worse than bad kerning!
> :: https://marc.info/?l=openbsd-cvs&m=155945840114941&w=2

Maybe they were Germans?  I spent about a decade from 1995-2005 reading
list mails from Germans who typed things like "don´t".  Apparently it
had to do with the keyboard layout.

But anyway, that doesn't look like a Herculean effort to me.  He fixed
style problems in 5 man pages?  I call that a Tuesday.

These things tend to fall into patterns.  A bit of time in grep
exploring, and a bit with sed, can pay big dividends when the workload
is truly big.

> :: I'm sympathetic to groff here. I can see why they want to fix a
> :: rather glaring divergence between terminal and PDF output. But in
> :: my opinion the change will cause more rendering problems than it
> :: will fix. The minor typographic gain of pretty quotes in prose will
> :: be offset by the far worse typographic loss of unescaped literal
> :: ASCII ' and ` in command contexts, most of which will never be
> :: fixed.

It is here that I must raise the other major oversight.  What _put_ us
in this deplorable position?

Unicode.

Unicode has brought us many benefits, but it has costs as well.
Punycode had to be developed to protect us against domain name spoofing
through confusable characters.  UTF-8 has led to much misery in the
Python community.  It challenges the longstanding Unix assumption that
_any_ null-terminated string can be a valid file name (including slashes
to represent hierarchical levels).

One of those costs, I submit, is that we have to stop being sloppy
thinkers about these funny little ticks high above the text baseline.

The ASCII committee actively promoted ambiguity.  Later, Knuth and the
*roff progenitors used that ambiguity as a foundation stone.  AT&T troff
_never_, before Plan 9 anyway, grew special character escapes for _any_
type of quotation mark.  [adoclr]q?  All groffisms.

What was Murray Hill's solution to the problem of confusable glyphs?

They trusted the user to figure it out.

If you saw $ echo ´foo$bar´ come out of your typesetter, you knew what
was meant.  I'm pretty sure I remember seeing published books with
Kernighan's name on them with this problem.  (Sure enough that I'll go
looking for examples if you challenge me on the point, but it'll take
time away from my work developing groff in ways to which you _don't_
object. :-P )

Have we any less confidence in our users today?

If we do, we always have the remedy of making the pages _actually
correct_.

> Let me add that the current rules for manual pages are not wrong,
> they are merely slightly *different* from the rules for general-
> purpose roff typography.

That is a distinction I am trying to efface.  I want knowledge to port
from man page writing to composition of other roff documents.  I have
gone out of my way over the past three years to document not only these
two, but the other five ASCII characters that render surprisingly in
*roff, man pages included.  You have contributed to this work yourself!

> For general typography, we have:
> 
>   input  output
>   `      U+2018 left single quotation mark
>   '      U+2019 right single quotation mark
>   \(ga   U+0060 grave accent
>   \(aq   U+0027 apostrophe-quote

I don't know why you omit \(oq and \(cq here; they are just as valid as
input in this context.  If someone uses them, they will get correct
output, no?
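
A small test input can make the general-typography table concrete.
This is only a sketch; the file name quotes.roff is hypothetical.
Formatted without a macro package for a UTF-8 terminal (for instance
with "groff -Tutf8 quotes.roff"), the first and third text lines should
show the typographic quotation marks and the second the ASCII
characters.

```roff
.\" Hypothetical test file quotes.roff; no macro package is assumed.
.nf
bare grave accent and apostrophe: ` '
grave accent and neutral apostrophe: \[ga] \[aq]
opening and closing single quotes: \[oq] \[cq]
.fi
```

Formatting the same file with -Tascii is a quick way to see how the
distinctions collapse on a device that lacks the Unicode glyphs.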

> In contrast, the current rules for manual pages are:
> 
>   input  output
>   `      U+0060 backtick for shell command substitution and the like

Hmm, yes.  "...and the like".  Without looking it up, what other
widespread programming languages (not documentation systems constructed
by people opposed to typesetting) can you name that use it for any other
purpose?  (Perl doesn't.)

>   '      U+0027 programming-language single quote

>   \(oq   U+2018 left single quotation mark for running text
>   \(cq   U+2019 right single quotation mark for running text

Not just for running text.  Any place you need one.  Granted, they're
relatively rare, especially in places using the U.S. English quotation
style.  Have a look at groff_char(7), which I rewrote not too long ago,
with much feedback from Dave Kemper.

> Even if we were to design all this from scratch - which we cannot
> do since many thousands of manual pages already exist -, that would
> be a better design for manual pages, for three reasons:
> 
>  1. Literal ' and ` are frequently needed in manual pages,

You're putting your thumb on the scale.

> :: As a rule I correctly escape ` as \` and ' as \(aq even in
> :: manpages.  They're actually needed less often than one might think.

>     programmers will constantly forget escaping them,

In which case they'll get output no more confusing than that which Brian
Kernighan and others at 1127 expected them to resolve with so little
difficulty as to leave it without comment.

>     and when
>     they are escaped, they make the manual page source code
>     very ugly, hard to write, hard to read, and error-prone.

_That_ horse left the barn with the escape syntax in the early '70s.
Especially that damnable unmatched open parenthesis.  I think it was in
CSTR #97 that Kernighan characterized roff syntax as "rebarbative".  An
excellent word, too seldom used.

In any case I disagree with your assessment.  I edit man pages all the
time and I tout myself as having no particular gifts in this area.
Technical writing, like competent programming, demands patience,
attention to detail, and a capacity to shift perspectives.

>  3. Manual page authors rarely need to type \(oq and \(cq by hand
>     in the first place but will normally use macros like .Sq
>     or .Ql which do that automatically, so there is no burden
>     on manual page authors.

...right.  I think this undermines your argument.  In the man page macro
language you think everyone should be using anyway, people don't even
have to learn these character escapes.  Single-quotation is so common
that you have a dedicated macro for it, and backtick quotation so rare
that you don't.  Although there _is_ a \*(ga string for it.  And
presumably since it's there, people should be using it _instead of the
backtick literal_.
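
For illustration, a minimal mdoc sketch using the two facilities named
above, the .Sq macro and the \*(ga string.  The page name and wording
are hypothetical.

```roff
.\" Hypothetical mdoc page; all names are made up for illustration.
.Dd October 29, 2020
.Dt EXAMPLE 1
.Os
.Sh NAME
.Nm example
.Nd illustrate quotation in mdoc
.Sh DESCRIPTION
The
.Sq quiet
mode suppresses output.
A literal grave accent is written \*(ga.
```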

> The strongest reason why this should be reverted is that it is an
> incompatible change both ways with little to no hope of ever fixing
> the consequences.  Correctly written old manual pages using ` and '
> literally e.g. in code samples will suddenly render incorrectly
> with new formatters, in an important way.  Correctly written new
> manual pages that no longer escape \(oq and \(cq and instead rely
> on the general typesetting rules will look worse with old formatters.
> As explained above with the points 1-3, these downsides will not
> even be offset by any advantages, but will in addition make life
> harder for manual pages authors for no benefit.

No, there are multiple ways out of this.  If people are that enamored of
the status quo 13 years ante, they have multiple recourses.

1. Fix the page.
2. Use man.local.
3. Use -Tascii or -Tlatin1 and bask in the clarity.

If Unicode's glyph repertoire is such a temptation to sin, don't use a
UTF-8 terminal!  Or tell man(1) to dumb itself down with -Tascii.  If
thy right hand offends thee, cut it off.

> Finally, the change as committed is incomplete.  It changes the
> rules for the man(7) macros but leaves the identical rules for the
> mdoc(7) macros unchanged.

This is a legitimate complaint.  Because of the above mechanisms the
package provides for accessing the appropriate glyphs, I thought that no
changes were necessary.  I'll have another look at this.

[looks]

Yes, I see that in 98acc924f4e32cfc2209df5db0c21921df8cc7ac, parallel
changes were made to an-old.tmac and doc.tmac.

2009!  Later than I would have thought.

> I think trying to explain to manual page maintainers that quoting
> rules in manual pages depend on which macro set is in use would be an
> extremely bad idea, it would be utterly confusing.  That alone - the
> change being inconsistent and incomplete - would already seem
> sufficient to me for requesting a revert, even if it wouldn't be a bad
> change for all the reasons explained above.

No, that is a valid point, though presumably you have some hortatory
words for people not using \*(ga and .Sq like they're supposed to.

Why _is_ there no macro for backtick-quotation in mdoc?

> Your argument that distributors and packages of groff can individually
> decide to revert your change if they want to is an even worse idea.
> That would result in ecosystem fragmentation.  You want different
> rules for manual page markup depending on the operating system?
> You want that the manual pages from one operating system render
> incorrectly on another system, and vice versa?  What about the
> manual pages of portable software - those poor people would be
> *forced* to escape everything: \(aq, \(ga, \(oq, and \(cq - not
> as a choice because they want to, but to get correct rendering
> on all systems.  Inviting distributors and packagers to mess with
> these rules clearly exacerbates the problems rather than helping to
> solve them.
> 
> If groff insists on this change, then mandoc and all systems using
> it (OpenBSD, FreeBSD, NetBSD, Illumos, Alpine Linux, Void Linux)
> will probably be forced to follow your advice to not follow groff
> and to patch your change out of groff.  I don't see what the
> alternative would be...  And i'm not sure what Fedora could do,
> which officially supports both man-db/groff (by default) and mandoc
> (which can be activated with its "alternatives" configuration system).

This doesn't frighten me.  I was an active Debian developer for many
years.  A _highly_ active one, by some standards.  Distributors patch
things.  That is neither to be celebrated nor lamented.  It is a fact
of software dissemination.  My aim is for groff's man pages to set a
high example--we're still quite a ways from it, by my reckoning--knowing
that some other man page writers will lack either the motivation or the
interest to be as scrupulous as we.  But that's okay.

As I noted above, distributors have been "patching out" stuff like this
for decades.

The remapping of these glyphs in groff was well-intentioned but I think
it has persisted for much longer than it should have.

It was a crutch to get us over a UTF-8 transition hump.  We're there
now.  Or at least we will be when we can feed UTF-8 input straight to
troff...  X'-D

People writing man pages cannot simultaneously pretend that Unicode both
does and does not exist.  We are not using Model 37 Teletypes and
C/A/Ts.  You need something wider than an 8-bit char to encode
characters these days.

The future is here and it's time to learn the difference between
backticks and open quotes, neutral apostrophes and close quotes.  As
authors of software and its documentation, it is _our_ responsibility to
set examples of learning and adaptation.

With this patch, an even slightly attentive eye will _notice_ the
difference between "don’t" and "don't".  Imagine the thousands of people
who will then take the lesson, perhaps their first, that this
distinction even exists.  Consider then the benefit of deciding to write
your first man page _already knowing of it_.

Recreating the limitations of 1970s Teletype typography as a
prescription for a UTF-8 graphical terminal emulation experience is
absurd to me.

工欲善其事,必先利其器

Regards,
Branden
