man pages, defensive programming, and bibliographic formats

From: G. Branden Robinson
Subject: man pages, defensive programming, and bibliographic formats
Date: Sun, 26 Jan 2020 00:22:08 +1100
User-agent: NeoMutt/20180716

(was: 01/01: tbl(1): Note origin of tbl.)

Hi folks,

Ingo and I had an inadvertent off-list exchange.  I thought I'd loop in
the other developers and interested parties.

At 2020-01-24T23:59:13+0100, Ingo Schwarze wrote:
> G. Branden Robinson wrote on Sat, Jan 25, 2020 at 02:40:40AM +1100:
> > Did you mean to send this privately?
> No, omitting <address@hidden> was an oversight.
> But better this way than the other way round and publicly post
> a private letter...
> > At 2020-01-23T22:54:19+0100, Ingo Schwarze wrote:
> >> Branden Robinson wrote:
> >>> +.SH "See Also"
> >> Quoting of .SH arguments is not needed.
> > I know.  It's a deliberate style choice of mine to make man(7) easier
> > to learn.  When people give arguments to macro calls, they need to
> > be aware of whitespace.  By getting into the habit of quoting or
> > escaping macro arguments containing whitespace, their knowledge will
> > migrate more easily between, say, .SH and .BR.
> I don't really buy that.  Look at how macros behave with respect to
> white space in arguments.
> Macros that don't take arguments in the first place:
>   .EX .EE .PP .TQ .YS .DT
> Macros that take only a single argument which cannot contain
> whitespace anyway:
>   .MT .ME .RS .RE .SY .UR .UE .PD .UC
> More macros where the first argument cannot reasonably contain
> whitespace:
>   .OP .TH .TP .HP .AT
> Macros where multiple arguments are treated symmetrically:
>   .B .I .SM .SB .SH .SS
> Macros where multiple arguments alternate meaning:
>   .BI (& five friends), .IP
> So, for almost all macros, you just don't need to worry about argument
> quoting at all.  The macros .BI (& friends) and .IP are really the
> only two odd ones out, and people need to understand *why* these two
> are so unusual and why quoting is required when arguments contain
> whitespace for these two unusual macros.
> It makes nothing simpler to make people worry about whitespace on
> macro lines in general when for the vast majority of macros, it just
> doesn't matter at all.
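
To make the distinction above concrete, here is a minimal Python sketch of
how arguments end up on the page for the "symmetric" macros versus the
"alternating" ones.  It is a deliberately simplified model, not groff's
actual parser: it splits only on spaces, treats double quotes as grouping,
and ignores escapes.  The function names split_macro_args and styled_runs
are hypothetical, introduced only for this illustration.

```python
# Simplified model of man(7) macro argument handling (an illustration,
# not groff's real parser: no escapes, no "" inside quoted arguments).

SYMMETRIC = {"B", "I", "SM", "SB", "SH", "SS"}
ALTERNATING = {"BI", "BR", "IB", "IR", "RB", "RI"}


def split_macro_args(line):
    """Split a macro call such as '.BI "foo bar" baz' into (name, args)."""
    name, _, rest = line[1:].partition(" ")
    args, current, in_quotes, started = [], "", False, False
    for ch in rest:
        if ch == '"':
            # A double quote toggles grouping and starts an argument.
            in_quotes = not in_quotes
            started = True
        elif ch == " " and not in_quotes:
            # An unquoted space ends the current argument, if any.
            if started:
                args.append(current)
            current, started = "", False
        else:
            current += ch
            started = True
    if started:
        args.append(current)
    return name, args


def styled_runs(line):
    """Return (style, text) pairs describing how the arguments are set."""
    name, args = split_macro_args(line)
    if name in SYMMETRIC:
        # All arguments get the same treatment, joined by spaces.
        return [(name, " ".join(args))]
    if name in ALTERNATING:
        # Arguments alternate between the two fonts in the macro name
        # and are set without intervening space.
        return [(name[i % 2], arg) for i, arg in enumerate(args)]
    raise ValueError("macro not covered by this sketch: " + name)
```

In this model, styled_runs('.SH See Also') and styled_runs('.SH "See Also"')
give the same result, whereas styled_runs('.BI foo bar baz') and
styled_runs('.BI "foo bar" baz') do not.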

My rebuttal to this is that while this is a good analysis of the
relative frequency of macro names which require careful whitespace
handling (i.e., quoting or escaping of whitespace in arguments) with
respect to the man(7) _namespace_, it is poorly representative of the
frequency with which the various man(7) macros are actually used.

Along with the paragraphing macros, the font-styling macros are among
the most frequently used.

I threw together a script to count up man(7) macro use in our corpus:

$ MANS=$(find !(build|EXPERIMENTS) -name "*.man"|sort)
$ man-macro-frequency-counter $MANS | sort -rn -k2
B: 7236
I: 5179
TP: 2512
BR: 2316
IR: 1431
P: 983
RE: 841
BI: 792
IP: 691
SH: 484
RS: 475
LP: 340
OP: 284
TQ: 261
EX: 231
EE: 231
RI: 221
SS: 209
SY: 181
PP: 174
RB: 130
YS: 127
IB: 98
UR: 65
UE: 65
TH: 61
MT: 52
ME: 52
SM: 0
SB: 0

The script is attached.
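
For readers without access to the attachment, the following Python sketch
shows one way such a count could be produced.  It is not the attached
script; the names MACROS, CALL, and count_macros are assumptions made for
illustration, and the macro list is simply the set of names appearing in
the output above.  It counts a line only when a control character (. or ')
starts the line and is followed by one of those names, and it reports zero
for names that never occur.

```python
import re
import sys
from collections import Counter

# Macro names taken from the listing above (an assumption, not the
# attached script's actual list).
MACROS = ("B BI BR EE EX I IB IP IR LP ME MT OP P PP RB RE RI RS "
          "SB SH SM SS SY TH TP TQ UE UR YS").split()

# A control character at the start of the line, optional blanks,
# then the macro name.
CALL = re.compile(r"^[.'][ \t]*([A-Za-z]+)")


def count_macros(lines):
    """Count calls to the known man(7) macros in an iterable of lines."""
    counts = Counter({name: 0 for name in MACROS})
    for line in lines:
        match = CALL.match(line)
        if match and match.group(1) in counts:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    totals = Counter({name: 0 for name in MACROS})
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8", errors="replace") as handle:
            totals.update(count_macros(handle))
    for name in MACROS:
        # Same "NAME: count" shape as the listing, so the output can be
        # piped through sort -rn -k2.
        print(f"{name}: {totals[name]}")
```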

Note that I have not finished my project of cleansing the pages of
unnecessary font escapes, so some of the font-style macros are

> > It's for similar reasons that I do this:
> > 
> >     The mf macros \&.foo and \&.bar should not be called within a
> >     \&.pp context.
> > 
> > The zero-width space escapes on the first line are not necessary;
> > but it's a good habit to use them anyway, because what happens if
> > you recast and reflow the sentence in your text editor such that one
> > of those ends up starting a line?
> When semi-automatically transforming code, you need to check the
> result in any case.

I agree.  Code defensively and then validate the output.  :)
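
As a rough illustration of the reflow hazard, and of what "validate the
output" might look like, here is a small Python sketch.  It uses the
standard textwrap module as a stand-in for a text editor's reflow command
(an assumption; editors wrap by their own rules) and reports the fill
widths at which some output line would begin with a roff control
character.  The names lines_at_risk and widths_at_risk are hypothetical.

```python
import textwrap

# Characters that make roff read an input line as a request or macro call.
CONTROL_CHARS = (".", "'")


def lines_at_risk(text, width):
    """Reflow text to the given width; return lines starting with . or '."""
    return [line for line in textwrap.wrap(text, width)
            if line.startswith(CONTROL_CHARS)]


def widths_at_risk(text, widths):
    """Return the widths at which reflowing produces a risky line."""
    return [width for width in widths if lines_at_risk(text, width)]


PLAIN = ("The mf macros .foo and .bar should not be called within a "
         ".pp context.")
ESCAPED = ("The mf macros \\&.foo and \\&.bar should not be called within a "
           "\\&.pp context.")
```

With these inputs, widths_at_risk(PLAIN, range(10, 73)) includes width 20,
where ".pp context." lands on a line of its own, while
widths_at_risk(ESCAPED, range(10, 73)) is empty because every dotted word
already carries its \& prefix.  The same check can be run over a page after
editing, whichever escaping style one prefers.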

> > In my view, this sort of thing is not cargo-culting, but defensive
> > programming.
> In my view, unnecessary escaping just makes text harder to understand
> by making it look mysterious.  How many people will think the
> superfluous escaping is actually somehow required?  It's likely to
> cause fear, uncertainty, and doubt.
> How is the (arbitrary) rule "a dot needs escaping at the beginning
> and end of each word" easier to learn than the (accurate) rule "it
> needs escaping at the beginning and end of each input line"?  They
> seem both the same difficulty to me, except that the accurate rule
> needs to be invoked far more rarely (less work and obfuscation)
> and also becomes obvious once you understand how request/macro and
> sentence end detection works, whereas the arbitrary rule is, well,
> arbitrary and needs yet another argument for understanding it even
> after understanding the root cause of the problem.

You and I disagree on this and I'd like to solicit the views of the
folks on the mailing list.

> > (1) The stylistic format of such bibliographic entries; and
> Sure.  I would probably settle on a more standard form like
>   Michael E. Lesk and Lorinda L. Cherry,
>   Tbl -- A Program to Format Tables,
>   AT&T Bell Laboratories, Murray Hill, 1989,
> or something like that, in any case starting with the authors, then
> the title, then the rest.
> > (2) now that I understand the basics of refer(1), suck all the
> > citations into an index file, ship it, and have our pages use it
> > where necessary.
> That sounds like overkill.  Something like refer(1) becomes useful when
> you write many dozens of journal articles citing thousands of other
> articles.  For about ten to twenty documents citing fewer than a
> handful of sources each, setting up the machinery is more hassle and
> less flexible than doing it by hand.  Also, it makes maintenance
> harder for people not used to refer(1).

I'm not wedded to point (2).

> I'm quite sure we don't want the installed manual pages to .so
> anything.

Agreed on that point.  refer(1) doesn't make that necessary, though, as
I understand it.

> Would it be an improvement to automatically generate the final version
> of the manual pages in some way using a database?  I doubt it.

Again, shouldn't be necessary.

> >> [ snipped some reasons why you want to annotate the citation ]
[I've added back in most of what that stuff was --GBR]

I have further commentary on the exchange below; I just want to present
it to the list to solicit views on source citation in our man pages.

>> > Commenting on cited books or articles in the SEE ALSO section is
>> > very rare, and it will almost never happen for more than one
>> > article in the same manual page.  So there is really no need to set
>> > the reference itself as a list tag and the comment as a list body.
>> > It doesn't even look particularly good.
>> > 
>> > So at the very least, we could just remove the .TP and the \cs.
>> You're right that there's not much precedent here.  The good news for
>> you is that I'm not settled on this format.  I'd like to get cites to
>> all the important Bell Labs white papers into our man page corpus,
>> and then standardize two things:
>> (1) The stylistic format of such bibliographic entries; and
>> (2) now that I understand the basics of refer(1), suck all the
>> citations into an index file, ship it, and have our pages use it
>> where necessary.
>> > But i think going a step further is even better because nothing
>> > in the comment really matters:
>> > 
>> >  - gratis version: no need to say that, it's obvious when you can
>> >    download it for free, and if you would have to pay for it,
>> >    we would hardly include a URI, at least not without warning
>> >    that it needs payment
>> groff is a GNU project, and generally eschews non-free documentation,
>> except for historical/academic works, for which even RMS has some
>> tolerance.  (The FSF won't distribute them, but it doesn't try to
>> pretend they don't exist, unlike some proprietary operating systems.)
>> I think it's worth pointing this out so contributors know where a
>> manual might need a freely-licensed alternative to be written.
>> >  - from UNIX v10: no need to say that, it's said in the document
>> >    itself and doesn't matter for the manual page at hand nor for
>> >    the decision of a reader to look at it
>> So many people seem to think that Research Unix stopped with V7 (or
>> maybe 32V) that I found it noteworthy.
>> >  - early implementation: misleading, because the implementation
>> >    described in the cited document is almost exactly a decade
>> >    younger than the original implementation
>> ...and three times as many years have passed since those two
>> instants.  :)
>> >  - Uriel Pereira: totally irrelevant, that website merely saved
>> >    a copy of the document
>> Something about my bibliographic instincts tells me I need to
>> characterize or describe the site being linked to somehow.
> I think i understand why you feel that some of those details may be
> interesting from a historical perspective, i like considering history
> myself.  But here, we are not even talking about a HISTORY section.
> People look at SEE ALSO because they wonder how to use tbl(1), not
> because they wonder whether v10 was a great UNIX or what RMS's opinion
> on non-free documentation is or whether Uriel Pereira created some
> website before he died.  I'd simply prefer to stay on topic as much as
> possible, in particular outside HISTORY.
> > Bottom line: please regard the exact layout and content of the
> > bibliographic information I'm adding as "in flux".
> > 
> > To hammer out those questions of content and style, let's loop the
> > list in on my points (1) and (2) above.
> Sure.  Maybe you want to suggest something in a well-organized manner
> rather than me picking apart a mail?
> There is no hurry to re-fix tbl(1).  That can easily be done once
> you have settled on a nice format.
> But note that i have very rarely, if ever, seen lists of references
> with annotations attached to each cited article.  Even less so in
> manual pages: in a research journal, having two pages of citations
> at the end of an article may be useful or even required, but in a
> manual page, references ought to be kept concise.


Attachment: man-macro-frequency-counter
Description: Text document

Attachment: signature.asc
Description: PGP signature
