Re: [Groff] ASCII Minus Sign in man Pages.


From: G. Branden Robinson
Subject: Re: [Groff] ASCII Minus Sign in man Pages.
Date: Wed, 26 Apr 2017 11:54:21 -0400
User-agent: NeoMutt/20170113 (1.7.2)

At 2017-04-26T15:50:26+0100, Ralph Corderoy wrote:
> Writing a man page is writing troff using the -man macros.  Always was,
> always will be.  It's not some non-troff mark-up language that happens
> to use troff as a back-end.  One must be prepared to understand it's not
> just plain text and commands on lines starting with `.':

You may be right, but then it's long past time we talked to the
man-pages folks about man(7), and revisited the discussion in
groff_man(7).

Here's groff_man(7):

PORTABILITY AND TROFF REQUESTS
       Since the man macros consist of groups of groff requests, one  can,  in
       principle, supplement the functionality of the man macros with individ‐
       ual groff requests where necessary.  See the groff  info  pages  for  a
       complete reference of all requests.

       Note,  however,  that  using  raw troff requests is likely to make your
       page render poorly on the (increasingly common) class of  viewers  that
       render  it  to  HTML.   Troff  requests make implicit assumptions about
       things like character and page sizes that may break in an HTML environ‐
       ment; also, many of these viewers don't interpret the full troff vocab‐
       ulary, a problem that can lead to portions of your text being  silently
       dropped.

       For  portability  to  modern  viewers,  it  is  best to write your page
       entirely in the requests described on this page.  Further, it  is  best
       to  completely  avoid  those  we have described as ‘presentation‐level’
       (.HP, .PD, and .DT).

       The macros we  have  described  as  extensions  (.EX/.EE,  .SY/.OP/.YS,
       .UR/.UE,  and .MT/.ME) should be used with caution, as they may not yet
       be built in to some viewer that is important to your audience.   If  in
       doubt, copy the implementation onto your page.

Note that these macros that this page proclaims to be "portable" include
_all_ of the ones from an-ext.tmac, which you passed over quickly in
your comments below.

And here's man(7) from Linux man-pages:

   Safe subset
       Although technically man is a troff macro package, in reality  a  large
       number  of  other tools process man page files that don't implement all
       of troff's abilities.  Thus, it's best to avoid some  of  troff's  more
       exotic  abilities  where  possible  to permit these other tools to work
       correctly.  Avoid using the various troff preprocessors (if  you  must,
       go  ahead and use tbl(1), but try to use the IP and TP commands instead
       for two‐column tables).  Avoid using  computations;  most  other  tools
       can't  process them.  Use simple commands that are easy to translate to
       other formats.  The following troff macros  are  believed  to  be  safe
       (though  in many cases they will be ignored by translators): \", ., ad,
       bp, br, ce, de, ds, el, ie, if, fi, ft, hy, ig, in, na, ne, nf, nh, ps,
       so, sp, ti, tr.

       You may also use many troff escape sequences (those sequences beginning
       with \).  When you need to include the backslash  character  as  normal
       text, use \e.  Other sequences you may use, where x or xx are any char‐
       acters and N is any digit, include: \', \`, \‐, \., \", \%, \*x, \*(xx,
       \(xx,  \$N,  \nx,  \n(xx,  \fx,  and  \f(xx.   Avoid  using  the escape
       sequences for drawing graphics.

       Do not use the optional parameter for bp (break page).  Use only  posi‐
       tive  values  for  sp (vertical space).  Don't define a macro (de) with
       the same name as a macro in this or the mdoc macro package with a  dif‐
       ferent  meaning;  it's  likely that such redefinitions will be ignored.
       Every positive indent (in) should be paired with  a  matching  negative
       indent  (although  you  should  be using the RS and RE macros instead).
       The condition test (if,ie) should only have 't' or 'n'  as  the  condi‐
       tion.  Only translations (tr) that can be ignored should be used.  Font
       changes (ft and the \f escape sequence) should only have the values  1,
       2,  3,  4,  R,  I, B, P, or CW (the ft command may also have no parame‐
       ters).

       If you use capabilities beyond these, check the  results  carefully  on
       several tools.  Once you've confirmed that the additional capability is
       safe, let the maintainer of this document know about the  safe  command
       or sequence that should be added to this list.

I'm sorry for the lengthy quotes, but we're simply not telling man macro
package users what you think we are, and we haven't been for many years.

It's well past time we decided if any of the non-*roff man page
"parsers" have shown any merit.  To most people, HTML is not as sexy now
as it was in 1997--thank God.

>   • That the plain text is significant too, e.g.\& putting that `\&'
>     after the full stop so the inter-word gap doesn't become
>     end-of-sentence if word-wrap should split the line.
> 
>   • .IR being is just a shorthand for `\f' and `\^' that can be done
>     manually when needed, e.g. `.TH'.

FYI, .IR does not give you this "italic correction" in groff 1.22.3:

        .de1 IR
        .  if \\n[.$] \{\
        .    ds an-result \&\f[I]\\$1\f[R]\"
        .    shift
        .    while (\\n[.$] >= 2) \{\
        .      as an-result \/\\$1\f[I]\,\\$2\f[R]\"
        .      shift 2
        .    \}
        .    if \\n[.$] .as an-result \/\\$1\"
        \\*[an-result]
        .    ft R
        .  \}
        ..

>   • And, yes, what `\c' does, and to use it when required;  the common
>     case can easily be copied from other man pages.

Yes.  I'm pretty enthused by what I've found.  \c does indeed appear to
be a long-used, time-tested way of avoiding \f font escapes in favor of
man font-changing macros.  I do so love killing off '\f's.
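
A minimal sketch of that idiom, assuming a hypothetical option "-o" that
takes a "file" argument; it sits in running text rather than in a .TP tag:

```roff
.\" Hypothetical option "-o" with argument "file": bold, then italic,
.\" joined without a space and without any \f escapes.
Write the result with
.B \-o\c
.I file
instead of standard output.
```

The \c at the end of the .B line continues the output line, so the
italic word from .I follows the bold option with no intervening space.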

> Fortunately, programmers are the kind of folk that will appreciate the
> finesse that can be expressed if they understand the cause, and that
> includes attractive PDF output.  I think good PDF representation is a
> handy test of whether the man page has flaws, and isn't second-class to
> UTF-8 on a TTY.

I entirely agree, and when I was recently modernizing about 112kB of man
page sources, utf8 in an xterm with a good font and PDF were the two
output formats I ensured looked good.  I spot-checked HTML output for
any egregious ugliness, but regarded it as second-class to the other
two, since as I understand it, grohtml is a work in (arrested?)
progress.

> Note, I'm not talking about writing an mdoc page.  Like others, I find
> the macro names too many, too confusing, the result too noisy with its
> nested macros, and have never persevered.  That may be my fault and I
> should try again.  :-)

I think that it, too, is salvageable with documentation written by
someone who _doesn't_ know its internals inside out, and perhaps with a
nice fat front-end of nothing but .als requests to supply macro names
comprehensible by English-speaking humans.  ;-)

> > INSIDE manual pages, - for \(hy or \- for \(mi is a terrible idea
> > already now because the three main implementations (including groff)
> > don't do that in the quite important -Tutf8 device.
> 
> This is because of the bodge to map `-' onto ASCII 45, by Debian
> originally, was it?  Rather than stand firm and map just `\-' and tell
> complainants that the upstream man page needed fixing.

I was around at the time that decision was taken and IIRC was one of the
people talking to Colin Watson about it.  It was done because there were
so _damned many_ pages frustrating our users with cut-and-paste
problems.  We stuck that bodge in /etc/groff/man.local and documented
exactly what and why it was done so that experts could reverse it.

$ cat -n /etc/groff/man.local | sed -n '17,24p'
    17  .  \" Debian: Strictly, "-" is a hyphen while "\-" is a minus sign, and the
    18  .  \" former may not always be rendered in the form expected for things like
    19  .  \" command-line options.  Uncomment this if you want to make sure that
    20  .  \" manual pages you're writing are clear of this problem.
    20  .  \" manual pages you're writing are clear of this problem.
    21  .  \" UNCOMMENTED: branden, 2017-04-08
    22  .  if '\*[.T]'utf8' \
    23  .    char - \[hy]
    24  .
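
For a page author, staying clear of the problem those comments describe
comes down to writing \- for every dash a reader might paste into a
shell.  A small sketch, with a made-up option name:

```roff
.\" Hypothetical OPTIONS entry.  "\-" marks each dash that belongs to the
.\" command line; a bare "-" is left for ordinary hyphenated words.
.SH OPTIONS
.TP
.B \-\-dry\-run
Report the would-be changes without making them.
```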

Also, the lintian(1) tool that is a huge part of Debian package QA
checks man pages for many problems, and I think this is one of them.
Debian _is_ trying to bring about the day when that bodge can be
reversed.  As with so many things, human effort is often in short supply
on the packaging and/or upstream end, and frankly the ergonomics of \-
work against us.  All of our instincts tell us to document our Unix
commands the same way we type them.  I don't support the recommendation
immediately below, but I am deeply sympathetic to the motivation behind
it.

> > INSIDE manual pages (both -man and -mdoc), let's change - and \- to
> > always map to U+002D HYPHEN-MINUS for all devices and let's tell
> > people to simply use - for HYPHEN-MINUS and stop worrying.
> 
> But PDFs then look awful.
> 
> A new `\(hm' isn't a go-er for the same reason depending on .itc in .TH,
> or a new .TH variant, aren't worthwhile;  man pages have to format on
> old systems that haven't these fangled additions.  And we've managed OK
> without.

I think you mean .TP, right?

I find these two issues much more distinct than you do.  We can
reasonably expect .TP to be a recognized macro name with mostly
predictable semantics on _any_ environment that can render man pages.

\(hm would be a brand-new thing, and to be portable would require
conditional definition in man pages targeting old systems.  Except I
don't think we can define new character escapes[2], just strings.  Well,
that's what we have conditionals and register tests for, I guess.
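
A rough sketch of such a conditional definition follows.  The string
name "hm" and the command "foo" are hypothetical, and whether \N'45'
yields the wanted glyph depends on the output device's fonts:

```roff
.\" Sketch only: "hm" is a made-up string name, not an existing escape.
.\" \n(.g is 1 in groff and 0 in formatters that do not define it.
.ie \n(.g .ds hm \N'45'
.el .ds hm \-
.\" Interpolate the string wherever an ASCII hyphen-minus is wanted:
.B foo \*(hm\*(hmhelp
```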

I'm still not dissuaded from my .TP/.itc hack.  As I noted in the first
place, the impact on groff-based systems is practically zero; the only
man page it seems to regress is ksh93, and I'm not sure that's my hack's
fault since the usage that regresses _isn't even in_ a .TP macro.  There
may be an underlying bug.  In any case, I do need to see if my hack
regresses existing, reasonably-well-formed man pages when they're
rendered with other implementations.  That's on the to-do list.

As I said to Ingo, I'm only worried about doing damage to
actually-existing *roff implementations, not some that might exist in an
alternate universe under an alternative interpretation of the
nowhere-formally-standardized *roff language.

> an-ext.tmac I've never used, just read it.  Is it mentioned in a man
> page?  I even checked `info groff | cat' and didn't spot it.  :-)

On groff-using systems, it's everywhere and you get it whether you like
it or not.  :)

man(1) foists its input off onto the andoc "macro package", which to my
eye simply does some trickery using the fact of the totally-disjunctive
namespace of the an and doc macros to overload TH and Dd to cause an mso
of the relevant package.

In groff, loading the an macro package means loading an-old.tmac, which
goes ahead and loads an-ext.tmac _for_ you.  Unconditionally.

$ grep -4 -nH an-ext\\. $(pwd)/an-old.tmac
/usr/share/groff/current/tmac/an-old.tmac-684-.  char  ` \N'96'
/usr/share/groff/current/tmac/an-old.tmac-685-.\}
/usr/share/groff/current/tmac/an-old.tmac-686-.
/usr/share/groff/current/tmac/an-old.tmac-687-.\" Load man macro extensions.
/usr/share/groff/current/tmac/an-old.tmac:688:.mso an-ext.tmac
/usr/share/groff/current/tmac/an-old.tmac-689-.
/usr/share/groff/current/tmac/an-old.tmac-690-.\" Load local modifications.
/usr/share/groff/current/tmac/an-old.tmac-691-.mso man.local
/usr/share/groff/current/tmac/an-old.tmac-692-.

> I'm not keen on embedding it at the start of every man page in a
> project, even if they're built from a source just to achieve that.
> What's the alternative?  `.so ../man1/foo-an-ext.tmac' and release it
> to sit alongside the foo project's other man{1,8} pages.  It obviously
> mustn't appear as a page itself in indexes, etc.

.mso an-ext.tmac already does that.  The embedding is only for *roff
implementations that don't have that file.  But I see what you mean.  I
think a more canonical approach would be to install foo-an-ext.tmac to
/usr/share/groff/current/tmac and .mso it.  Autoconf can of course help
with determining the installation directory.

Anyway, I think it's fine to leave this sort of decision up to package
authors/maintainers.  They can:

1.  Cut-and-paste the an-ext macros they use into the header of their
    man pages, as groff_man(7) recommends; or
2.  Ship a package-specific "fork" (copy) of an-ext.tmac, perhaps
    stripped down to just the macros they want to use (but why?), and
    .mso it from their pages; or
3.  Check for the definition of one of the an-ext macros that one is
    actually using in the page and .ab out if it's missing; or
4.  Ignore legacy environments completely.
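
Option 3 might look like the sketch below, for a hypothetical page
foo(1) that uses EX/EE.  Note that the "d" conditional operator is
itself a groff extension, so the guard only helps on formatters that
understand it:

```roff
.TH FOO 1 2017-04-26 "foo 1.0"
.\" Hypothetical page "foo".  Abort with a diagnostic if the EX macro
.\" from an-ext.tmac is not defined by the time the page body starts.
.if !d EX .ab foo.1: the EX/EE macros from an-ext.tmac are not defined
```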

Mind you, the status quo is that lots[1] of man pages are written doing
only #4, _today_.  Who here is aware of complaints of pages not
rendering right because EX, EE, SY, YS, OP, UR, UE, and so forth are
not defined?  Or do people experiencing this problem largely lack the
knowledge to diagnose it, and think the page is just ugly?  If that's
the case, nothing I am proposing is going to make that problem _visibly_
worse.  Invisible problems will stay invisible.  _Any_ of the first 3
techniques above is an improvement on that situation.

Regards,
Branden

[1] I'll be doing a count of man pages using any of the "extended"
    macros in the near future, and report my findings.
[2] I tried ".char hm \N'45'", without success.


