groff

Re: man(7), hyphen, and minus


From: G. Branden Robinson
Subject: Re: man(7), hyphen, and minus
Date: Sat, 24 Dec 2022 00:28:35 -0600

Hi Russ,

At 2022-12-23T10:03:13-0800, Russ Allbery wrote:
> "G. Branden Robinson" <g.branden.robinson@gmail.com> writes:
> > That's fair, and it isn't the first time I've heard capable people
> > express the opinion that having a document translator produce
> > idiomatic man(7) font alternation macro calls rather than chains of
> > font selection escape sequences was Just Too Damned Hard.  If I
> > could show people how to do it, I might do so with a swagger, but I
> > confess I can't cash that check at present.
> 
> Yeah, the difficulty lies mostly in the layering, because people can
> write POD source that is nonsensical in a man page context but that I
> still have to do something with.  Stuff like
> C<<< B<< L<foo(1)> >> >>>.

The *roff language does not maintain a stack of typeface changes.  How
radical a change to POD would it be to reject constructions like the
above?

> It makes no sense to make the man page reference, which one could
> otherwise nicely represent as:
> 
> .BR foo (1)

Right.

> also bold and fixed-width, but if that's what someone wrote in the POD
> source, I have to do *something* with it.  And that means either
> trying to analyze global state or having to parse the *roff that I
> output in an earlier stage.

Not a fate I would wish on you.
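For the simple case, where the POD markup reduces to runs that strictly alternate between two fonts, a generator can emit an idiomatic font alternation macro call such as the `.BR foo (1)` above instead of a chain of `\f` escapes. What follows is a minimal Python sketch. It assumes the generator has already flattened the markup into hypothetical `(font, text)` segments. The names `alternation_call` and `quote_arg` are illustrative and are not Pod::Man's interface. Nested constructions like `C<<< B<< L<foo(1)> >> >>>` would not reduce to such segments, and the sketch returns `None` to signal the escape-sequence fallback.

```python
from __future__ import annotations

from typing import Optional

# man(7) macros that alternate between two fonts, one argument at a time.
ALTERNATION_MACROS = {"BI", "BR", "IB", "IR", "RB", "RI"}


def quote_arg(text: str) -> str:
    """Render one text segment as a single macro argument.

    Backslashes become \\e and embedded double quotes become \\(dq, so the
    only literal '"' characters left are the delimiters added here.
    """
    text = text.replace("\\", "\\e").replace('"', "\\(dq")
    if text == "" or " " in text or "\t" in text:
        return f'"{text}"'
    return text


def alternation_call(segments: list[tuple[str, str]]) -> Optional[str]:
    """Return e.g. '.BR foo (1)' for [("B", "foo"), ("R", "(1)")].

    Returns None if the segments do not strictly alternate between two
    distinct fonts; a real generator would then fall back to \\f escapes.
    """
    if len(segments) < 2:
        return None
    fonts = [font for font, _ in segments]
    macro = fonts[0] + fonts[1]
    if macro not in ALTERNATION_MACROS:
        return None
    if any(font != macro[i % 2] for i, font in enumerate(fonts)):
        return None
    return "." + macro + " " + " ".join(quote_arg(t) for _, t in segments)


if __name__ == "__main__":
    print(alternation_call([("B", "foo"), ("R", "(1)")]))  # .BR foo (1)
```

Historical man(7) implementations may accept only a limited number of arguments to these macros, so a production generator would probably also cap the segment count before choosing this path.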

> > Here, I know your pain.  I took it upon myself to document this shit.
> 
> Thanks for this, I should have thought to look at the groff manual
> about it.

The groff 1.22.4 and earlier docs covered this subject, but not in the
detail I quoted.  That's a recent rewrite for the forthcoming groff
1.23.0.  I have at some point seen a pithy, one-sentence description of
the macro quotation rules that I _think_ covered all of the cases in my
own lengthy presentation, but to correctly parse it, the reader would
have to engage maximum standards-lawyer brain.  I felt that a
slower-paced presentation, with examples, was a better approach.

No \(dq, no peace; know \(dq, know peace.  And it's not even a groffism.

> That corrected a few of my misconceptions about macro arguments.
> (It's very easy for this stuff to all become cargo-cult.

Oh yes indeed.  I've plowed over some of groff's own man pages of their
ersatz airstrips.

> I refer to CSTR 54 all the time, but of course that's limited in its
> detail.)

Some people say that document is all you need to decide any question at
any level of detail, and assemble pyres for the burning of witches who
catalog its errata.

Unix is like Catholicism.  Every aspect has a patron saint with a
devotional cult.

> > I sure hope the reason this was done the way it was is that any more
> > accessible approach ran the PDP-11 out of memory.  Murray Hill's
> > agonizingly slow adoption of 'aq' and 'dq' special character
> > identifiers I find difficult to explain given that they bought and
> > paid for a font that included these glyphs on their very first
> > typesetting device.

I should clarify this.[1]

> Yes, it's frustrating that one can't portably just use the special
> character escapes everywhere.

You just about could, if the maintainers of descendants of
device-independent troffs would spend less than five minutes of effort.

But they won't.  They are the natural partners of the "sola scriptura"
party I mentioned above.

> The additional problem that Pod::Man has is that I want to add double
> quotes around literal text if and only if I'm rendering with nroff.
> With troff, the font change is sufficient and I don't want to add
> quotes.

A lot of man pages use bold for literals, even on terminal devices.  I
tend to in groff's own pages, but I _also_ quote multi-word or
potentially ambiguous literals in case the man page is viewed in a
context that strips the typeface (like copying and pasting into an
email).

> The simplest way to do that normally is with a string that's
> defined to either the empty string or the quote mark depending on
> whether rendering is with nroff or troff, but this causes no end of
> hassles when it's inside macro arguments, not to mention the need to
> work around Solaris bugs with font changes.
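The string-based approach described above can be sketched as a generator function that emits the roff preamble. This is a minimal Python sketch. The string names `q<` and `q>` and the function name are hypothetical. It relies on two traditional troff behaviors. First, a leading `"` in a `.ds` value is stripped, so `""` defines a single literal double quote without needing a `dq` special character. Second, `.ds` with no value defines an empty string. It does not address the macro-argument and Solaris font-change hassles Russ mentions.

```python
def quote_string_preamble(open_name: str = "q<", close_name: str = "q>") -> str:
    """Return roff requests defining a pair of quote strings.

    On nroff devices (terminals) each string holds one literal double
    quote; on troff devices (typesetters) both are empty, so only the
    font change marks a literal.  The default names are hypothetical.
    """
    for name in (open_name, close_name):
        # Traditional \*(xx interpolation needs exactly two-character names.
        if len(name) != 2:
            raise ValueError(f"string name must be two characters: {name!r}")
    lines = [
        ".ie n \\{\\",
        # A leading '"' in .ds is stripped, so '""' defines a single '"'.
        f'.  ds {open_name} ""',
        f'.  ds {close_name} ""',
        ".\\}",
        ".el \\{\\",
        # With no value, .ds defines an empty string.
        f".  ds {open_name}",
        f".  ds {close_name}",
        ".\\}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(quote_string_preamble(), end="")
```

In running text, the generator would then write `\*(q<literal\*(q>`, which produces quotes under nroff and nothing under troff.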

mdoc(7) "solved" this with a bespoke recursive approach that interpreted
macro arguments as macro names and called them.

I recently proposed adding a `Q` macro to a future groff man(7), but (1)
it is meant only for simple/common cases since the problem I perceive is
that man page writers struggle to use quotation at all and (2) since it
would be a groff feature, Pod::Man either can't use it or would need to
define its own fallback.

I can't think of a way to cut this knot in a Solaris troff-compatible
way.  The combination of the "what was my previous font again?" bug and
refusal to define a special character for "dq" may make it intractable.

But this reinforces the point that the problem with Solaris troff is not
that it is inherently incapable, but that it is frozen and unmaintained.

If someone red-teamed it and found a half dozen security
vulnerabilities, what could we expect Oracle to do about it?

> I'm fairly sure there's some better way of handling this than what I'm
> currently doing, but my brain has not managed to come up with it yet.

Maybe we can put our heads together on this when 1.23.0 is behind me.

> > Whither this antipathy for the neutral apostrophe?
> 
> This has been an interesting long-term struggle.  It was the GNU
> coding style for years to use `' as matched quotes.  I think they've
> finally switched to Unicode quotes instead.

Sort of.  I'd say, rather, that they finally acknowledged the existence of
ISO 8859 (free ECMA-94 copy here[2]).  So at long last they advise people
simply to use ' and ", each paired with itself.[3]

> Technically, of course, the English apostrophe isn't neutral; it's
> curved to the left.

Right.  In formal typography that is true.  The idea in *roff is, and as
I understand it always has been, to express the glyph you _mean_, and
the output device will do its best to honor your requirements.

So, in a roff document, when people type "can't", they want whatever
constitutes a typographical apostrophe in the output.  When they say
  char c = \[aq]\[rs]\[aq]\[aq];
they want the characters that the C language definition identifies as
having special meaning.
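A generator can produce the escaped form of such a literal mechanically. This minimal Python sketch maps a few ASCII characters that groff treats as ambiguous to special-character escapes named in groff_char(7). It uses groff's `\[xx]` bracket syntax, as the example above does, and that syntax may not be accepted by troffs without groff extensions. The function name and the particular selection of characters are illustrative assumptions, not an exhaustive or official table.

```python
# Characters that ASCII left semantically ambiguous, mapped to the groff
# special-character escapes (see groff_char(7)) that pin down what a code
# literal means.  This selection is illustrative, not exhaustive.
LITERAL_ESCAPES = {
    "'": r"\[aq]",   # neutral apostrophe, not a right single quote
    "`": r"\[ga]",   # grave accent, not a left single quote
    '"': r"\[dq]",   # neutral double quote
    "\\": r"\[rs]",  # reverse solidus (backslash)
    "-": r"\-",      # minus sign rather than a hyphen
}


def escape_literal(code: str) -> str:
    """Escape a code literal for groff input, one character at a time.

    Mapping per character (rather than chained str.replace calls) keeps
    the backslashes this function emits from being escaped again.
    """
    return "".join(LITERAL_ESCAPES.get(ch, ch) for ch in code)


if __name__ == "__main__":
    print(escape_literal("char c = '\\'';"))
```

Run on the C statement `char c = '\'';`, this yields the `char c = \[aq]\[rs]\[aq]\[aq];` input shown above.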

> But the ASCII character is used and abused for a bunch of different
> things that aren't really apostrophes.

Yes.  It has been a painful process, first for ISO 8859 and then for
Unicode, to get people to think more abstractly about the "characters"
they mean to write instead of the "glyphs" that appear before them in
their own composition environment, all the while assuming that it is the
programming system's responsibility to do what they mean and to make
anyone who reads their output see the same thing.

It has been a decades-long process to pull people up from a
point-and-grunt mentality of typography, and there is still a way to go.

> > With the last proprietary Unixes finally retiring to their coffins
> > or at least throwing in the towel on any delusions of troff
> > maintenance, maybe people will take up some of these conveniences at
> > last.
> 
> Speaking as someone maintaining a generator, it's very difficult to
> know when I can drop support for old Unixes.  It's also very painful
> to be wrong; if I delete a bunch of compatibility code, and then later
> someone really wants it back, adding it back in is awful.

Does that mean you're not hopeful that you will be dropping support for
Solaris troff soon after Oracle does?

I learned the following from Paul Eggert on this list just last
month.[4]

PE> Solaris 10 is no longer supported after January 2024, so if it and
PE> all the other traditional troffs die out by 2024 we can stop
PE> worrying about this then.
PE>
PE> Solaris 11.4, the only Oracle Solaris version that is planned to be
PE> supported after January 2024, is based on groff 1.22.3 instead of on
PE> traditional troff. See:
PE>
PE> https://docs.oracle.com/cd/E88353_01/html/E37839/troff-1.html
PE> https://www.illumos.org/issues/12692

This could buy you a lot of elbow room.

(groff 1.22.3 is 8 years old, but...one dose of Geritol at a time.)

Regards,
Branden

[1] As we can see from the 1976 edition of CSTR #54,[5] the C/A/T's
    "ASCII apostrophe" was not a "neutral apostrophe" as groff
    documentation today describes it; it was mirror symmetric with the
    grave accent, so you would probably alias it to \(aa.  No roff can
    guarantee what the glyphs formatted by a typesetter or terminal
    will _look like_.  That is why we more properly call the "R", "B",
    and "S" files _font descriptions_.  As I explain in groff_char(7),
    ASCII outright encouraged the semantic ambiguity of some code
    points.  It was not until ISO 8859 that code point 39 acquired
    unambiguous "neutral" semantics.

[2] https://www.ecma-international.org/wp-content/uploads/ECMA-94_2nd_edition_june_1986.pdf
[3] https://www.gnu.org/prep/standards/standards.html#Quote-Characters
[4] https://lists.gnu.org/archive/html/groff/2022-11/msg00179.html
[5] https://www.dropbox.com/s/qpk9id0b3w5hu5g/CSTR_54_1976.pdf?dl=0
    (that URL might not work forever)
