Re: Review incorrect man-pages commit

From: G. Branden Robinson
Subject: Re: Review incorrect man-pages commit
Date: Sun, 20 Mar 2022 21:52:37 +1100
User-agent: NeoMutt/20180716

Hi, Alex!

At 2022-03-20T01:04:17+0100, Alejandro Colomar (man-pages) wrote:
> Michael introduced the following commit, which is incorrect (triggers
> a groff(1) error; see below).  Do you know what is intended here?
> Could you please propose a fix?

Sure!  The punctuation does get a bit bewildering.

The first topic is equivalence classes in globs.

> LINT (groff)  tmp/lint/man7/glob.7.lint.groff.touch
> troff man7/glob.7     195      error   '\`' is not allowed in an escape name
> troff man7/glob.7     195      warning         can't find special character ''

>  For example, "\fI[[=a=]]\fP" might be equivalent
> -to "\fI[a\('a\(`a\(:a\(^a]\fP", that is,
> +to "\fI[a\('a\(\`a\(:a\(^a]\fP", that is,

UTF-8 continuation bytes follow in this message.

So what we're trying to say is
  "[[=a=]]" might be equivalent to "[aáàäâ]"

The man page is using groff special character escape sequences that are
compatible with AT&T troff (1973) in _form_, but the special character
_identifiers_ themselves are not portable that far back.  The form is:
  \(xx
...where "xx" is _exactly_ two characters forming an identifier for a
specific special character.  As is somewhat well known, groff supports
identifiers of arbitrary length in escape sequences; anywhere AT&T troff
has an escape sequence syntax form ending in "(xx", groff supports
an additional form "[xxxxxxx]".

Nota bene that word "identifier".

The ones we see above are aliases for commonly used ISO Latin-1 (1985)
characters.  groff supports a more systematic notation for composite
glyphs, that being
  \[base-glyph composite-1 composite-2 ...  composite-n]
and in the instant case, only one composite glyph is used.

Glyph identifiers in groff must consist of valid identifier characters.
The escape character \ is _not_ interpreted as an identifier character,
but has its usual meaning of introducing an escape sequence.  Thus, the
parser hits the expansion of \` and has problems.  \` is itself an
alias for another special character escape sequence: "\(ga".  (This
alias _is_ portable all the way back to AT&T troff, and is documented in
Ossanna 1976, "Nroff/Troff User's Manual"--but that still doesn't make
it a valid part of a special character identifier.  Heirloom Doctools
troff silently ignores it, and I thus suspect Unix V7 troff did too.)

Thus, the special character you're naming has another special character
as part of its identifier.  That is not allowed.

That is why an error is produced.
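
A minimal sketch of a lint check for this mistake, in Python.  The names
`SPECIAL_CHAR` and `bad_identifiers` are hypothetical, and the regular
expression is only an approximation of how a roff parser reads input (it
ignores, for instance, an escaped backslash directly before a
parenthesis).  It consumes each "\(xx" escape left to right and reports
any two-character identifier that contains the escape character.

```python
import re

# AT&T-style special character escape: backslash, "(", then exactly
# two characters that form the identifier.
SPECIAL_CHAR = re.compile(r"\\\((..)")


def bad_identifiers(line):
    # Collect identifiers that contain a backslash; groff rejects
    # these because "\" starts a new escape sequence instead of
    # counting as an identifier character.
    return [m.group(1) for m in SPECIAL_CHAR.finditer(line)
            if "\\" in m.group(1)]


broken = r'''to "\fI[a\('a\(\`a\(:a\(^a]\fP", that is,'''
fixed = r'''to "\fI[a\('a\(`a\(:a\(^a]\fP", that is,'''

print(bad_identifiers(broken))
print(bad_identifiers(fixed))
```

The broken line from the commit yields one offending identifier (a
backslash followed by a grave accent); the line with the escape
character removed yields none.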

Now, for the part people actually care about, which is how to fix it:
take the escape character off of that `.

You thus want

+to "\fI[a\('a\(`a\(:a\(^a]\fP", that is,

If you wanted to write this without using any aliases, you could adopt
groff syntax.

+to "\fI[a\[a aa]\[a ga]\[a ad]\[a a^]]\fP", that is,

I don't know if people regard that as more or less impenetrable.  It is
more _flexible_, and admits usage of diacritics/combining characters not
envisioned by AT&T troff or ISO Latin-1.  groff supports a baker's
dozen.  They are in a table titled "Accents" in groff_char(7) (1.22.4).
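
As a hedged illustration of how the two notations relate, the Python
sketch below rewrites the four legacy aliases used in glob.7 into the
composite form and, separately, into the Unicode characters they stand
for.  The `ACCENTS` table covers only the four accents named in this
message, and the function names are hypothetical; this is not a
general-purpose converter.

```python
import re
import unicodedata

# Accent character of the legacy alias -> (groff composite accent
# name, Unicode combining character).  Illustrative, not exhaustive.
ACCENTS = {
    "'": ("aa", "\u0301"),  # acute
    "`": ("ga", "\u0300"),  # grave
    ":": ("ad", "\u0308"),  # dieresis
    "^": ("a^", "\u0302"),  # circumflex
}

# "\(" followed by one of the four accent characters and a letter.
ALIAS = re.compile(r"\\\(([`':^])([A-Za-z])")


def to_composite(text):
    # \('a  ->  \[a aa], and likewise for the other accents.
    return ALIAS.sub(
        lambda m: "\\[%s %s]" % (m.group(2), ACCENTS[m.group(1)][0]),
        text)


def preview(text):
    # Replace each alias with the precomposed Unicode character.
    return ALIAS.sub(
        lambda m: unicodedata.normalize(
            "NFC", m.group(2) + ACCENTS[m.group(1)][1]),
        text)


source = r"[a\('a\(`a\(:a\(^a]"
print(to_composite(source))
print(preview(source))
```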

> diff --git a/man8/zic.8 b/man8/zic.8
> index 940d6e814..aeca0e726 100644
> --- a/man8/zic.8
> +++ b/man8/zic.8
> @@ -293,7 +293,7 @@ nor
>  .q + .
>  To allow for future extensions,
>  an unquoted name should not contain characters from the set
> -.q !$%&'()*,/:;<=>?@[\e]^`{|}\(ti .
> +.q !$%&'()*,/:;<=>?@[\e]^\`{|}\(ti .

You didn't proffer any complaints about the foregoing, so I assume it
was just for context (to include the whole commit, maybe).  Nevertheless
I think it can be further improved.

That neutral apostrophe and caret/circumflex should be changed as well,
to ensure that they don't render as a directional closing (right) single
quote, ’ U+2019 and modifier letter circumflex ˆ U+02C6.  This advice is
also in groff 1.22.4's groff_man(7) page.

+.q !$%&\(aq()*,/:;<=>?@[\e]\(ha\`{|}\(ti .

Moreover, as partly noted in our discussion about double quotes in macro
arguments, there were no special characters for the double quote or
neutral apostrophe in Unix troff.  Since we're not getting 50 years of
backward compatibility anyway, for the Linux man-pages project I
recommend going ahead and using groff-style escape sequences for these.

+.q !$%&\[aq]()*,/:;<=>?@[\[rs]]\[ha]\`{|}\[ti] .

Are you willing to settle for 30 years of backward compatibility?  ;-)

In my opinion it is more helpful in dense contexts like this to have the
paired delimiters [ ] to demarcate the glyph identifier than to achieve
portability to systems that don't support identifiers you need anyway.
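
The rewrite from the "\(xx" form to the bracketed form is mechanical, as
the following Python sketch suggests.  The function name
`to_groff_style` is hypothetical, and replacing "\e" with "\[rs]" simply
mirrors the substitution made in the zic.8 line above; a careful tool
would need a real roff tokenizer rather than two regular-expression
passes.

```python
import re

# "\(" followed by exactly two characters, neither of them a backslash.
SPECIAL_CHAR = re.compile(r"\\\(([^\\][^\\])")


def to_groff_style(line):
    # \(xx -> \[xx]
    line = SPECIAL_CHAR.sub(lambda m: "\\[" + m.group(1) + "]", line)
    # \e -> \[rs], following the substitution shown in this message.
    return line.replace("\\e", "\\[rs]")


before = r".q !$%&\(aq()*,/:;<=>?@[\e]\(ha\`{|}\(ti ."
print(to_groff_style(before))
```

Applied to the AT&T-style line, this produces the groff-style line
proposed above; the "\`" escape is left alone because it is not a
"\(xx" sequence.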

(I note that `q` is a page-local macro and therefore bad style for
portability reasons[1].  That said, I have been _sorely_ tempted to add a
`Q` macro for this precise purpose to groff man(7).  I have hopes that
it would give people something to reach for besides bold and italics for
every damn thing.)

Most--I hope all--of the above is discussed comprehensively in the
current version of groff_char(7)[2], which I have rewritten completely
since groff 1.22.4 and substantially modified even since the last Linux
man-pages snapshot at
<>.  I now know
the answers to many questions of the form "why the **** is {groff,troff}
this way?", and have endeavored to share them.  The "History" section is
completely new.


[1] groff's own man pages are not without sin in this regard.  I have
    cleaned them up a lot since 1.22.4, but a few adventurous stragglers
    remain that define and use page-local macros pervasively.  All are
    on the long side.


    I recommend that for source perusal only; do not try to render it
    with man-db man(1) or groff 1.22.4, because groff 1.23.0 will be
    adding a new macro, `MR`, for man page cross references[3] and its
    own pages have already been ported to use it.  (This is where I
    flagellate myself for not having a groff 1.23.0-rc2 out yet. :( )

