Re: [groff] 03/09: tmac/an-old.tmac: Stop remapping ` and '.

From: Anthony J. Bentley
Subject: Re: [groff] 03/09: tmac/an-old.tmac: Stop remapping ` and '.
Date: Sun, 1 Nov 2020 01:33:24 -0600

Hi Branden,

I think you're a bit overfocused on how few *languages* use ` when the
focus should be on how many *manuals* use `. There are many. And ` is
not separable from ', which is used by far, far more.

On Fri, Oct 30, 2020 at 9:18 PM G. Branden Robinson wrote:
> > Does POD even provide a capability to semantically separate prose from
> > code literals and escape characters accordingly?
> I know Russ Allbery from Debian and he's a reasonable guy.  I do so
> little Perl programming that I'm not even sure what pod2man[1] actually
> gets _wrong_ in this department.  But I'm confident Russ can be
> approached with a well-motivated change request if the problem is
> articulated clearly.

Let me explain what I was referring to here. I don't know Perl, but in
my one significant encounter with POD (converting a subset of the
LibreSSL documentation to -mdoc), it didn't seem to be a particularly
semantic format. So I browsed a random page, perldata(1), looking for
apostrophes. It had examples like this:

       Scalar values are always named with '$', even when referring to a
       scalar that is part of an array or a hash.

Obviously this will misrender if ' is substituted. But a bad opening
quote isn't the end of the world; I see worse in Microsoft Word
documents every week. What about code blocks?

           @days{'a','c'}      # same as ($days{'a'},$days{'c'})

In the POD source, this is simply prefixed with spaces:

    @days{'a','c'}      # same as ($days{'a'},$days{'c'})

pod2man(1) turns it into this:

\&    @days{\*(Aqa\*(Aq,\*(Aqc\*(Aq}      # same as

All right, that at least will display correctly. What about inline
uses in text that are more important than simple quotes within prose?

       In some cases, it may be a chain of
       identifiers, separated by "::" (or by the slightly archaic "'")...

In the POD source, it's:

In some cases, it may
be a chain of identifiers, separated by C<::> (or by the slightly
archaic C<'>)...

And the pod2man(1) output is:

In some cases, it may
be a chain of identifiers, separated by \f(CW\*(C`::\*(C'\fR (or by the slightly
archaic \f(CW\*(C`\*(Aq\*(C'\fR)...

Whew! Certainly not to my taste, but at least it renders correctly. So
the situation for POD is not as bad as I feared. Use of ' as a left
quote is widespread in the Perl docs, but it appears any literal ' not
properly marked up can be wrapped in C<>.
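
To make the verbatim-block case concrete, here is a minimal sketch of the
transformation visible in the quoted pod2man(1) output: the line gets a
leading \& and each ' becomes \*(Aq. The function name is hypothetical and
this is not pod2man's actual code; the real tool does more escaping than
the two steps shown here.

```python
def escape_verbatim_line(line):
    """Mimic only the apostrophe handling seen in the quoted output."""
    # \& is a zero-width roff escape; placed first, it keeps a leading
    # . or ' from being read as a control character.
    # \*(Aq is the string the quoted pod2man output uses for a literal '.
    return "\\&" + line.replace("'", "\\*(Aq")


if __name__ == "__main__":
    print(escape_verbatim_line("    @days{'a','c'}"))
```

The point is only that a converter which knows a span is code can protect
every ' mechanically; prose quotes give it no such signal.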

What would you tell Perl people to do about those left quotes, though?
Tell them to use UTF-8?

The next question: has a similar analysis been done for the
conversion tools for docbook, asciidoc, and rst? This is what I meant by
my previous wish for more discussion prior to such a big change.
Personally, I would not entertain the thought of changing any default
unless those tools had been checked, and fixed if necessary. And that
is still ignoring an analysis of how this will affect
non-autogenerated pages.
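
One rough way to estimate the effect on hand-written pages would be to
count the ` and ' characters that are left once roff escapes are set aside.
The sketch below is only a heuristic under stated assumptions: the names
ESCAPES and bare_quotes are made up for illustration, comment lines are
recognized only in the .\" form, and macro arguments are not treated
specially.

```python
import re

# Escape sequences that may contain a quote character but are not bare quotes.
ESCAPES = re.compile(r"""
    \\\*\(..         # two-character string, e.g. \*(Aq
  | \\\*\[[^\]]*\]   # long-name string, e.g. \*[name]
  | \\\(..           # two-character special character, e.g. \(aq
  | \\\[[^\]]*\]     # long-name special character, e.g. \[aq]
  | \\.              # any other one-character escape, e.g. \& or \-
""", re.VERBOSE)


def bare_quotes(source):
    """Return (line number, character) for each unescaped ` or ' found."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        if line.startswith('.\\"'):
            continue  # roff comment line
        # A leading ' is a control character, not a quote.
        body = line[1:] if line.startswith("'") else line
        for ch in ESCAPES.sub("", body):
            if ch in "'`":
                hits.append((lineno, ch))
    return hits
```

Run over the perldata(1) examples above, this flags the prose '$' but not
the \*(Aq in the pod2man output, which matches the manual inspection.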

> > Will you argue that literal ASCII hyphens are "not ultra-common"
> > in manpages too? Be serious.
> Oh, no, they _are_ ultra-common.  There's one right there.  The good
> news is that man page writers have much higher awareness of the
> hyphen-minus problem already, and correct practice is already
> widespread.  So hyphens that should be pastable dashes are already '\-'
> in many cases.

Not sure I believe that awareness is so high. In my experience, most
people who escape hyphens simply blindly replace them all with \-,
resulting in many en dashes that should be hyphens!
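
The difference between the two habits can be sketched like this. Both
function names are hypothetical, and the "starts with one or two hyphens"
test is just an assumed stand-in for "this is something a user will paste
into a shell"; real pages need a human eye.

```python
import re

# Assumption: a token is option-like if it begins with - or -- followed
# by a letter or digit and is preceded by whitespace or start of text.
OPTION_TOKEN = re.compile(r"(?<!\S)--?[A-Za-z0-9][A-Za-z0-9-]*")


def blind_escape(text):
    """What many authors do: escape every hyphen, including in prose."""
    return text.replace("-", "\\-")


def escape_option_hyphens(text):
    """Escape hyphens only inside option-like tokens such as --dry-run."""
    return OPTION_TOKEN.sub(lambda m: m.group(0).replace("-", "\\-"), text)


if __name__ == "__main__":
    sample = "use --dry-run for a well-known test"
    print(blind_escape(sample))
    print(escape_option_hyphens(sample))
```

The second function leaves the hyphen in "well-known" alone, while the
first turns it into \- along with everything else.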

So you see, I remain fairly confident in my belief that
typographically speaking, spitting out plain old ASCII hyphen-minus
and straight apostrophe is a lesser sin than incorrect curly quotes,
Unicode hyphens and dashes. :)

> What do you think?

Not sure you'll like my answer... as I'm merely a mandoc user and
markup junkie, not a mandoc developer, I tend to defer to Ingo for
policy decisions such as this.

You mentioned that these translations were only turned off in 2008,
and before that they were turned on. I presume this was done in
response to UTF-8 terminal locales becoming popular, and the change
was made because too many poorly written manuals rendered badly. I'm
not convinced the situation is greatly improved, and that means
reverting it is premature. Changing the default, even if it's
something distributions *can* customize, is too big a step until more
concrete work has been done improving existing manpage markup in the
free software ecosystem. People like you and me, overobsessed with
typographic minutiae, should continue our current practice of
rendering in HTML and PDF and inspecting source manually to find these
errors, to save the poor users and their copy-paste. That is what I

Anthony J. Bentley
