Re: [groff] Mapping of \(bu to MIDDLE DOT

From: Jeff Conrad
Subject: Re: [groff] Mapping of \(bu to MIDDLE DOT
Date: Thu, 28 Mar 2019 11:25:52 +0000

On Thursday, March 28, 2019 3:01 AM, G. Branden Robinson wrote:

> At 2019-03-27T04:34:18+0000, Jeff Conrad wrote:
> > Is there a reason that tty.tmac translates \(bu to \(pc or \(md
> > regardless of the output device or whether \(bu is available?
> >
> > .ie c\[pc] \
> > .  tr \[bu]\[pc]
> > .el \
> > .  if c\[md] \
> > .    tr \[bu]\[md]
>
> Are you looking at an old implementation?  There's some important
> context missing here:

Yep--I'm using 1.22.3.  Running Windows, I've had to diddle a few things,
so the upgrade isn't as simple as it could be.

> $ nl /usr/share/groff/1.22.4/tmac/tty.tmac | sed -n '14,21p'
>     14  .if !'\*[.T]'utf8' \{\
>     15  .  ie c\[pc] \
>     16  .    tr \[bu]\[pc]
>     17  .  el \
>     18  .    if c\[md] \
>     19  .      tr \[bu]\[md]
>     20  .\}
>     21  .

> It sure seems like you might be re-reporting a problem Carsten Kunze
> raised in June 2015, and which prompted Werner to wrap the conditional
> you mention in an "if device is not UTF-8" block:


Again, yep--I used the wrong search query ...

> Really we shouldn't be conditional on UTF-8 per se, but on the existence
> of the bullet glyph in the font for the tty device.

Completely agree.

> However, the tty device ignores fonts ...

> these devices can report their character repertoire up to an
> application.  VGA-style console devices, framebuffer consoles, and GUI
> terminal emulators can even change these on the fly.  (Who else
> remembers live-hacking the display font in MS-DOS?)

We're obviously at the mercy of the chosen font (on Windows, I use
Lucida Console as the best of very limited options).  But the device at
least gives us a reasonable idea of what's possible.

> So Werner's fix worked because there were (and are) no nroff/tty devices
> in the groff tree that supported the bullet character _except_ -Tutf8.
> My recommendations are:
> 1) Upgrade to groff 1.22.4; and
> 2) Change the conditional on line 14 of tty.tmac from:
>     14  .if !'\*[.T]'utf8' \{\
> to:
>     14  .if !c\[bu] \{\
> ...and tell us if that fixes your problem.

Making this change (which I've already done) indeed fixes things.
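
For reference, here is roughly what the block looks like with that
conditional swapped in.  This is a sketch assembled from the two snippets
quoted above, not copied from any released tty.tmac:

```groff
.\" Sketch: only remap the bullet when the device lacks \[bu].
.if !c\[bu] \{\
.  ie c\[pc] \
.    tr \[bu]\[pc]
.  el \
.    if c\[md] \
.      tr \[bu]\[md]
.\}
```

The test is now on the glyph itself rather than on the device name, so
any device that reports \[bu] keeps it.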

> Personally, I advocate incorporating cp1252 into groff.  It's only an
> 8-bit character set, should therefore be a low maintenance burden, and
> really should make life a bit more bearable for groff's Windows users.
> And that's good PR for groff, GNU, copyleft, and Free Software.

It's yours for the asking; it's really just latin1 with the additional
characters that Microsoft added to the C1 area.  I went a bit further
and added spelled-out representations of missing Greek characters (I
hate missing symbols; in the old, old days, I guess one would print the
document and write in the missing symbols.  Yeah, right ...).  But if
these additions aren't for everyone, they're easily deleted.

> > Even for -Tlatin1, I'd prefer an asterisk or even the age-old
> > overstruck '+' and 'o'.  Isn't the general rule for nroff to make the
> > best possible visual approximation when the true character isn't
> > available?
> As noted above, knowing what will actually show up on the output
> device is, in principle, impossible for nroff/tty output devices.

The user needs to pick the most appropriate font; there don't seem
to be all that many choices that we need to worry about.

> However, we can generally assume that users of 8-bit encodings will
> have comprehensive fonts available by default--they'd have to go out
> of their way to avoid them.

But 8-bit encodings (e.g., ISO 8859) have their limitations; in
particular, they're missing most of the common punctuation characters
used in typesetting. The MS extensions addressed most of this.

> Life is harder in UTF-8 world.

Yep.  Especially on Windows.  I had to hack the devutf8 font files to
use U+002D rather than U+2010 for a hyphen, because Lucida Console
doesn't include the latter. Ya do what ya gotta do ...

But Microsoft are working on it ...

Skip to "Are we there yet?" near the end if you're less than fascinated
with the topic.

> To get that asterisk:
> In your documents, or your .troffrc, could you not do this?
> .fchar \[bu] *

Yes.  I've already done something similar.  But this won't help with the
few files I generate for general distribution.  For example, for GNU
units, we generate a man page from texinfo source with a perl script,
and obviously can't assume a customized .troffrc--so we include a few hacks
to override some groff settings (e.g., ".tr \(oq'").  We actually don't
even assume groff, so we try to cover all the bases; this probably is
overkill nowadays.
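
A portable override in the same spirit as the ".tr \(oq'" hack might look
like the following.  It is only an illustration, not a line from the
actual units man page, and it assumes an asterisk is acceptable on every
terminal device, including those that do have a real bullet:

```groff
.\" Illustrative only: in nroff mode, print an asterisk for \(bu.
.\" Uses only classic requests, so it does not depend on groff.
.if n .tr \(bu*
```

Because a later .tr for the same character replaces an earlier one, this
would also supersede the translation that tty.tmac sets up at startup.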

> As a minor point, I do think the existing fallback should be reversed in
> order:
> From:
> .fchar \[bu] \z+o
> To:
> .fchar \[bu] \zo+

Interesting how we differ on this.  I don't like either alternative, but
find the 'o' more instantly recognizable--it's sorta kinda a circle.  As
I recall, the AT&T version 2 nterm files that I had in the late 1980s
had it as you suggest, and I reversed it.  I guess it's a matter of
personal preference.  The asterisk avoids the problem.

> The \z+o status quo seems to follow a pattern that makes sense for
> modified letterforms, i.e., \z'a; on a 7-bit ASCII, non-overstriking
> device, you want the "a" to "win", because it carries the more important
> semantic information.

In general, I completely agree.

> That reasoning does not hold for bullet substitutes, which simply need
> to stand out graphically (your argument for not using a middle dot or
> centered period, which may be as small as one pixel on some devices),
> and not be semantically confusable with text.

In this circumstance, I don't know whether we can really separate
graphics and semantics.

> As "o" is actually a word (even in English, though much more prominently
> in Spanish), I find the present arrangement unfortunate.

I think it's largely a matter of context.  As the tag for a list, I
think confusion would be unlikely.  And again, an asterisk--perhaps ugly
but arguably the most common ASCII approximation of a bullet--would seem
to avoid the problem.

In my senior year of high school, I had an English teacher--a PhD--who
tried to drill into us that the "best" English is that which provides
the maximum communication (and it generally avoids pompous polysyllabic
pronouncements).  I suggest something similar for the "best" groff.  Of
course, it's not always easy to reach consensus on the details.

