From: G. Branden Robinson
Subject: user-defined characters, translation maps, and environment binding (was: Proposed: stop subjecting right-hand sides of `char` family requests)
Date: Mon, 24 Apr 2023 03:46:21 -0500

Returning to this issue for a moment before I get back to 1.23 release
candidate concerns...

At 2023-04-01T19:45:19-0400, Douglas McIlroy wrote:
> The first use of .char that came to mind was
>         .char \[ntilde] \o'n~'
> which would collide badly with the following ancient trick for
> unbreakable, unpaddable space. (Ignore the question of whether the
> tilde at hand is usable as a diacritical.)
>         .tr ~
>         a~b~c
> This, I guess, is typical of the motivation for the change.

It enables your "ntilde" use case quoted above, but the proposal was
prompted by another one that I think I've mentioned but perhaps should
pitch more explicitly: to make `tr` translation maps part of the current
environment, since that seems to comport better with their historical
applications.

Here's a concrete example, inspired by Kernighan & Cherry's "Typesetting
Mathematics -- User's Guide (Second Edition)", where the problem you see
below swapped in minus signs for hyphens in the page headers.

[UTF-8 follows.]

$ cat ATTIC/tr-hyphen-to-minus.ms
.ds CH * % *\"
.LP
.\" I get tired of escaping special character escape sequences.
.tr *\(mu
length * width = area
.br
price * quantity = extended price
.br
workers * self-organization = union
.sp 60
skill * experience = craft

$ nroff -ms ATTIC/tr-hyphen-to-minus.ms | cat -s

length × width = area
price × quantity = extended price
workers × self‐organization = union

                            × 2 ×

skill × experience = craft

People seem to use `tr` either for global changes to a document, where
they invoke the request early and never revert it, or for local ones,
where they apply it temporarily and then back it out.

However, in the second case, they will in general have no idea whether a
trap will spring before they're done.

In groff, you can turn vertical traps off, but you couldn't in AT&T
troff.  And doing so may have side effects you don't want, like
overrunning a column bottom or footnote area, or the page itself.
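
As a minimal sketch of the "local" pattern with traps suspended, assuming
groff (the `vpt` request is a GNU extension) and reusing a line of sample
text from the example above:

```roff
.\" Suspend vertical position traps so that no page header or footer
.\" macro can run while the translation is active.
.vpt 0
.tr *\(mu
price * quantity = extended price
.br
.\" Map the asterisk back to itself, then re-enable the traps.
.tr **
.vpt 1
```

Mapping a character to itself, as `.tr **` does here, is what undoes the
translation.  The side effects mentioned above still apply while the
traps are off.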

Applying `tr` only to the current environment would accommodate the
local use case better at the admitted expense of the global one.  This
would have to be NEWS-documented.  For the sake of historical documents,
we could restrict this behavior change to non-compatibility mode, since
such documents are the ones most likely to use what I think is the
single most common (albeit historical) global `tr` trick:

.tr ~\" nothing

...which turns the tilde into an unadjustable space.

In groff it is strictly better to use \~, which is not breakable but
_is_ adjustable, or \<space> if you truly do want an unadjustable space.
(And you can still perform these translations explicitly, as with

.tr ~\~

.)
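
A minimal sketch contrasting the two escape sequences; the sample text
is made up for illustration:

```roff
.\" Unbreakable but adjustable: this space may stretch when the line
.\" is adjusted.
Typesetting Mathematics, Second\~Edition
.br
.\" Unbreakable and unadjustable: this space keeps its normal width.
Typesetting Mathematics, Second\ Edition
```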

> Suppose the change isn't made? What does .char do for you that .ds
> doesn't?

A user-defined character can:

1.  participate in kerning adjustments;
2.  be assigned "character flags" with `cflags`, as Dave noted--these
    affect how the hyphenation process treats it;
3.  be designated as the hyphenation character, tab character, or leader
    character;
4.  be `chop`ped off the end of a string atomically;
5.  count as a single element of a string's contents for the `length`
    request; and
6.  be treated atomically by a `for` request, if I implement one as a
    string (and other object) iterator as I plan to for groff-next.
    <https://savannah.gnu.org/bugs/?62264>

This list may not be exhaustive.
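
A minimal sketch of items 2, 4, and 5, assuming groff.  The choice of
`@` as the user-defined character, the string name `s`, and the register
name `n` are arbitrary and purely illustrative:

```roff
.\" Define an ordinary character as a user-defined character.
.char @ zeezee
.\" Item 2: assign it a character flag; the value 4 permits a line
.\" break after the character.
.cflags 4 @
.\" A string whose last element is the user-defined character.
.ds s top@
.\" Item 5: `length` counts @ as one element of the string, not as
.\" the six letters of its definition.
.length n \*s
.\" Item 4: `chop` removes @ from the end of the string as a unit.
.chop s
```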

A user-defined character cannot be used as the control, no-break
control, or escape character.  (The last would have obvious circularity
problems.)

Today I learned that a control character in the ASCII sense, if
otherwise valid as groff input, _can_ be used as the *roff control,
no-break control, or escape character.

And now that I have learned it, I shall do my very best to forget it.  I
dare not even utter examples for fear of people like "alex ratchev" on
the GNU Bash mailing lists getting a hold of them, if he in fact has a
troff counterpart.  The horror...

To tie this back to `tr` and why these are related discussions, I
presently understand character definitions to be global--
supra-environmental.  I aim to sharpen the distinction between
translations and character definitions by retaining character
definitions' global application while subordinating translation maps to
the environment.
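
A minimal sketch of the distinction, assuming groff; the glyph name `zz`
and the sample text are illustrative only:

```roff
.\" Both of these are set up while environment 0 is current.
.char \[zz] zeezee
.tr *\(mu
.\" Switch to environment 1.
.ev 1
.\" The definition of \[zz] applies here as well.  Today the
.\" translation of * does too; under the proposal it would not,
.\" because `tr` was invoked in environment 0.
\[zz] * top
.br
.\" Return to the previous environment.
.ev
```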

> Certainly nothing essential in the example above. However, it
> can avoid the ugliness of string invocations.

I think there's so much else going on with user-defined characters that
conceiving of them as a slightly slicker(?), shorter way of achieving
string definitions is a bad idea.

> I regard the potential benefit mentioned in the last sentence as
> unpersuasive, but the potential catastrophe of the initial example as
> tilting the scales toward the proposal.

Thanks!  Though I fear losing your support with the rest of this
context, and would appreciate your further perspectives.

At 2023-04-02T08:30:42+0100, Ralph Corderoy wrote:
> > tl;dr: For this input:
> >
> > .tr zx
> > .char \(zz zeezee
> > \(zz top
> >
> > Would you want the output to be "zeezee top" or "xeexee top"?
> 
>     $ preconv | nroff
>     .na
>     .nf
>     .
>     .char £ pound sterling
>     .char $ United States dollar
>     .
>     The £ and $ are almost at par.
>     .
>     .tr aAbBcCdDeEfFgGhHiIjJkKlLmMnNoOpPqQrRsStTuUvVwWxXyYzZ
>     £ crashes overnight!
>     .
>     .pl \n(nlu
>     ^D
>     The pound sterling and United States dollar are almost at par.
>     POUND STERLING CRASHES OVERNIGHT!
>     $
> 
> I'd want to see shouty caps.

I think this is an excellent example of user-defined character abuse.
There's no reason not to use strings here.

.\" set up for portability
.if \n[.g] \{\
.  ie (\n[.x] > 1)       .nr use-new-way 1
.  el .if (\n[.y] >= 23) .nr use-new-way 1
.\}
.ie \n[use-new-way] .als UP stringup
.el \{\
.  de UP
.    tr aAbBcCdDeEfFgGhHiIjJkKlLmMnNoOpPqQrRsStTuUvVwWxXyYzZ
.    ds \\$1 \\*[\\$1]
.    tr AABBCCDDEEFFGGHHIIJJKKLLMMNNOOPPQQRRSSTTUUVVWWXXYYZZ
.  .
.\}
.rr use-new-way
.\" actual example
.na
.nf
.
.ds P pound sterling\"
.ds D United States dollar\"
.
The \*P and \*D are almost at par.
.
.ds news \*P crashes overnight!
.UP news
\*[news]
.pl \n(nlu

The foregoing becomes much shorter--shorter even than your example--if
one knows one is targeting groff 1.23 or later.  I also went to the
trouble of unwinding the translations after using them, since I think
that is a fairer representation of a real-world troff document.  (K&C
did this quite a bit.)
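
As a minimal sketch of that shorter form, assuming groff 1.23 or later
so that the `stringup` request is available, the portability setup drops
away and the example reduces to:

```roff
.na
.nf
.
.ds P pound sterling\"
.ds D United States dollar\"
.
The \*P and \*D are almost at par.
.
.ds news \*P crashes overnight!
.\" Replace the contents of the string with its uppercase form.
.stringup news
\*[news]
.pl \n(nlu
```

No translation is set up here, so there is nothing to unwind afterward.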

Perhaps ironically, if we bind translation maps to environments, then
depending on what you use the environment for, you may indeed be able to
get away with never reverting them.  So environment-bound maps make your
proof-of-concept _more_ practical, not less.  (Page headers are an
obvious application.)

At 2023-04-10T05:10:11-0500, Dave Kemper wrote:
> Yes, I find it handy to be able to set cflags values on .char-defined
> characters.

Thanks, Dave--I don't know that I would have thought of that one in the
near term.

Regards,
Branden
