Re: Rendering the em dash on the terminal


From: G. Branden Robinson
Subject: Re: Rendering the em dash on the terminal
Date: Mon, 26 Aug 2024 19:34:29 -0500

Hi Jeff,

Good to hear from you!  As the new guy, it's always nice for me when a
veteran groff maven chimes in.

(Veteran groff detractors, not so much. 😅)

[CCing you just in case; if you'd prefer I didn't, please say so.]

At 2024-08-26T16:41:47-0700, Jeff Conrad wrote:
> > From: groff-bounces+jeff_conrad=msn.com@gnu.org <groff-bounces+jeff_conrad=msn.com@gnu.org> On Behalf Of Dave Kemper
> > Sent: Saturday, 24 August, 2024 12:33 PM
> 
> > The new logic is this:
> > 
> > .ie '\?\*[.T]\?'\?utf8\?' .char \[em] \[em]\[em]
> > .el                       .char \[em] --
> > 
> 
> Aesthetics
> ==========
> > The motivation is given in the commit log: making \[em] look "more
> > like a true em dash, taking up two character cells."
> 
> Dunno if taking up two character cells makes it “look more like a
> true em dash”;

It does on my terminal, xterm using Liberation Sans Mono.

See attachment.

The problem I observed is that an em dash should be close to one em
wide--one em properly considered, that is, as wide as an em quad, or as
wide as a capital letter is from its top to its baseline.  Ordinary or
"halfwidth" character cell fonts simply don't look like that.

Terminals _have_ developed support for bi-width fonts.  And
there _does exist_ a fullwidth hyphen-minus in Unicode (U+FF0D)...but no
fullwidth em dash.

> it may be more aesthetically pleasing than two hyphens.

That is my view.

> Dash List
> ---------
> There are situations in which I’m not sure what gives the best
> aesthetics.  For example, with mm’s DL (dash list) macro, I might
> prefer
> 
>  —— First item
>  —— Next item
> 
> to
> 
>  -- First item
>  -- Next item
> 
> Neither is great; far better might be
> 
>  — First item
>  — Next item
> 
> But there may be no easy way to get there from here.

In groff 1.24, if you redefine the `EM` string, you'll get whatever dash
you want there.

commit 6a4e2e5cecc4a7ef24e3bf6bfe839d7fdade24b6
Author: G. Branden Robinson <g.branden.robinson@gmail.com>
Date:   Thu Jul 4 20:01:14 2024 -0500

    [mm]: Use `EM` string as `DL` list item mark.

    * contrib/mm/m.tmac (DL): Use the `EM` string as the mark instead of an
      em dash special character literal.

    * contrib/mm/groff_mm.7.man (Macros) <DL>:
      (Strings) <EM>:
    * NEWS: Document this.
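
As a rough sketch of what that could look like in an mm document
(assuming groff 1.24 or later, where `DL` takes its mark from the `EM`
string; the list items are just the ones from your example):

```roff
.\" Sketch only: assumes groff 1.24+ mm, where DL uses the EM string.
.\" Set the mark to a single em dash before starting the list.
.ds EM \[em]
.DL
.LI
First item
.LI
Next item
.LE
```

Any other definition of `EM` should work the same way, as long as it
is in place before `DL` is called.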

> Clarity
> =======
> > An em dash in any monospace font is hard to distinguish from a
> > hyphen and other dash-like glyphs.
> 
> Agree.  And I think _clarity must trump aesthetics_.  A single em
> dash is not obviously seen as such.

The fonts the LWN editor uses seem to render all dash-like symbols the
same.

https://lwn.net/Articles/948720/

> And unlike an en dash (probably seen as a hyphen by most folks anyway,
> even in typeset material, which is why most newspapers seldom use it),
> the distinction is important.
> 
> Sometimes the distinction is important even with an en dash.  A
> reasonable rule is that recognition should fail gracefully.  An
> example might be Oakland’s “Anti Police-Terror Project.”
> Properly, “anti” is a prefix and needs a hyphen, but it’s more
> complicated when it modifies a compound.  Chicago style would use
> “Anti–Police Terror Project”; suffice it to say that the failure
> here is less than graceful.

Might be time to resurrect data transfers over FTP.

> Any approach that has an em dash take up two character cells
> might lead to confusion in a few instances.

Possibly.  It _is_ a hazard, but a minor one more than offset by the
benefit in clarity.  My opinion.

> Two-Em Dash
> -----------
> A two-em dash is often used to indicate omissions: from the
> Chicago Manual of Style (18th ed.), § 6.99,
> 
>     Admiral N—— and Lady R—— were among the guests
> 
> Some folks use a single em dash here, which would look the same
> as above.  But actually using two em dashes would give
> 
>     Admiral N———— and Lady R———— were among the guests
> 
> which isn’t so good.
> 
> Three-Em Dash
> -------------
> A three-em dash is commonly used in a bibliography to indicate
> the same author(s) as the previous entry, e.g.,
> 
>     Chaudhuri, Amit. Odysseus Abroad. Alfred A. Knopf, 2015.
>     ———. A Strange and Sublime Address. Minerva, 1992.
> 
> Input in the normal manner would give
> 
>     Chaudhuri, Amit. Odysseus Abroad. Alfred A. Knopf, 2015.
>     ——————. A Strange and Sublime Address. Minerva, 1992.
> 
> which seems kinda long. But perhaps it’s just me.
> 
> I suppose a workaround might be terminal-specific characters like
> ‘2m’ and ‘3m’.  I long had these as strings, more for ease of
> entry than for handling different devices.  In this case, though,
> it’s not clear how these characters would be handled so there are
> clear distinctions among ‘em’, ‘2m’, and ‘3m’.  And if the
> typographical convention of ‘--’ were to prevail for ‘em’, I’m
> not sure how it would apply to ‘2m’ and ‘3m’.

I despair of cutting these knots.  For these relatively persnickety
matters I think I would prefer to trust the document author to define
strings and exercise formatter facilities to achieve the precise result
they desire.
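
A minimal sketch of that approach, borrowing the names `2m` and `3m`
from your message (the definitions are hypothetical and only one
possible choice; nothing like them ships with groff as far as this
example assumes):

```roff
.\" Hypothetical strings; the names '2m' and '3m' follow the message
.\" above.  The replacement text is entirely the author's choice.
.ds 2m \[em]\[em]
.ds 3m \[em]\[em]\[em]
.\" Usage, with the examples quoted earlier:
Admiral N\*[2m] and Lady R\*[2m] were among the guests.
.br
\*[3m]. A Strange and Sublime Address. Minerva, 1992.
```

Where a macro package already maps `\[em]` to two glyphs on a
terminal, these strings would double in length there; an author who
cares can wrap alternative definitions in `.ie n` and `.el`.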

> Comments
> ========
> > My first concern is that this motivation is communicated only in the
> > commit log, leaving a bit of a head-scratcher to anyone merely
> > reading the code.  If this logic is kept, its motive should be
> > commented in the code.
> 
> This seems reasonable.  Most folks can probably figure this out after
> a bit of head scratching, but it would be nice to spare them the
> trouble.

I certainly can add something here.

> Typographic Convention
> ======================
> > Two em dashes in a row is part of no typographic convention.
> 
> Agree.  But the ‘--’ convention comes from manuscript preparation
> in typewriter days; I wonder how many younger users are even
> aware of it.
> 
> Copy and Paste
> ==============
> > This will paste very poorly into any text field that uses a
> > proportional font.
> 
> How often would someone copy and paste from man(1) output?

I do this frequently.

https://lists.gnu.org/archive/html/groff/2024-07/msg00062.html

> And I
> think the goodness or badness would depend on the target; if the
> target is text, it might look a bit strange because the ‘——’
> sequence isn’t common.  If the target is something destined for
> output in proportional type, I’m not sure ‘--’ is much better.

True, but re-"lifting" monospace terminal _groff_ output to a
proportionally spaced context is a perverse thing to do (unless one is
documenting groff itself).  If you have a typesetting device (or file
format), use it!

This is the man2html story all over again.  Most people produce online
man pages by scraping and (crudely) transforming grotty(1) output.  That
makes me sad.  One of my long-term goals in groff development is to get
people to stop maintaining these scraper-converters by offering an
alternative that they struggle _not_ to prefer.

> The only proper sequence in that case is a single em dash, but as
> we all seem to agree, this isn’t great for output to a monospace
> terminal.
> 
> Full disclosure: I format my man pages as PDF, so I may not be
> the best person to comment on the appearance of output to
> monospace device.

Thank you for exercising this pathway.  Deri James and I put a lot of
work into groff 1.23 to make it nice, and further work into the
forthcoming 1.24 to make it even better.

> Searches
> ========
> > It interferes with greps and other searches: most readers
> > seeing two hyphen-like characters in a row in a monospace font
> > will conclude that they are in fact two hyphens, the
> > longstanding convention, rather than two em dashes.
> 
> Would it?  I’d probably never think to search for ‘——’, but I
> don’t often search for ‘--’, either, because it’s almost always
> context dependent.  Conceivably, I might search for an em dash
> that either precedes or follows a specific text, but such a
> search would work with ‘——’.

When staring at a Unicode terminal, it's a bad idea to assume one knows
what character is there based on its appearance.

Search this email for 'A'.  Now search for 'Α'.  But I repeat myself.

Or do I?

If we're making a bad situation worse, it's by only a small margin, and
again, I think, the visual clarity in the face of rotten fonts
outweighs the argument against.

> Don’t throw stones ...
> ======================
> I make these comments having done things in years past that would
> make ‘——’ look pretty benign.  In the mid-1980s, we used Elan’s
> eroff (basically, AT&T version 2 troff);

I have seen very little on the Internet about eroff, and it also seems
to be lost software with no extant source (or even binaries?).  If you
would take some time to jot down observations about it, that would be
helpful to the posterity of this community.

Even sqtroff seems nearly forgotten in spite of its major role in
getting groff off the ground.

> unfortunately, the downloadable HP fonts we long used had the HP Roman
> 8 character set, which didn’t include em or en dashes or many other
> characters.  Two hyphens in typeset output looked pretty crummy, so I
> came up with
> 
> .ds EM \%\^\v'-.43m'_\h'-\w'_'u/2u'_\h'-3u*\w'_'u/2u'\h'1m'\h'-\w'_'u'_\v'.43m'\^
> 
> (we used mm, so “\*(EM” was the standard way to insert an em
> dash).  To my knowledge, we never had a problem with this getting
> hyphenated; apparently eroff would not break a sequence with an
> unclosed vertical motion.

Interesting.  When I get some round tuits I should find out if GNU troff
will, and if it's worth keeping it from doing so.

> The leading ‘\%’ was added for good measure; I can’t remember proving
> whether it actually helped.
[...]

`\%` has recently annoyed me with its ambiguity.

https://lists.gnu.org/archive/html/groff/2024-03/msg00208.html
https://lists.gnu.org/archive/html/groff/2024-04/msg00000.html

> Convention, Again
> =================
> > But even if the aesthetic concern in monospace-land is given more
> > weight, two em dashes in a row is a less preferable substitution than
> > the longstanding convention of two hyphens.
> 
> This certainly is one with which many of us are familiar, but
> again, I wonder if this is true for many younger users, such as
> some TeX aficionados who use ‘---’.
> 
> So ultimately, I dunno.  For the most common usages, ‘——’ may be
> aesthetically preferable to ‘--’.  But in some less common
> situations, this may confuse more than enhance.  I think it’s
> worth hearing what others think.

For man pages, the mapping can be altered (or removed) in the
"man.local" and "mdoc.local" files.

More generally, it could be dealt with in the "troffrc-end" file.
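
For instance, a site that prefers something else might put one of the
following in "man.local".  This is a sketch, not the shipped file, and
it assumes the local file is read after the package has set up its own
mapping:

```roff
.\" man.local: site-local sketch; pick ONE of the two options.
.\" Option 1: use two hyphens for the em dash on every output device.
.char \[em] --
.\" Option 2: remove any character definition for the em dash, leaving
.\" the output device's own glyph.
.\" .rchar \[em]
```

The same requests could go in "mdoc.local" for mdoc pages.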

Regards,
Branden

Attachment: em-dash-on-UTF-8-terminal-on-groff-1.24.png
Description: PNG image

Attachment: signature.asc
Description: PGP signature

