Re: 1.23: UTF-8 device: more display oddities

From: G. Branden Robinson
Subject: Re: 1.23: UTF-8 device: more display oddities
Date: Fri, 16 Sep 2022 17:32:36 -0500

At 2022-09-16T23:56:58+0200, Steffen Nurpmeso wrote:
>  |Letting aside the hyphen-minus -> hyphen thing that i fixed for me
>  |locally, there is also the problem that
>  |
>  |  ` U+0060, GRAVE ACCENT, "backtick"
>  |
>  |is displayed as
>  |
> Also
>   ~ U+007E, TILDE
> is displayed as
> which here sits at the height of an accent here, for example the
> Putting it all together it really looks totally odd here:
>   i=`echo '~/home^run'`
> becomes
>   i=‘‘echo ’˜/homeˆrun’‘’
> How is anyone supposed to document a sh(1)ell-style manual with
> mdoc(7) (i do not know about man(7)) with these settings?

By reading the manual, Steffen.

UTF-8 content follows.

       On ISO systems, code points in the range 33–126 comprise a common
       set of printable glyphs in all of the aforementioned ISO
       character encoding standards.  It is this character set and (with
       some noteworthy exceptions) the corresponding glyph repertoire
       for which AT&T troff was implemented.
       The table below presents the seven exceptional code points with
       their typical keycap engravings, their glyph mappings and
       semantics in roff systems, and the escape sequences producing the
       Unicode basic Latin character they replace.  The first, the
       neutral double quote, is a partial exception because it does
       represent itself, but since the roff language also uses it to
       quote macro arguments, groff supports a special character escape
       sequence as an alternative form so that the glyph can be easily
       included in macro arguments without requiring the user to master
       the quoting rules that AT&T troff required in that context.
       (Some requests, like ds, also treat " non‐literally.)
       Furthermore, not all of the special character escape sequences
       are portable to AT&T troff and all of its descendants; these
       groff extensions are presented using its special character form
       \[], whereas portable special character escape sequences are
       shown in the traditional \( form.  \- and \e are portable to all
       known troffs.  \e means “the glyph of the current escape
       character”; it therefore can produce unexpected output if the ec
       request is used.  On devices with a limited glyph repertoire,
       glyphs in the “keycap” and “appearance” columns on the same row
       of the table may look identical; except for the neutral double
       quote, this will not be the case on more‐capable devices.  Review
       your document using as many different output devices as possible.

      ┌──────────────────────────────────────────────────────────────────┐
      │Keycap   Appearance and meaning   Special character and meaning   │
      │"        " neutral double quote   \[dq] neutral double quote      │
      │'        ’ closing single quote   \[aq] neutral apostrophe        │
      │-        ‐ hyphen                 \- or \[-] minus sign/Unix dash │
      │\        (escape character)       \e or \[rs] reverse solidus     │
      │^        ˆ modifier circumflex    \(ha circumflex/caret/“hat”     │
      │`        ‘ opening single quote   \(ga grave accent               │
      │~        ˜ modifier tilde         \(ti tilde                      │
      └──────────────────────────────────────────────────────────────────┘
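To make the table concrete, here is a toy Python model of what a UTF-8 terminal shows when the five remapped Basic Latin characters are typed literally in roff input. It is an illustration only, not groff's implementation: it ignores escape sequences, requests, and everything else a formatter does. The code points are taken from the table and the descriptions that follow it: U+2019 and U+2018 for the closing and opening single quotes, U+2010 for the hyphen, U+02C6 for the modifier circumflex, and U+02DC for the small tilde. The names `UTF8_APPEARANCE` and `render_literal_utf8` are hypothetical.

```python
# Toy model (not groff code) of how the utf8 output device shows
# Basic Latin characters typed literally in roff input, per the table.
UTF8_APPEARANCE = {
    "'": "\u2019",  # closing single quote (right single quotation mark)
    "-": "\u2010",  # hyphen
    "^": "\u02c6",  # modifier letter circumflex accent
    "`": "\u2018",  # opening single quote (left single quotation mark)
    "~": "\u02dc",  # small tilde
}


def render_literal_utf8(text: str) -> str:
    """Return what a reader sees when `text` is typed literally in roff input."""
    return "".join(UTF8_APPEARANCE.get(ch, ch) for ch in text)


if __name__ == "__main__":
    # Steffen's shell example, typed without special character escapes.
    print(render_literal_utf8("i=`echo '~/home^run'`"))
    # prints: i=‘echo ’˜/homeˆrun’‘
```

The substitutions are the same ones visible in the output Steffen quoted.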

There is also the "Portability" section of groff_man(7) [groff 1.22.4]
or groff_man_style(7) [groff 1.23].

       Several special characters are also widely portable.  AT&T troff
       did not define the reverse solidus or quotation characters listed
       below, but any of its descendants, like Plan 9 or Solaris troff,
       can support them by defining their glyphs in font description
       files; see groff_font(5).

       \-     Minus sign or basic Latin hyphen‐minus.  This escape
              sequence produces the Unix command‐line option dash in the
              output.  “-” is a hyphen in the roff language; some output
              devices replace it with U+2010 (hyphen) or similar.

       \(aq   Basic Latin neutral apostrophe.  Some output devices
              replace “'” with a right single quotation mark.

       \(oq
       \(cq   Opening (left) and closing (right) single quotation marks.
              Use these for paired directional single quotes, ‘like
              this’.

       \(dq   Basic Latin quotation mark (double quote).  Use in macro
              calls to prevent “"” from being interpreted as beginning a
              quoted argument, or simply for readability.

                     .BI "split \(dq" text \(dq

       \(lq
       \(rq   Left and right double quotation marks.  Use these for
              paired directional double quotes, “like this”.

       \(em   Em‐dash.  Use for an interruption—such as this one—in a
              sentence.

       \(en   En‐dash.  Use to separate the ends of a range,
              particularly between numbers; for example, “the digits
              1–9”.

       \(ga   Basic Latin grave accent.  Some output devices replace “`”
              with a left single quotation mark.

       \(ha   Basic Latin circumflex accent (“hat”).  Some output
              devices replace “^” with U+02C6 (modifier letter
              circumflex accent) or similar.

       \(rs   Reverse solidus (backslash).  The backslash is the default
              escape character in the roff language, so it does not
              represent itself in output.  Also see \e below.

       \(ti   Basic Latin tilde.  Some output devices replace “~” with
              U+02DC (small tilde) or similar.
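When man page source is generated from plain text, such as a command synopsis, the list above suggests a mechanical translation. The Python helper below, `roff_escape`, is a hypothetical sketch of that idea and not part of groff: it replaces each troublesome Basic Latin character with the portable escape listed above. It is meant for literal code and command text only; applied to prose, it would turn every ordinary hyphen into a minus sign. It also does not protect a "." at the start of an input line, which roff would still read as a control character.

```python
# Hypothetical helper (not part of groff): escape literal code text for
# man(7)/mdoc(7) input using the portable special characters listed above.
ROFF_ESCAPES = {
    "\\": r"\e",    # backslash; \e assumes the escape character is unchanged
    '"': r"\(dq",   # neutral double quote, safe inside macro arguments
    "'": r"\(aq",   # neutral apostrophe
    "-": r"\-",     # hyphen-minus (command-line option dash)
    "^": r"\(ha",   # circumflex/caret
    "`": r"\(ga",   # grave accent
    "~": r"\(ti",   # tilde
}


def roff_escape(code: str) -> str:
    """Escape literal code or command text one character at a time.

    Mapping per character means backslashes introduced by a replacement
    are never escaped a second time.  A "." at the start of an input line
    is NOT handled.
    """
    return "".join(ROFF_ESCAPES.get(ch, ch) for ch in code)


if __name__ == "__main__":
    print(roff_escape("i=`echo '~/home^run'`"))
    # prints: i=\(gaecho \(aq\(ti/home\(harun\(aq\(ga
```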

Or you can just do the brute force thing.  From groff 1.23's "PROBLEMS"
file:


* When viewing man pages, some characters on my UTF-8 terminal emulator
  look funny or copy-and-paste wrong.  Why?

Some Unicode Basic Latin ("ASCII") input characters are mapped to
non-Basic Latin code points in output for consistency with other output
devices, like PDF.  See groff_man_style(7) and groff_char(7) for correct
input conventions and background.  If you use the correct groff special
character escape sequences to input them, you will get correct output no
matter what device the input is formatted for.

However, many man pages are written in ignorance of the correct special
characters to obtain the desired glyphs.  You can conceal these errors
by adding the following to your site-local man(7) configuration.  The
file is called "man.local"; its installation directory depends on how
groff was configured when it was built.

--- start ---
.if '\*[.T]'utf8' \{\
.  char ' \[aq]
.  char - \-
.  char ^ \[ha]
.  char ` \[ga]
.  char ~ \[ti]
.\}
--- end ---

You may also wish to do the same for "mdoc.local".

In man pages (only), groff maps the minus sign special character '\-' to
the Basic Latin hyphen-minus (U+002D) because man pages require this
glyph and there is no historically established *roff input character,
ordinary or special, for obtaining it when a hyphen and minus sign are
both separately available.  To obtain a true minus sign, use the special
character escape sequences '\(mi' or '\[mi]'.
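A toy Python model of the man-page-only mapping just described may help: it rewrites the three escape sequences mentioned (\-, \(mi, and \[mi]) to the code points a UTF-8 terminal would then display, U+002D for \- and U+2212 (the Unicode minus sign) for the true minus. It is a sketch under stated assumptions, not groff code; it assumes the input contains no other escape sequences that could overlap these, and the names `MAN_UTF8_MINUS` and `render_man_minus` are hypothetical.

```python
# Toy model (not groff code): how minus-related escapes in a man page
# come out on a UTF-8 terminal, per the description above.
MAN_UTF8_MINUS = {
    r"\-": "\u002d",     # man pages only: Basic Latin hyphen-minus
    r"\(mi": "\u2212",   # true minus sign
    r"\[mi]": "\u2212",  # same glyph, groff bracket form
}


def render_man_minus(source: str) -> str:
    """Replace the minus-related escapes in `source`; leave all else alone."""
    for escape, glyph in MAN_UTF8_MINUS.items():
        source = source.replace(escape, glyph)
    return source
```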


Didn't I already share this information with you?

Hmm, yes, I did.[1]

Possibly groff_mdoc(7) could use a "Portability" section as well.  I
happen to be in the midst of some major revisions of that page, but on
the other hand your refusal to read the documentation I have already
served up to you on a platter does nothing to supply motivation.

Typography isn't for everyone.  There's always Markdown.  It might
better fit the write-only attitude you have manifested in your
contributions to this mailing list.
