Re: [Groff] mapping of glyphs to Unicode

From: Werner LEMBERG
Subject: Re: [Groff] mapping of glyphs to Unicode
Date: Tue, 14 Feb 2006 21:44:01 +0100 (CET)

> One should expect that the mapping from groff glyphs to Unicode is
> device independent, right?

Mhmm, a difficult topic...  groffers, please raise your hands if you
object to the results of this discussion.

> - devhtml maps "hy" and "-" to U+002D.
>   devutf8 and glyphuni.cpp map "hy" and "-" to U+2010.

`U+2010' is the right one but problematic since many editors are not
capable of searching for it in a user-friendly way.  Maybe we should
apply the same solution as is done with the devutf8 backend: Use
U+2010 and let distributions overwrite it with the .char request (see
the PROBLEMS file).

> - devhtml maps "la" to U+2329, "ra" to U+232A.
>   devutf8 and glyphuni.cpp map "la" to U+27E8, "ra" to U+27E9.

I have a W3C document `entities.html' which describes HTML 4.  In this
file, U+2329 and U+232A are used for &lang; and &rang; -- has this
changed in the meantime?

Personally, I prefer U+27E8 and U+27E9, which are never double-width.
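
The width question can be checked against the Unicode character
database.  What follows is only a sketch using Python's standard
`unicodedata' module; the table name CANDIDATES and the helper
describe() are made up for illustration and are not part of groff.
The East Asian Width property is used here as a stand-in for
"double-width", which is an assumption about how a given terminal
behaves.

```python
import unicodedata

# Candidate code points for the groff glyphs "la" and "ra":
# first the devhtml choice, then the devutf8/glyphuni.cpp choice.
CANDIDATES = {
    "la": ("\u2329", "\u27e8"),
    "ra": ("\u232a", "\u27e9"),
}

def describe(ch):
    """Return code point, Unicode name, and East Asian Width class."""
    return "U+%04X %s width=%s" % (
        ord(ch),
        unicodedata.name(ch),
        unicodedata.east_asian_width(ch),
    )

for glyph, (html_choice, utf8_choice) in CANDIDATES.items():
    print(glyph, "|", describe(html_choice), "|", describe(utf8_choice))
```

A width class of `W' (wide) marks characters that terminals with East
Asian conventions typically render in two cells; `Na' (narrow) marks
those that stay single-width.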

> - devhtml and devutf8 map "[" to U+005B.
>   glyphuni.cpp doesn't.
>   Why?

Actually, it does. `[' and `lB' are identical (and thus get the same
entry in the font description files), and glyphuni.cpp maps only a
single entry to another one.  The same is true for the similar cases
you mention.

> - glyphuni.cpp maps "shc" to U+00AD.
>   devhtml and devutf8 don't.
>   Why?

Under normal circumstances, `shc' is never a glyph in groff.  From the
NEWS file:

  Using the latin-1 input character 0xAD (soft hyphen) for the `shc'
  request was a bad idea.  Instead, it is now translated to `\%', and
  the default hyphenation character is again \[hy].  Note that the
  glyph \[shc] is not useful for typographic purposes; it only exists
  to have glyph names for all latin-1 characters.

In other words, if you need a soft hyphen, use `\%', not \[shc].
The file `latin1.tmac' does the same.

> - devhtml and devutf8 map "+f" to U+03C6 and "*f" to U+03D5.
>   glyphuni.cpp does the opposite: maps "*f" to U+03C6 and "+f" to
>   U+03D5.  According to the info in the groff_char.7 man page,
>     "symbol `\[*f]' always denotes the stroked  version of phi, and
>      `\[+f]' the curly variant."
>   this means that glyphuni.cpp is wrong. I vote for following
>   current standards.


> - devhtml and devutf8 map "<<" to U+226A and ">>" to U+226B.
>   glyphuni.cpp does the opposite: maps ">>" to U+226A and "<<" to
>   U+226B.  IMO glyphuni.cpp is wrong.

A typo.  Fixed.  Thanks for the reports.
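
Swapped entries like these are easy to catch mechanically.  The
following is a hedged sketch, not anything that exists in groff: the
table REPORTED simply transcribes the code points quoted in this
thread (the glyphuni.cpp column is the state as reported, before the
fix), and disagreements() is a hypothetical helper.

```python
import unicodedata

# Glyph name -> code point, per source, as quoted in this thread.
# The "glyphuni" entries reflect the report, i.e. before the typo fix.
REPORTED = {
    "devhtml": {"hy": 0x002D, "la": 0x2329, "ra": 0x232A,
                "+f": 0x03C6, "*f": 0x03D5, "<<": 0x226A, ">>": 0x226B},
    "devutf8": {"hy": 0x2010, "la": 0x27E8, "ra": 0x27E9,
                "+f": 0x03C6, "*f": 0x03D5, "<<": 0x226A, ">>": 0x226B},
    "glyphuni": {"hy": 0x2010, "la": 0x27E8, "ra": 0x27E9,
                 "+f": 0x03D5, "*f": 0x03C6, "<<": 0x226B, ">>": 0x226A},
}

def disagreements(tables):
    """Return {glyph: {source: code point}} for glyphs mapped differently."""
    result = {}
    for name in sorted(set().union(*tables.values())):
        seen = {src: t[name] for src, t in tables.items() if name in t}
        if len(set(seen.values())) > 1:
            result[name] = seen
    return result

for glyph, seen in disagreements(REPORTED).items():
    for src, cp in sorted(seen.items()):
        print("%-3s %-9s U+%04X %s" % (glyph, src, cp, unicodedata.name(chr(cp))))
```

Printing the Unicode names next to the glyph names makes a swap such
as "<<" mapped to U+226B (MUCH GREATER-THAN) stand out at once.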

