Re: [Groff] UTF-8 \(la and \(ra glyphs


From: Werner LEMBERG
Subject: Re: [Groff] UTF-8 \(la and \(ra glyphs
Date: Mon, 24 Feb 2003 15:36:11 +0100 (CET)

> font/devutf8/R.proto gives the width of the \(la and \(ra glyphs as
> 24, which is the standard unit width on this device.

Correct.

> However, localedata/charmaps/UTF-8 in glibc lists U+2329 and U+232A
> as being double-width characters, following
> http://www.unicode.org/Public/UNIDATA/EastAsianWidth.txt, so
> wcwidth() returns 2 for each of them;
> http://www.unicode.org/Public/UNIDATA/NamesList.txt notes that they
> are used as CJK punctuation.

Not exclusively, though.  The annotation for those two characters in
NamesList.txt also contains the following sentence:

  discouraged for mathematical use because of canonical equivalence to
  CJK punctuation
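
The canonical equivalence mentioned in that annotation can be checked
with any Unicode normalizer.  What follows is only a minimal sketch in
Python using the standard `unicodedata` module; the names `LEGACY` and
`canonical_form` are made up for illustration and are not part of groff.

```python
import unicodedata

# Code points currently used for \(la and \(ra on the utf8 device,
# as discussed in this thread.
LEGACY = {"la": "\u2329", "ra": "\u232a"}


def canonical_form(ch: str) -> str:
    """Return the NFC normalization of a single character."""
    return unicodedata.normalize("NFC", ch)


for glyph, ch in LEGACY.items():
    norm = canonical_form(ch)
    # A canonically equivalent character is replaced by its target
    # under normalization, so the two code points printed will differ.
    print(f"\\({glyph}: U+{ord(ch):04X} normalizes to U+{ord(norm):04X} "
          f"({unicodedata.name(norm)})")
```

Any text that passes through a normalizing tool therefore loses the
distinction between the two pairs of characters.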

> I'd submit a patch to correct their width to 48 except that I'm not
> sure exactly what I should be patching - is there a script somewhere
> which generates these fonts?

No.  Bruno Haible supplied them, and I have been editing them
constantly since.

> I do wonder why \(la and \(ra are used in www.tmac to delimit e-mail
> addresses, since they won't copy-and-paste correctly to < and >. Or
> perhaps the UTF-8 mapping for these two characters ought to be
> changed to U+003C and U+003E respectively.

`<' and `>' look bad in printed output; actually, those two characters
are not delimiters -- I really think that \[la] and \[ra] are the
right symbols.

And no, I won't map \[la] and \[ra] to `<' and `>' for UTF-8.  If you
want to do that, please overwrite it locally in the configuration file
of the particular macro package.

A different question is whether U+2329 and U+232A are the right code
points, and your email convinced me that they are not.  Due to the
canonical equivalence to U+3008 and U+3009 (which also affects the
width of the character) I will change the code points to the new
values U+27E8 MATHEMATICAL LEFT ANGLE BRACKET and U+27E9 MATHEMATICAL
RIGHT ANGLE BRACKET.  I foresee difficulties with that mapping since
U+27E8 and U+27E9 are very recent characters added in Unicode 3.2
which probably don't exist in many fonts, but I believe it is better
to avoid such compromises in the `official' groff version.
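
The width consequence of the change can be estimated from the East
Asian Width property.  The sketch below is an approximation under stated
assumptions, not what glibc's wcwidth() actually does: it treats the
classes W and F as two cells and everything else as one, and it takes
the unit width of 24 from the figure quoted above.  The helper names
`cells` and `device_width` are hypothetical.

```python
import unicodedata

UNIT_WIDTH = 24  # unit width on the utf8 device, as quoted in this thread


def cells(ch: str) -> int:
    """Approximate terminal cells: 2 for East Asian Wide/Fullwidth, else 1."""
    return 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1


def device_width(ch: str) -> int:
    """Width in device units, assuming one cell is UNIT_WIDTH units."""
    return cells(ch) * UNIT_WIDTH


for ch in ("\u2329", "\u232a", "\u3008", "\u3009", "\u27e8", "\u27e9"):
    print(f"U+{ord(ch):04X} {unicodedata.name(ch):<36} "
          f"EAW={unicodedata.east_asian_width(ch):<2} "
          f"width={device_width(ch)}")
```

Under these assumptions the old code points and their CJK equivalents
come out at 48 units, while the mathematical angle brackets stay at 24,
which matches the width already given in R.proto.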

Thanks for pointing this out.


    Werner


PS: BTW, in HTML 4, the entities &lang; and &rang; officially map to
    U+2329 and U+232A, so I won't change it for grohtml.  Probably the
    mapping will be revised for the next HTML version...
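
The HTML side can be inspected with Python's standard `html.entities`
module, which carries both the HTML 4 entity table (`name2codepoint`)
and an HTML5 named character reference table (`html5`).  This is a small
sketch for comparison only; the two helper functions are hypothetical
names, and nothing here reflects how grohtml itself is implemented.

```python
from html.entities import html5, name2codepoint


def html4_codepoint(entity: str) -> int:
    """Code point that the HTML 4 entity table assigns to `entity`."""
    return name2codepoint[entity]


def html5_codepoint(entity: str) -> int:
    """Code point that the HTML5 table assigns to `entity` (with ';')."""
    return ord(html5[entity + ";"])


for entity in ("lang", "rang"):
    print(f"&{entity};  HTML 4: U+{html4_codepoint(entity):04X}  "
          f"HTML5 table: U+{html5_codepoint(entity):04X}")
```

Comparing the two columns shows whether a given entity was remapped
between the two versions of the table.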
