Re: soft hyphen
From: Bruno Haible
Subject: Re: soft hyphen
Date: Fri, 25 May 2018 04:16:56 +0200
User-agent: KMail/5.1.3 (Linux/4.4.0-124-generic; KDE/5.18.0; x86_64; ; )
Kaz Kylheku wrote:
> > The program below shows that the answer (on a glibc system) is:
> > The character 0x00AD (= SOFT HYPHEN) is printable but has width == 0.
>
> I tried printing this on several terminals; all actually render
> something that is one character position wide.
>
> A program which calculates column positions on a terminal will be wrong
> if 0xAD has been printed, and it relies on this bogus datum from glibc.
Quoting the Unicode standard:
"Despite its name, U+00AD soft hyphen is not a hyphen, but rather an
invisible format character used to indicate optional intraword breaks.
As described in Section 23.2, Layout Controls, its effect on the
appearance of the text depends on the language and script used."
Yes, this has changed since the ISO-8859-1 times [1], and software has needed
(and still needs) to move from the old semantics to the Unicode semantics.
Bruno
[1] http://unicode.org/L2/L2003/03155r-kuhn-soft-hyphen.pdf