Re: Erroneous assumption in isblank.c
From: Bruno Haible
Subject: Re: Erroneous assumption in isblank.c
Date: Tue, 5 Oct 2010 11:17:38 +0200
User-agent: KMail/1.9.9
Hi,
John Darrington wrote:
> In lib/isblank.c I see the following:
>
> /* The "blank" characters are '\t', ' ',
> U+1680, U+180E, U+2000..U+2006, U+2008..U+200A, U+205F, U+3000, and none
> except the first two is present in a common 8-bit encoding. Therefore
> the substitute for other platforms is not more complicated than this. */
> return (c == ' ' || c == '\t');
>
> This is incorrect. In iso-8859-1 (a very common 8-bit encoding), U+00A0 is
> the non-breaking-space character.
U+00A0 NO-BREAK SPACE is a glyph that carries no ink, but in other respects
it behaves like a non-blank punctuation character. In particular, its very
definition is that, unlike U+0020 SPACE, it is not an opportunity for line
breaking.
The function isblank() is not used in graphical rendering engines; it is used
in programs that do line breaking, such as 'fold':
coreutils/src/fold.c:178: if (isblank (to_uchar (line_out[logical_end])))
For this reason, isblank(U+00A0) *must* return false. Otherwise many programs
would treat it like U+0020 SPACE.
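To illustrate the consequence, here is a minimal sketch of how a program that
breaks lines at blanks might look for a break opportunity. It is not the code
from fold.c; the names blank_fallback and last_break_opportunity are
hypothetical, and the input is assumed to be ISO-8859-1, where byte 0xA0 is
NO-BREAK SPACE.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the fallback quoted above: only SPACE and TAB
   count as blank.  */
static int
blank_fallback (int c)
{
  return c == ' ' || c == '\t';
}

/* Return the index of the last blank among the first WIDTH bytes of LINE
   (which has LEN bytes), or -1 if there is none.  A line-breaking program
   would break the line at that index.  */
static long
last_break_opportunity (const char *line, size_t len, size_t width)
{
  assert (line != NULL);
  size_t limit = len < width ? len : width;
  for (size_t i = limit; i > 0; i--)
    if (blank_fallback ((unsigned char) line[i - 1]))
      return (long) (i - 1);
  return -1;
}
```

With this sketch, "10 km" written with U+0020 has a break opportunity at
index 2, whereas the same text written with byte 0xA0 has none. If the blank
test returned true for 0xA0, the second case would be broken between "10" and
"km", which is exactly what NO-BREAK SPACE is meant to prevent.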
Bruno