bug-coreutils

bug#24924: GNU pr only working with singlebyte 1-width characters


From: Stephane Chazelas
Subject: bug#24924: GNU pr only working with singlebyte 1-width characters
Date: Thu, 1 Dec 2016 08:49:39 +0000
User-agent: Mutt/1.5.21 (2010-09-15)

2016-12-01 07:04:05 +0000, Stephane Chazelas:
> 2016-11-30 18:37:05 -0800, Paul Eggert:
> [...]
> > In the meantime if you could submit a patch for the
> > documentation that should fix the immediate documentation
> > problem.
> [...]
> 
> What about:
[...]
> +Please note that @command{pr} currently doesn't support multi-byte characters
> +or non-ASCII characters that have a null or double width. If such characters
> +occur in the input or column separators, column alignment may be off or lines
> +may exceed the page width. There is also no provision to support bidirectional
> +text.
[...]

Actually, it seems it can also truncate lines in the middle of
a character, though that seems confined to multibyte characters
whose encoding contains a byte with a value <= 127, like:

$ locale charmap
BIG5-HKSCS
$ printf '\ue9\ue9\ue9\n' | pr -w5 -t2 | hd
00000000  88 6d 88 6d 88 0a                                 |.m.m..|
00000006

See how the third é (0x88 0x6d in BIG5-HKSCS) was cut in the
middle: only its first byte, 0x88, made it to the output.

It's as if it were treating all bytes with values >= 128 as
having zero width in multi-byte locales (and only in multi-byte
locales; that doesn't seem to occur in single-byte ones).
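
To make that hypothesis concrete, here is a minimal Python sketch of the
suspected behaviour. It is only a model of the guess above, not pr's actual
code: the function name and the width rule (bytes >= 0x80 count as width 0,
all other bytes as width 1) are assumptions, as is the column width of 2
that I'd expect from -w5 with two columns and a one-character separator.

```python
def truncate_by_assumed_width(line: bytes, max_width: int) -> bytes:
    """Keep leading bytes of `line` while the running width fits.

    Hypothetical model, not taken from pr's source: a byte >= 0x80 is
    assumed to count as width 0, any other byte as width 1.
    """
    out = bytearray()
    width = 0
    for b in line:
        w = 0 if b >= 0x80 else 1
        if width + w > max_width:
            break  # the next byte would overflow the column: stop here
        out.append(b)
        width += w
    return bytes(out)


# Three copies of 0x88 0x6d (é in BIG5-HKSCS, per the transcript above).
line = b"\x88\x6d" * 3
print(truncate_by_assumed_width(line, 2).hex(" "))  # prints: 88 6d 88 6d 88
```

Under those assumptions the model keeps the two complete characters plus the
stray 0x88, which is the same byte sequence as in the hd output above (minus
the trailing newline).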

So maybe:

diff --git a/doc/coreutils.texi b/doc/coreutils.texi
index cc85f22..15088ce 100644
--- a/doc/coreutils.texi
+++ b/doc/coreutils.texi
@@ -1838,6 +1838,13 @@ For single
 column output no line truncation occurs by default.  Use @option{-W} option to
 truncate lines in that case.
 
+Please note that @command{pr} currently doesn't support multi-byte characters
+or non-ASCII characters that have a null or double width. If such characters
+occur in the input or column separators, column alignment may be off or lines
+may exceed the page width, or truncation may occur in the middle of some
+characters producing invalid text output. There is also no provision to support
+bidirectional text.
+
 The following changes were made in version 1.22i and apply to later
 versions of @command{pr}:
 @c FIXME: this whole section here sounds very awkward to me. I

-- 
Stephane




