Re: tr is handling bytes not characters

From: Nick Demou
Subject: Re: tr is handling bytes not characters
Date: Sun, 3 May 2009 23:31:52 +0300

On Wed, Feb 11, 2009 at 9:20 AM, Jim Meyering <address@hidden> wrote:
> [...]
> Ok.  For A and B, please send patches.

I think this is the patch you need. I edited only the files you told
me to, and made only the small changes I said I would make. (This is
the first time I've used a VCS and the relevant documentation was kind
of scary, so please forgive any typical newbie mistakes.) Sorry it took
me so long.

From ca3c7ea230ac32ec865470e5475f5972bbb2ee5c Mon Sep 17 00:00:00 2001
From: Nick D. Demou <address@hidden>
Date: Sun, 3 May 2009 22:03:12 +0300
Subject: [PATCH] doc: tr, clarification regarding limitations in handling UTF-8,16

modified: doc/coreutils.texi
modified: src/tr.c (printf's in usage function)
 doc/coreutils.texi |    5 +++--
 src/tr.c           |    6 ++++++
 2 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/doc/coreutils.texi b/doc/coreutils.texi
index 918f44e..3f4ed72 100644
--- a/doc/coreutils.texi
+++ b/doc/coreutils.texi
@@ -5588,8 +5588,9 @@ The @option{--complement} (@option{-c}, @option{-C}) option replaces
 complement (all of the characters that are not in @var{set1}).

 Currently @command{tr} fully supports only single-byte characters.
-Eventually it will support multibyte characters; when it does, the
-@option{-C} option will cause it to complement the set of characters,
+Eventually it will support multibyte characters (e.g. UTF-8 or UTF-16
+encoded Unicode characters); when it does, the @option{-C} option
+will cause it to complement the set of characters,
 whereas @option{-c} will cause it to complement the set of values.
 This distinction will matter only when some values are not characters,
 and this is possible only in locales using multibyte encodings when
diff --git a/src/tr.c b/src/tr.c
index f4b5317..ba5054a 100644
--- a/src/tr.c
+++ b/src/tr.c
@@ -347,6 +347,12 @@ only be used in pairs to specify case conversion.  -s uses SET1 if not\n\
 translating nor deleting; else squeezing uses SET2 and occurs after\n\
 translation or deletion.\n\
 "), stdout);
+     fputs (_("\
+Currently `tr' fully supports only single-byte characters.  Notable\n\
+examples of unsupported multibyte characters are UTF-8 and UTF-16\n\
+encoded Unicode characters.\n\
+"), stdout);
       emit_bug_reporting_address ();
   exit (status);
