Re: tr is handling bytes not characters

From: Jim Meyering
Subject: Re: tr is handling bytes not characters
Date: Tue, 10 Feb 2009 11:59:18 +0100

Nick Demou <address@hidden> wrote:
> On Thu, Feb 5, 2009 at 2:55 PM, Eric Blake <address@hidden> wrote:
>> According to Nick Demou on 2/5/2009 4:20 AM:
>>> And now about the bug report. It's about "tr". I realized that tr was
>>> mostly failing when working on utf-8 input.
>> Thanks for the report.  It is a known problem that coreutils does not yet
>> properly support multi-byte characters (this includes UTF-8), because no
>> one has yet contributed a patch that efficiently supports this without
>> penalizing maintenance or performance of single-byte code paths, while
>> still being useful across the wide range of coreutils that need it.
> Thanks for the info Eric. I was almost sure this would be the case. In
> fact I don't consider this as the main topic of my bug report. The
> main topic for me is the documentation. The man and info page don't
> make it clear that utf-8 is not supported. I believe that others after
> me will spend a lot of time just to realize that "it's just a missing
> feature".  Do you have any thoughts regarding my suggestions on the
> documentation?

The "real" documentation is in coreutils.texi (generated to
coreutils.info and available via "info coreutils").  There,
under "tr invocation", it already has this caveat:

       Currently `tr' fully supports only single-byte characters.
    Eventually it will support multibyte characters; when it does, the `-C'
    option will cause it to complement the set of characters, whereas `-c'
    will cause it to complement the set of values.  This distinction will
    matter only when some values are not characters, and this is possible
    only in locales using multibyte encodings when the input contains
    encoding errors.

and since "man tr" does point to the authoritative source:

       The  full documentation for tr is maintained as a Texinfo manual.
       If the info and tr programs are properly installed at your  site,
       the command

              info coreutils 'tr invocation'

that may be enough.
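
For readers who want to see why byte-oriented translation misbehaves on UTF-8 input, here is a minimal sketch in Python. It is only a simplified model of the behaviour discussed in this thread, not the actual tr source: the helper names bytewise_tr and charwise_tr are hypothetical, and the padding of a short second set with its last element is an assumption meant to mimic tr's usual handling.

```python
def bytewise_tr(data: bytes, set1: bytes, set2: bytes) -> bytes:
    """Translate byte by byte (simplified model, not the real tr code)."""
    if not set2:
        raise ValueError("set2 must not be empty")
    # Assumption: a short set2 is padded with its last byte.
    set2 = set2[:len(set1)].ljust(len(set1), set2[-1:])
    return data.translate(bytes.maketrans(set1, set2))


def charwise_tr(text: str, set1: str, set2: str) -> str:
    """Translate character by character, as a multibyte-aware tr might."""
    if not set2:
        raise ValueError("set2 must not be empty")
    set2 = set2[:len(set1)].ljust(len(set1), set2[-1])
    return text.translate(str.maketrans(set1, set2))


text = "αβ"  # UTF-8 bytes: CE B1 CE B2

# Byte-wise: the set "α" is really the two bytes CE and B1, so the
# shared lead byte CE of "β" is rewritten too and the output is no
# longer valid UTF-8.
mangled = bytewise_tr(text.encode("utf-8"), "α".encode("utf-8"), b"a")
print(mangled)  # b'aaa\xb2'

# Character-wise: only "α" is replaced.
print(charwise_tr(text, "α", "a"))  # aβ
```

The stray 0xB2 byte left in the byte-wise output is the kind of encoding error that the Texinfo caveat quoted above refers to.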
