bug-coreutils

Re: tr doesn't support multibyte characters


From: Paul Eggert
Subject: Re: tr doesn't support multibyte characters
Date: Wed, 14 Sep 2005 15:21:54 -0700
User-agent: Gnus/5.1007 (Gnus v5.10.7) Emacs/21.4 (gnu/linux)

Egmont Koblinger <address@hidden> writes:

> I guess tr should support multibyte character sets, if not by default,
> then at least via a command-line option.

That'd be nice.  It's a bit tricky, though.  Doing it right would
require that tr support encoding errors (stray byte sequences that
cannot be parsed as parts of multibyte characters).  For example, one
should easily be able to remove the encoding errors without making any
other changes, or to transliterate to upper-case while preserving
encoding errors.  Help in this area would be appreciated.
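
To make those two operations concrete, here is a minimal sketch in
Python rather than C. It is not how tr is or would be implemented. It
assumes UTF-8 input and relies on Python's "surrogateescape" error
handler, which turns each undecodable byte into a lone surrogate in
U+DC80..U+DCFF and turns it back into the same byte on encoding. The
function names are made up for illustration.

```python
def upcase_preserving_errors(data: bytes, encoding: str = "utf-8") -> bytes:
    """Upper-case the valid characters; pass stray bytes through unchanged."""
    # Each undecodable byte becomes a lone surrogate (U+DC80..U+DCFF).
    text = data.decode(encoding, errors="surrogateescape")
    # Lone surrogates have no case mapping, so upper() leaves them alone.
    # Unlike tr, str.upper() may change the length ("ß" becomes "SS").
    return text.upper().encode(encoding, errors="surrogateescape")


def strip_encoding_errors(data: bytes, encoding: str = "utf-8") -> bytes:
    """Drop stray bytes and make no other change."""
    text = data.decode(encoding, errors="surrogateescape")
    kept = "".join(ch for ch in text if not 0xDC80 <= ord(ch) <= 0xDCFF)
    return kept.encode(encoding)


if __name__ == "__main__":
    sample = b"caf\xc3\xa9 \xff ok"  # 0xFF can never appear in valid UTF-8
    print(upcase_preserving_errors(sample))
    print(strip_encoding_errors(sample))
```

A C implementation would have to make the same distinction itself,
for instance by checking the error returns of mbrtowc, and it would
have to do so in every locale encoding, not only UTF-8.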

The POSIX spec for tr
<http://www.opengroup.org/onlinepubs/009695399/utilities/tr.html>
talks about this issue somewhat, but it's incoherent -- I can't make
heads or tails of what the -C option is really supposed to do.

> If I'm wrong and the current behavior is the desired one then please replace
> all occurrences of "character" with "byte" in its manual.

The CVS version of the coreutils manual talks about this, saying
"Currently @command{tr} fully supports only single-byte characters.
Eventually it will support multibyte characters; ..." with some more
details about the problem.
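
For readers wondering what "fully supports only single-byte
characters" means in practice, the following Python sketch models
byte-wise versus character-wise translation. It is only a model and
does not run tr; the sample string and variable names are
assumptions chosen for illustration, and the input is assumed to be
UTF-8.

```python
word = "naïve café"
data = word.encode("utf-8")

# Byte-wise model: "é" is the two bytes 0xC3 0xA9 in UTF-8, so a
# byte-oriented translator maps each of those bytes separately.
byte_table = bytes.maketrans("é".encode("utf-8"), b"ee")
bytewise = data.translate(byte_table)

# Character-wise model: "é" is one character mapped to one character.
char_table = str.maketrans("é", "e")
charwise = word.translate(char_table).encode("utf-8")

print(bytewise)  # the 0xC3 lead byte of "ï" is rewritten as well
print(charwise)
```

In the byte-wise result "é" turns into two letters, and "ï" is
corrupted because it shares the lead byte 0xC3 with "é".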



