bug#12192: tr - bytes vs characters
From: Jim Meyering
Subject: bug#12192: tr - bytes vs characters
Date: Sat, 15 Sep 2012 12:28:54 +0200
forcemerge 12192 9365
thanks
Michael Stummvoll wrote:
> Hi gnu folks,
>
> As is already known, tr cannot handle multibyte encodings like UTF-8:
>
>> address@hidden:~$ echo "foo" | tr o ö
>> fÃÃ
>
> I know that multibyte encoding support is not needed for
> POSIX compliance, BUT:
>
> The manpage of tr says the following:
>
>> Translate, squeeze, and/or delete characters from standard input,
>> writing to standard output.
>
> And that's the inconsistency, IMHO.
>
> The typical interpretation of "character" in such a context is one
> character on display, regardless of which encoding is used or how many
> bytes are used to represent it. So, if tr really translates "characters",
> it should preserve the encoding. If it doesn't, it does not
> translate "characters" but "bytes". So I see two ways forward:
>
> - add multibyte-encoding support to tr
> or
> - change the manpage and helptext to not say "characters" but "bytes"
>
> Since it doesn't seem that anybody wants to add this support to tr, an
> update of the manpage would be the easier way to ensure consistency.
Thanks for the report.
I'm merging this issue with the others that relate to tr
and multi-byte support.
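
For readers of the archive, the byte-versus-character difference in the
quoted example can be sketched in Python. This is only a rough model of
the reported behavior, not tr's actual code: the helper names
tr_bytewise and tr_charwise are hypothetical, and the byte-wise helper
assumes SET2 is at least as long as SET1, dropping the excess bytes of
SET2 (which is what the output in the report suggests happened to the
second byte of "ö").

```python
def tr_bytewise(data: bytes, set1: bytes, set2: bytes) -> bytes:
    """Map each byte of set1 to the byte at the same position in set2.

    Hypothetical helper; assumes len(set2) >= len(set1) and ignores
    any excess bytes at the end of set2.
    """
    table = bytes.maketrans(set1, set2[:len(set1)])
    return data.translate(table)


def tr_charwise(text: str, set1: str, set2: str) -> str:
    """Map each character of set1 to the character at the same position in set2.

    Hypothetical helper showing what a character-aware translation would do.
    """
    return text.translate(str.maketrans(set1, set2[:len(set1)]))


o_umlaut = "\u00f6"  # the character "ö", two bytes (0xC3 0xB6) in UTF-8

# Byte-wise: "o" is one byte, so only the first byte of "ö" (0xC3) is used.
bytewise = tr_bytewise("foo\n".encode("utf-8"), b"o", o_umlaut.encode("utf-8"))

# Character-wise: "o" maps to the whole character "ö".
charwise = tr_charwise("foo\n", "o", o_umlaut)

print(bytewise)   # b'f\xc3\xc3\n'
print(charwise, end="")
```

The byte-wise result is not valid UTF-8; a terminal that falls back to
Latin-1 shows each 0xC3 byte as "Ã", which matches the "fÃÃ" in the
report above.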