bug-coreutils

bug#16168: uniq mis-handles UTF8 (8bit) characters


From: Linda Walsh
Subject: bug#16168: uniq mis-handles UTF8 (8bit) characters
Date: Mon, 16 Dec 2013 10:02:08 -0800
User-agent: Thunderbird

Maybe he was hoping for a uniq [-b|--bytes] ?

Suggestion to Shlomo (if you use bash):

  alias uniq='LC_ALL=C \uniq'

or, if you want it in your shell scripts too:

  uniq() { LC_ALL=C "$(type -P uniq)" "$@" ; }; export -f uniq
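
To check that a wrapper like this really compares byte by byte, here is a minimal sketch, assuming bash and a uniq binary on PATH. The name uniq_bytes and the sample lines are made up for illustration; `command` is used only to bypass any function or alias named uniq.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: run the real uniq with byte-wise (C locale) comparison.
# The LC_ALL=C prefix applies to this one command only; it does not change
# the locale of the calling shell.
uniq_bytes() { LC_ALL=C command uniq "$@"; }

# Hypothetical sample input: two different CJK lines of equal length.
# Under byte-wise comparison they differ, so both lines are kept.
printf '%s\n' '日本語' '中国語' | uniq_bytes
```

Writing the assignment as a prefix on the command, with no semicolon after it, is what places LC_ALL in the environment of uniq itself.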


On 12/16/2013 9:33 AM, Pádraig Brady wrote:
tag 16168 notabug
close 16168
stop

On 12/16/2013 01:50 PM, Shlomo Urbach wrote:
Lines with CJK letters are deemed equal by length only, since the
characters seem to be ignored.
I understand this is due to locale.
But, it would be nice if a simple flag would do a locale-free comparison
(i.e. equal = all bytes are equal).

If you want to compare byte by byte:

LC_ALL=C uniq ....

thanks,
Pádraig.







