From: Linda Walsh
Subject: bug#16168: uniq mis-handles UTF8 (8bit) characters
Date: Mon, 16 Dec 2013 10:02:08 -0800
User-agent: Thunderbird

Maybe he was hoping for a uniq [-b|--bytes]?

Suggestion to Shlomo (if you use bash):

  alias uniq='LC_ALL=C \uniq'

or, if you want it in your shell scripts too:

  uniq() { LC_ALL=C "$(type -P uniq)" "$@" ; }; export -f uniq

On 12/16/2013 9:33 AM, Pádraig Brady wrote:
> tag 16168 notabug
> close 16168
> stop
>
> On 12/16/2013 01:50 PM, Shlomo Urbach wrote:
>> Lines with CJK letters are deemed equal by length only, since the
>> characters seem to be ignored.
>> I understand this is due to locale. But, it would be nice if a simple
>> flag would do a locale-free comparison (i.e. equal = all bytes are equal).
>
> If you want to compare byte by byte:
>
>   LC_ALL=C uniq ....
>
> thanks,
> Pádraig.
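
For anyone wanting to check the byte-by-byte behaviour, here is a minimal sketch. The sample lines and the `sample` helper are made up for illustration and are not from the original report; only `LC_ALL=C uniq` comes from the thread.

```shell
# Hypothetical sample data: two distinct lines of equal length,
# made up entirely of CJK characters.
sample() { printf '%s\n' '日本語' '中国語'; }

# In the C locale uniq compares raw bytes, so both lines are kept.
# tr strips the padding that some wc implementations add.
bytewise_count=$(sample | LC_ALL=C uniq | wc -l | tr -d '[:space:]')
echo "lines kept with LC_ALL=C: $bytewise_count"

# For contrast, the same input under whatever locale is currently set.
# The result depends on the locale and the coreutils version, so no
# expected value is given here.
sample | uniq | wc -l
```

Setting `LC_ALL=C` as a prefix on the command limits the change to that single `uniq` invocation, so the rest of the shell session keeps its locale.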