bug-coreutils

bug#16168: uniq mis-handles UTF8 (8bit) characters


From: Shlomo Urbach
Subject: bug#16168: uniq mis-handles UTF8 (8bit) characters
Date: Mon, 16 Dec 2013 22:19:31 +0200

Thanks, this works great.
But I'm sure the general public doesn't know about this issue.

Shlomo


On Mon, Dec 16, 2013 at 8:02 PM, Linda Walsh <address@hidden> wrote:

> Maybe he was hoping for a uniq [-b|--bytes] ?
>
> Suggestion to Shlomo (if you use bash):
>
>   alias uniq='LC_ALL=C \uniq'
>
> or, if you want it in your shell scripts too:
>
>   uniq() { LC_ALL=C "$(type -P uniq)" "$@"; }; export -f uniq
>
>
>
> On 12/16/2013 9:33 AM, Pádraig Brady wrote:
>
>> tag 16168 notabug
>> close 16168
>> stop
>>
>> On 12/16/2013 01:50 PM, Shlomo Urbach wrote:
>>
>>> Lines with CJK letters are deemed equal by length only, since the
>>> characters seem to be ignored.
>>> I understand this is due to locale.
>>> But, it would be nice if a simple flag would do a locale-free comparison
>>> (i.e. equal = all bytes are equal).
>>>
>>
>> If you want to compare byte by byte:
>>
>> LC_ALL=C uniq ....
>>
>> thanks,
>> Pádraig.
>>
>>
>>
>>
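
For anyone finding this thread later, here is a minimal sketch of the byte-wise workaround discussed above. The helper name uniq_bytes is hypothetical (it is not a coreutils feature), and the sample lines are made up for illustration. It assumes a POSIX-style shell with the real uniq on PATH.

```shell
# Hypothetical helper (the name is an assumption, not part of coreutils):
# run the real uniq with byte-wise comparison for this one invocation only.
uniq_bytes() {
    # The assignment prefix scopes LC_ALL=C to this single command, and
    # `command` bypasses any alias or function that is also named uniq.
    LC_ALL=C command uniq "$@"
}

# Two distinct CJK lines of equal length, followed by an exact duplicate.
printf '%s\n' '日本' '中国' '中国' | uniq_bytes
```

With the C locale, lines are equal only when all their bytes are equal, so the two distinct lines are both kept and only the exact repeat is dropped. Giving the helper its own name also leaves the plain uniq command untouched for cases where locale-aware comparison is wanted.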

