emacs-bug-tracker

[debbugs-tracker] bug#11968: closed (Bug in "uniq")


From: GNU bug Tracking System
Subject: [debbugs-tracker] bug#11968: closed (Bug in "uniq")
Date: Tue, 17 Jul 2012 21:56:02 +0000

Your message dated Tue, 17 Jul 2012 15:49:38 -0600
with message-id <address@hidden>
and subject line Re: bug#11967: Bug in "uniq"
has caused the debbugs.gnu.org bug report #11967,
regarding Bug in "uniq"
to be marked as done.

(If you believe you have received this mail in error, please contact
address@hidden)


-- 
11967: http://debbugs.gnu.org/cgi/bugreport.cgi?bug=11967
GNU Bug Tracking System
Contact address@hidden with problems
--- Begin Message ---
Subject: Bug in "uniq"
Date: Tue, 17 Jul 2012 10:18:23 -0800
Dear Sir or Madam,

I think that there is a bug in "uniq" (version 8.13).

The file "bug.txt" attached consists of two lines:
- the first one containing a character that
  looks like a "v" and a line break;
- the second one containing a character that
  looks like an upside-down "v" and a line break.
In hex:

    E2 88 A8  0A
    E2 88 A7  0A

When we run "uniq bug.txt" in a terminal, "uniq" outputs a single line, so
"uniq" thinks that the two lines are equal, but they are not.

Regards,
Jaime Gaspar
_____________________________
Homepage: www.jaimegaspar.com
E-mail: address@hidden

Attachment: bug.txt
Description: Text document


--- End Message ---
--- Begin Message ---
Subject: Re: bug#11967: Bug in "uniq"
Date: Tue, 17 Jul 2012 15:49:38 -0600
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:13.0) Gecko/20120615 Thunderbird/13.0.1
forcemerge 11967 11968
tag 11967 notabug
thanks

On 07/17/2012 12:17 PM, Jaime Gaspar wrote:
> I think that there is a bug in "uniq" (version 8.13).

Is this your distro's build?  In any case, I reproduced your report with
the latest coreutils.git (post-8.17), so this is not likely to be a bug
in a distro-specific multibyte patch.

> 
> The file "bug.txt" attached consists of two lines:
> - the first one containing a character that
>   looks like a "v" and a line break;
> - the second one containing a character that
>   looks like an upside-down "v" and a line break.
> In hex:
> 
>     E2 88 A8  0A
>     E2 88 A7  0A

Those glyphs that you describe line up with Unicode characters.  I bet
you are using a locale with UTF-8 character encoding.
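As a quick check on the byte values (an editorial sketch, not part of the original thread), Python's standard `unicodedata` module can decode the two lines from the hex dump as UTF-8 and report each character's code point and Unicode name. The helper name `describe_line` is purely illustrative.

```python
import unicodedata
from typing import List, Tuple


def describe_line(raw: bytes) -> List[Tuple[str, str]]:
    """Decode one UTF-8 line and return (code point, Unicode name) pairs."""
    text = raw.decode("utf-8").rstrip("\n")  # drop the trailing 0A line break
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]


# The two lines of bug.txt, copied byte-for-byte from the hex dump quoted above.
for raw in (b"\xe2\x88\xa8\n", b"\xe2\x88\xa7\n"):
    print(raw.hex(" "), "->", describe_line(raw))
```

It reports U+2228 LOGICAL OR for the first line and U+2227 LOGICAL AND for the second: two distinct characters, matching the '∨' and '∧' printed in the transcripts below.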

> 
> When we run "uniq bug.txt" in a terminal, "uniq" outputs a single line, so 
> "uniq" thinks that the two lines are equal, but they are not.

I can reproduce your symptoms, but only when I fudge my locale:

$ LC_ALL=C uniq ../bug.txt
∨
∧
$ LC_ALL=en_US.UTF-8 uniq ../bug.txt
∨
$

Remember, 'uniq' is required by POSIX to use the same line comparison
techniques as 'sort'; and 'sort' is required to use strcoll() (not
strcmp()) to compare lines.  And in your particular choice of locale,
strcoll() happens to report that '∨' and '∧' collate identically; hence
uniq is correct in stating that you have a duplicated line according to
your current locale.
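The difference between a byte-wise comparison (in the spirit of strcmp()) and a collation-based one (strcoll()) can be sketched in Python. This is a simplified model, assuming only the default `uniq` behaviour of dropping lines equal to the first line of the current run of adjacent duplicates; it is not the coreutils implementation, and the helper names `bytewise_cmp` and `uniq` are hypothetical. Python's `locale.strcoll` delegates to the C library's collation for the current LC_COLLATE setting.

```python
import locale
from typing import Callable, Iterable, List


def bytewise_cmp(a: str, b: str) -> int:
    """strcmp()-style comparison of the UTF-8 encodings, byte by byte."""
    ea, eb = a.encode("utf-8"), b.encode("utf-8")
    return (ea > eb) - (ea < eb)


def uniq(lines: Iterable[str], cmp: Callable[[str, str], int]) -> List[str]:
    """Keep a line only if cmp() says it differs from the first line of the
    current run of adjacent lines (a rough model of plain `uniq`)."""
    out: List[str] = []
    for line in lines:
        if not out or cmp(out[-1], line) != 0:
            out.append(line)
    return out


lines = ["\u2228", "\u2227"]  # the two lines of bug.txt: "∨" and "∧"
print(f"{'bytes':11}:", uniq(lines, bytewise_cmp))  # bytes differ: both kept

for name in ("C", "en_US.UTF-8"):
    try:
        locale.setlocale(locale.LC_COLLATE, name)
    except locale.Error:
        print(f"{name:11}: locale not available on this system")
        continue
    print(f"{name:11}:", uniq(lines, locale.strcoll))
```

In the C locale both comparisons keep the two lines. Whether en_US.UTF-8 merges them depends on the collation tables of the installed C library; newer systems may no longer treat '∨' and '∧' as equal, so the 2012 result may not reproduce.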

$ LC_ALL=en_US.UTF-8 sort ../bug.txt -u --debug
sort: using ‘en_US.UTF-8’ sorting rules
∨
_
$
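`sort -u` applies the same notion of equality after sorting: lines that collate identically count as duplicates, and only the first of each such run is output. A rough Python analogue (the helper `sort_unique` is hypothetical, not coreutils code) sorts by `locale.strxfrm`, whose keys compare the same way `strcoll()` does, and then keeps the first line of each group of equal keys via `itertools.groupby`:

```python
import itertools
import locale
from typing import List


def sort_unique(lines: List[str]) -> List[str]:
    """Approximate `sort -u`: order lines by the locale's collation and keep
    only the first line of each run that collates identically."""
    # strxfrm() maps a string to a key whose ordinary comparison agrees with
    # strcoll(), so equal keys mean "collates equal" in this locale.
    ordered = sorted(lines, key=locale.strxfrm)  # stable: ties keep input order
    return [next(group) for _, group in itertools.groupby(ordered, key=locale.strxfrm)]


lines = ["\u2228", "\u2227"]  # "∨" and "∧"
for name in ("C", "en_US.UTF-8"):
    try:
        locale.setlocale(locale.LC_COLLATE, name)
    except locale.Error:
        print(f"{name:11}: locale not available on this system")
        continue
    print(f"{name:11}:", sort_unique(lines))
```

The en_US.UTF-8 result depends on the system's collation data and may differ from the single '∨' shown in the 2012 transcript.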

So I'm closing this as not a bug, along with a final pointer to our FAQ:

https://www.gnu.org/software/coreutils/faq/#Sort-does-not-sort-in-normal-order_0021

-- 
Eric Blake   address@hidden    +1-919-301-3266
Libvirt virtualization library http://libvirt.org



Attachment: signature.asc
Description: OpenPGP digital signature


--- End Message ---
