bug-coreutils

Re: sort -nu: bug or feature?


From: Paul Eggert
Subject: Re: sort -nu: bug or feature?
Date: Wed, 08 Sep 2004 12:33:17 -0700
User-agent: Gnus/5.1006 (Gnus v5.10.6) Emacs/21.3 (gnu/linux)

Andrew Noymer <address@hidden> writes:

> sort -nu seems to compare only the first key, which is not what a lot of 
> people expect.

Thanks for mentioning this.  As far as I can see the behavior in
question is required by POSIX and is the same on non-GNU platforms
(though I checked only Solaris).  I changed the documentation as
follows to try to help clarify things.

2004-09-08  Paul Eggert  <address@hidden>

        * coreutils.texi (sort invocation): Add remarks about sort -u
        versus sort | uniq.  Prompted by a question from Andrew Noymer.

Index: coreutils.texi
===================================================================
RCS file: /home/eggert/coreutils/cu/doc/coreutils.texi,v
retrieving revision 1.205
retrieving revision 1.206
diff -p -u -r1.205 -r1.206
--- coreutils.texi      6 Sep 2004 07:47:04 -0000       1.205
+++ coreutils.texi      8 Sep 2004 19:29:18 -0000       1.206
@@ -3353,7 +3353,7 @@ line as the key and acts as if no orderi
 But if @option{--reverse} (@option{-r}) was specified along with other
 ordering options, then the last-resort comparison does use @option{--reverse}.
 In any case, when no ordering option is specified or when only
-@option{--reverse} is specified, the last-resort comparison is not performed
+@option{--reverse} is specified, the last-resort comparison is not performed.
 
 @item -S @var{size}
 @itemx --buffer-size=@var{size}
@@ -3419,6 +3419,12 @@ Normally, output only the first of a seq
 equal.  For the @option{--check} (@option{-c}) option,
 check that no pair of consecutive lines compares equal.
 
+The commands @code{sort -u} and @code{sort | uniq} are equivalent, but
+this equivalence does not extend to arbitrary @command{sort} options.
+For example, @code{sort -n -u} inspects only the value of the initial
+numeric string when checking for uniqueness, whereas @code{sort -n |
+uniq} inspects the entire line.  @xref{uniq invocation}.
+
 @item -z
 @itemx --zero-terminated
 @opindex -z
@@ -3618,6 +3624,7 @@ lines that are not repeated, or all repe
 The input need not be sorted, but repeated input lines are detected
 only if they are adjacent.  If you want to discard non-adjacent
 duplicate lines, perhaps you want to use @code{sort -u}.
+@xref{sort invocation}.
 
 @vindex LC_COLLATE
 Comparisons use the character collating sequence specified by the
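
To make the difference described in the new paragraph concrete, here is a
minimal sketch.  The sample input and the variable names are hypothetical
and were not part of the original report; the sketch only assumes a
POSIX-style sort and uniq.

```shell
# Hypothetical sample input: two lines share the leading number 1.
input='1 apple
1 banana
2 cherry'

# sort -n -u: uniqueness is decided by the numeric key alone, so the two
# lines that start with "1" compare equal and only one of them is kept.
unique_by_key=$(printf '%s\n' "$input" | sort -n -u)

# sort -n | uniq: uniq compares entire lines, so all three lines survive.
unique_by_line=$(printf '%s\n' "$input" | sort -n | uniq)

printf '%s\n' "$unique_by_key" | wc -l    # 2 lines remain
printf '%s\n' "$unique_by_line" | wc -l   # 3 lines remain
```

Which of the two "1 ..." lines survives sort -n -u is deliberately not
asserted here; the point is only that the line counts differ.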



