Eric Blake wrote:
Not sure if this has already been discovered, but I found a problem with
uniq. If I sat down and looked at the code, I could probably see how to
fix it. It seems to always occur with very large unsorted streams (files).
Below are the commands I ran to exploit the bug (which I originally
thought was my error). Sorting the stream before removing duplicate
lines is inconsistent with just removing duplicate lines:
Thanks for the report. However, uniq only works on sorted streams.
By definition, uniq only looks at consecutive lines, to see if they
are identical.
Strictly speaking, the input does not need to be sorted. You are right
in saying that it only works on adjacent lines. This is what the
manual has to say about it:
info coreutils uniq
The input need not be sorted, but repeated input lines are detected
only if they are adjacent. If you want to discard non-adjacent
duplicate lines, perhaps you want to use `sort -u'.
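A quick illustration of the distinction, using a small made-up input
(three lines, with a non-adjacent duplicate):

```shell
# uniq only collapses *adjacent* duplicates, so the second 'a' survives:
printf 'a\nb\na\n' | uniq
# sort -u removes duplicates anywhere in the input:
printf 'a\nb\na\n' | sort -u
```

The first pipeline prints all three lines; the second prints only "a" and "b".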
If the file is not sorted, then the same line might appear twice in the
output. And changing this would slow uniq down (requiring either more
memory or more time to keep a list of all previously seen unique lines),
not to mention violating POSIX.
It is left up to the user to sort the file, or not, as desired.
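That is, sorting first makes duplicates adjacent, so piping through sort
before uniq gives the same result as sort -u (sketched here with the same
made-up input as above):

```shell
# Sorting first makes all duplicates adjacent, so these two are equivalent:
printf 'a\nb\na\n' | sort | uniq
printf 'a\nb\na\n' | sort -u
```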
Here is what the standards say:
http://www.opengroup.org/onlinepubs/009695399/utilities/uniq.html
Repeated lines in the input shall not be detected if they are not
adjacent.
I enjoyed the dry humor of this part:
APPLICATION USAGE
The sort utility can be used to cause repeated lines to be
adjacent in the input file.
:-)
Bob