[Top][All Lists]

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: uniq without the need of sort

From: Pádraig Brady
Subject: Re: uniq without the need of sort
Date: Tue, 08 Nov 2011 17:37:44 +0000
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:6.0) Gecko/20110816 Thunderbird/6.0

On 11/08/2011 04:30 PM, Peng Yu wrote:
> Hi,
> 'uniq' currently relies on 'sort'. When the input file is small, this
> is OK. But when the input file is large, this seems to be a waste (the
> complexity is O(n log(n)), if uniq handles a hash table its self the
> complexity is only O(n)). I'm wondering if it is better to relax the
> requirement of 'sort' when 'uniq' is used.

Well uniq with a hash, would not be O(n) unless a "pefect hash"was used,
which is impossible with arbitrary inputs.
So in the worst case it could go to O(n^2).
Also uniq would have to worry about memory for the hash,
with temp files for scalability etc.

Currently sort implements all the complexity and
the other tools take advantage of this.
Note sort -u is available as that can discard
duplicates as encountered to reduce the sort complexity.


reply via email to

[Prev in Thread] Current Thread [Next in Thread]