[Top][All Lists]

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: new set functions

From: Paul Kienzle
Subject: Re: new set functions
Date: Wed, 15 Nov 2000 20:50:22 +0000
User-agent: Mutt/1.2.5i

The run-time complexity of the set functions should be as follows:

        create_set  =  O (n log(n)) since you have to look up
                each new element in the existing set before
                inserting it, and lookup is O(log n).  So the
                sort algorithm is a fine way to handle this.

        union = O(n+m) since in the worst case, you need to
                copy both sets.

        intersection, complement = O(max(n,m)) if the set
                sizes are similar, or O(m log(n)) if m
                is much smaller than n, since in the worst
                case you need to copy the whole set (m~n)
                or just look up a few elements in the
                larger set (m<<n).

Octave 2.1.31 uses an O(n log n) algorithm for create_set and union, and
an O(n) interpreted loop for intersection, complement.  The CVS archive
has an O(n log n) algorithm for intersection and complement as well,
which is faster than the O(n) algorithm only because for loops are so
slow.  If you are concerned about speed for very large sets, it is worth
recoding union, intersect, and complement in C++ using the O(n) algorithm,
otherwise get the most recent scripts from the archives.

Paul Kienzle

On Wed, Nov 15, 2000 at 09:52:44AM -0500, Lippert, Ross A. wrote:
> I have made a modification to create_set because I found myself
> often wanting not just the set of unique elements but also the
> counts of those elements in the original input.  Perhaps that is
> just my own eccentricity.
> However, if one modifies create_set in this way (returning a
> second output of counts), then I believe that the runtime complexity
> of intersection and complement can be reduced to n log n instead of
> n^2 (which they appear to be now, but I am no expert on complexity).
> Anyhow, as a quick example I tried 
> x = fix(20000*rand(1000,1)); y = fix(20000*rand(1000,1));
> and found a big difference between the timings of
> intersection(x,y), so I think I am on the right track.
> I'd be sending this out in the form of a patch, but for the moment,
> I'd really like feedback to tell me if I have done something stupid
> so I don't waste time updating my CVS, installing the new scripts 
> and then diffing all to find out I missed something.
> -r

reply via email to

[Prev in Thread] Current Thread [Next in Thread]