[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: benchmarks - sort
From: |
taltman |
Subject: |
Re: benchmarks - sort |
Date: |
Wed, 21 Jan 2004 22:51:10 +0000 (UTC) |
The Scheme Library has a nice solution for merge-sort functionality:
SLIB
http://www.swiss.ai.mit.edu/~jaffer/slib_7.html#SEC199
You use it like this:
guile> (sort (list 3 3 8 1 9 4) <)
(1 3 3 4 8 9)
So, 'sort' is a function that accepts two arguments: the first one is
the list ( or, in Octave's case, a one-dimensional array/cell-array ),
and a function which tells the sorting function how to compare two
objects.
By passing the comparison function to the sort algorithm, it allows
sort to be generic; any data structure that an end-user might have to
work with can be sorted, provided that they make a simple predicate
function that can compare any two of them.
I imagine that this can be accomplished in Octave via function
handles, no?
Just my $0.02,
~Tomer
On Jan 21, 2004 at 11:09pm, David Bateman wrote:
David. >Date: Wed, 21 Jan 2004 23:09:52 +0100
David. >From: David Bateman <address@hidden>
David. >To: THOMAS Paul Richard <address@hidden>
David. >Cc: address@hidden
David. >Subject: Re: benchmarks - sort
David. >Resent-Date: Wed, 21 Jan 2004 16:14:17 -0600
David. >Resent-From: address@hidden
David. >
David. >Hi Thomas,
David. >
David. >Well the fact is I've come a bit further in testing some sorting
David. >algorithms. I've tried several different cases. Firstly, what Alois
David. >said about sorting IEEE754 as integers numbers made me starting
David. >looking at radix sorting. Then Paul Kienzle strongly suggested the
David. >Python mergesort algorithm as having great abilities especially for
David. >partially ordered lists. I've taken the Python code and implemented it
David. >as a class and use it to sort either directly on the doubles or
David. >recasting as uint64 and use the same trick I used for the radix
David. >sort.
David. >
David. >One problem I have with the Python code, is that when calling
David. >"[b,bi] = sort(a)", I have to create a new class containing the element
and
David. >the index like
David. >
David. >template <class T>
David. >class
David. >vec_index
David. >{
David. > public:
David. > vec_index (void) : vec (0), indx (-1) { }
David. > vec_index (T v, int i) : vec (v), indx (i) { }
David. > vec_index<T> &operator = (const vec_index<T> &x);
David. > T vec;
David. > int indx;
David. >};
David. >
David. >template vec_index<double>;
David. >
David. >bool
David. >operator < (const vec_index<double> &a, const vec_index<double> &b)
David. >{
David. > return (xisnan(b.vec) || (a.vec < b.vec));
David. >}
David. >
David. >This is due to the fact that the calls in the class to sort the
David. >elements are of the form "tmp = *a; *a = *b; *b = tmp;". This imposes
David. >quite a penalty on the mergesort. I can really see another way to do
David. >this without writing specific versions of the code for all
David. >instantiations of the mergesort class for with and without indexing
David. >(Ugly)... Any suggestions are welcome.
David. >
David. >In any case there are 5 test case, with and without indexing
David. >
David. >1a) Matlab R12 "b = sort(a)" - baseline for unindexed octave tests
David. >1b) Matlab R12 "[b,bi] = sort(a)" - baseline for indexed octave tests
David. >2a) Octave "b = sort(a)" - orginal octave code
David. >2b) Octave "[b,bi] = sort(a)" - orginal octave code with indexing
David. >3a) Octave "b = radix(a)" - Radix sort test code
David. >3b) Octave "[b,bi] = radix(a)" - Radix sort test code with indexing
David. >4a) Octave "b = merge_double(a)" - Mergesort on doubles
David. >4b) Octave "[b,bi] = merge_double(a)" - Mergesort on doubles with
indexing
David. >5a) Octave "b = merge_ieee7543(a)" - Mergesort on IEEE754 case as uint64
David. >5b) Octave "[b,bi] = merge_double(a)" - Mergesort on IEEE754 case as
uint64
David. > with indexing
David. >
David. >I've benchmarked the matlab code seperately (in "matlab.log") and used
David. >this to benchmark the other four versions of the code, with and without
David. >returning the index array. All tests were run on the same machine under
David. >the same conditions. I also trying to do a bit of averaging in the tests
David. >to remove variation in the runtimes. I attach the code I used in the
tarball
David. >in this mail.
David. >
David. >The names of the tests I used are the same as those in the Python code
and
David. >are
David. >
David. >*sort Randomly distributed lists
David. >\sort Descending list
David. >/sort Ascending list
David. >3sort Ascending list with 3 values randomly interchanged
David. >+sort Ascending list with the last 10 values replace with random
data
David. >=sort All values equal
David. >
David. >There are two tables in "octave.log" for each test. The first table is
the
David. >runtime of each iteration, while the second table is the relative time
wrt
David. >Matlab. So as to not bias the results against Octave, with its problems
David. >with "for" loops, I've assumed that
David. >
David. > a = randn(n,rep)
David. > t = cputime;
David. > b = sort(a);
David. > time (cputime - t) / rep;
David. >
David. >gives the runtime of "b = sort(a)", where "a = randn(n,1)".
David. >
David. >The basic result is that the radix sort compares well against Matlab
David. >for randomly distributed values (1.10 times the speed of Matlab), but
David. >as this sort takes no advantage of partial ordering, it breaks down in
David. >these cases. The mergesort with double values is faster than Matlab
David. >for partailly ordered lists, but about 2.5 to 3 times slower than
David. >Matlab for random lists. Interestingly, although merge_double is
David. >significantly slower with indexing, it maintains the same relationship
David. >with Matlab.
David. >
David. >The clear winner for me is the mergesort with integer comparison of
David. >IEEE754 values. This is roughly two times faster than Matlab for
David. >partially ordered lists, while being only slightly slower (1.29
David. >relative to Matlab) than the radix sort (1.10 relative to Matlab).
David. >This algorithm is roughly 6 times faster than the current Octave
David. >sort code. The performance degrades slightly with indexing (roughly
David. >2.0 relative to Matlab).
David. >
David. >Note that my test code, checked the correctness of the sorting
David. >algorithms, that it sorted NaN, Inf and -Inf correctly and that
David. >it was stable.
David. >
David. >There are several questions to answer before this might be rewritten
David. >for inclusion either in octave or octave-forge.
David. >
David. >* Does octave support any platform with 8-byte integers? If so the
David. > mergesort class could be used with doubles in this case. If not I
David. > can simplify the code. In any case having mergesort as a class
David. > means that other parts of octave might use this class (sort char
David. > matrices for instance).
David. >
David. >* Is there any other way to win back the speed loss I get for sorting
David. > with indexing with the mergesort class? Maybe someone could look at
David. > merge.cc and oct-sort.cc and give their thoughts?
David. >
David. >* Is there any chance of this going into octave, or should I be
targeting
David. > octave-forge?
David. >
David. >* Should I bother implementing complex sorting (done it for radix sort
David. > but not in the attached tar-ball)?
David. >
David. >Cheers
David. >David
David. >
David. >Daprès THOMAS Paul Richard <address@hidden> (le 21/01/2004):
David. >> For what it is worth ( 0.02 euros?), in the course of educating
myself about
David. >> the C++ standard library, I did a test of the sorting capability of
David. >> multimaps. It turns out to be a bit off the performance discussion,
in that
David. >> the timing is EXACTLY the same as that of octave's sort and about a
factor
David. >> of five slower than Matlab R13 on the same machine (not very
surprising,
David. >> since the standard library uses exactly the same algorithm as
octave!).
David. >> However, it is interesting just how concise a routine, exploiting the
David. >> standard library can become. Please find the routine below.
David. >>
David. >> Paul Thomas
David. >>
David. >> //Test of auto-sorting properties of standard library multi-map
David. >> //In input vector x is inserted element by element into a
David. >> //multi-map vmap, together with the input index of the element.
David. >> //Since multi-map insert sorts according to the value of the first
David. >> //element, reading the map pairs back, using the multi-map iterator,
David. >> //yields the sorted values and the sorted index.
David. >> //
David. >> //Paul Thomas 17/01/04
David. >>
David. >> #include <octave/oct.h>
David. >> #include <octave/variables.h>
David. >> #include <map>
David. >> using namespace std;
David. >>
David. >> DEFUN_DLD (mysort, args, ,
David. >> "mysort. Call using \
David. >> [a,b]=mysort(x)")
David. >> {
David. >> typedef multimap<double,double> ddmap; //<value,index>
David. >> ColumnVector vin(args(0).vector_value()); //turn input into
David. >> ColumnVector
David. >> int vinlen=vin.length(); //number of input
elements
David. >> ddmap vmap; //multimap container
for x
David. >> and its index
David. >> ddmap::iterator pos; //iterator for the
multimap
David. >> double didx=1; //double index
David. >> for (int idx=0;idx<vinlen;idx++) //let multimap insert
sort
David. >> by value
David. >> {
David. >> vmap.insert(make_pair(vin(idx),didx++)); //insert value,index
David. >> }
David. >> int idx=0; //output counter
David. >> ColumnVector idxout(vinlen); //sorted index
David. >> for (pos=vmap.begin();pos!=vmap.end();pos++,idx++)
David. >> {
David. >> vin(idx)=pos->first; //sorted value
David. >> idxout(idx)=pos->second; //sorted index
David. >> }
David. >> octave_value_list retval(vin); //sorted value to
output
David. >> retval.append(octave_value(idxout)); //sorted index to
output
David. >> return retval; //return
David. >> }
David. >>
David. >>
David. >> ---
David. >> Outgoing mail is certified Virus Free.
David. >> Checked by AVG anti-virus system (http://www.grisoft.com).
David. >> Version: 6.0.564 / Virus Database: 356 - Release Date: 19/01/04
David. >>
David. >
David. >
sort.tgz
Description: Binary data