[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Shuffling elements in a dataset
From: |
John W. Eaton |
Subject: |
Shuffling elements in a dataset |
Date: |
Fri, 21 Feb 1997 19:02:35 -0600 |
[Sorry about posting this more than once, it got away while I was
playing around with the example code and before I was ready to send
it. --jwe]
On 21-Feb-1997, Ted Harding <address@hidden> wrote:
| I find that something like
|
| [dummy,ix] = sort(rand(1,rows(x))); new_x = x(ix,:);
|
| seems pretty fast. (0.04 secs for 10000 rows, 0.05 secs for 100000 rows,
| or for 1000000, on a 386-DX/25MHz; 0.003 secs for 10000 rows, 0.004 secs
| for 100000 rows, or for 1000000, on Pentium-120, i.e. almost independent
| of number of rows. However for 10000000 rows it starts swapping and takes
| a while (48 MB RAM)). Above timings for 1 column only; reduce sizes pro
| rata for extra columns (RAM limit).
Hmm. Here is what I get (counting down to minimize memory problems):
for n = [5e5, 10.^[5, 4, 3, 2, 1]]
x = rand (n, 1);
t = cputime ();
[dummy,ix] = sort(rand(1,rows(x)));
new_x = x(ix,:);
t = cputime () - t;
printf ("N = %8d, t = %6.2f seconds\n", n, t);
fflush (stdout);
endfor
N = 500000, t = 13.50 seconds
N = 100000, t = 2.31 seconds
N = 10000, t = 0.18 seconds
N = 1000, t = 0.02 seconds
N = 100, t = 0.01 seconds
N = 10, t = 0.00 seconds
at N ~ 8e5, I get into paging trouble on my 64MB Pentium system.
How did you end up with almost constant time? I took Octave's sort
algorithm from Knuth. It's good, but I don't think it's *that* good.
jwe