Shuffling elements in a dataset

help-octave

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Shuffling elements in a dataset

From:	John W. Eaton
Subject:	Shuffling elements in a dataset
Date:	Fri, 21 Feb 1997 19:02:35 -0600

[Sorry about posting this more than once, it got away while I was
playing around with the example code and before I was ready to send
it. --jwe]

On 21-Feb-1997, Ted Harding <address@hidden> wrote:

| I find that something like
| 
|    [dummy,ix] = sort(rand(1,rows(x)));  new_x = x(ix,:);
| 
| seems pretty fast. (0.04 secs for 10000 rows, 0.05 secs for 100000 rows,
| or for 1000000, on a 386-DX/25MHz; 0.003 secs for 10000 rows, 0.004 secs
| for 100000 rows, or for 1000000, on Pentium-120, i.e. almost independent
| of number of rows. However for 10000000 rows it starts swapping and takes
| a while (48 MB RAM)). Above timings for 1 column only; reduce sizes pro
| rata for extra columns (RAM limit).

Hmm.  Here is what I get (counting down to minimize memory problems):

  for n = [5e5, 10.^[5, 4, 3, 2, 1]]
    x = rand (n, 1);
    t = cputime ();
    [dummy,ix] = sort(rand(1,rows(x)));
    new_x = x(ix,:);
    t = cputime () - t;
    printf ("N = %8d, t = %6.2f seconds\n", n, t);
    fflush (stdout);
  endfor

  N =   500000, t =  13.50 seconds
  N =   100000, t =   2.31 seconds
  N =    10000, t =   0.18 seconds
  N =     1000, t =   0.02 seconds
  N =      100, t =   0.01 seconds
  N =       10, t =   0.00 seconds

at N ~ 8e5, I get into paging trouble on my 64MB Pentium system.

How did you end up with almost constant time?  I took Octave's sort
algorithm from Knuth.  It's good, but I don't think it's *that* good.

jwe

[Prev in Thread]

Current Thread

[Next in Thread]

Shuffling elements in a dataset, jcardoso, 1997/02/21
- Re: Shuffling elements in a dataset, Marvin Vis, 1997/02/21
- Shuffling elements in a dataset, John W. Eaton <=

Prev by Date: Shuffling elements in a dataset (fwd)
Next by Date: Shuffling elements in a dataset (fwd)
Previous by thread: Re: Shuffling elements in a dataset
Next by thread: Shuffling elements in a dataset (fwd)
Index(es):
- Date
- Thread