octave-maintainers
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Low hanging fruit - Accelerated random distribution functions


From: Paul Kienzle
Subject: Re: Low hanging fruit - Accelerated random distribution functions
Date: Fri, 23 Feb 2007 01:06:35 -0500


On Feb 22, 2007, at 7:35 PM, David Bateman wrote:

rande, randg and randp do call the old generators with "seed". However,
that is not enough as the these patch (I give the credit to Paul for the
hint in the randg help text) remove the inverse mapping that was
previously used for a different expression of the random deviate.. I
don't think its possible to have both the speedup and the old sequence..

On the subject of random number generator speed, I
did a comparison of various open source packages,
with the Mersenne Twister as the base generator.

U: uniform
N: standard normal
X: standard exponential
G(a): standard gamma
P(L): poisson

dist    numpy oct*  oct    R*    R     GSL
U(0,1)  0.42  0.30  ----  0.68 ----  0.33
U(-5,5) 0.42  0.48  ----  0.66 ----  0.33
N       0.87  0.38  ----  1.74 ----  1.38
X       0.78  0.31  ----  0.88 ----  0.75
G(15)   1.67  0.76  1.37  1.99 2.17  3.77
G(0.5)  2.09  1.56  2.16  2.84 2.84  3.20
P(2)    1.68  0.43  1.52  0.64 1.50  1.26
P(9)    3.45  0.48  3.34  0.73 2.15  2.49
P(13)   4.31  1.08  5.44  2.06 2.47  4.00
P(45)   3.44  1.14  6.11  1.95 2.37  5.97
P(1e6)  2.65  1.12  6.45  1.86 2.25 23.73
P(1e9)  2.82  ----  1.26  1.86 2.25 35.75

*use constant for distribution parameters rather
than forcing a reset for each parameter.

The conclusion is that we are doing very well,
except that we should be using the poisson code
from R when the lambda parameter changes from
sample to sample.  For the sake of maintainability,
we may want to use the R code even if it is 2x
slower in the oct* case.  I'm working on
converting it.

        - Paul

Notes on generators:

uniform: Octave does not accept limits so uniform(0,1)
        does less work but uniform(-5,5) must be implemented
        as rand(1e6,1)*10-5

normal: Marsaglia's ziggurat from octave is faster
        than Box-Muller

exponential: Marsaglia's ziggurat lookup table is faster
        than -log(1-uniform)

gamma(a>1): Both numpy and octave use Marsaglia's
   algorithm; numpy repeats the sqrt in the initialization
   1e6 times so it is slower than octave*; numpy and octave
   single do the same amount of work, so presumably the
   compiler parameters determine the speed difference (this
   is also seen in Poisson)

gamma(a<1): Octave uses gamma(1+a)*uniform^(1/a) which is
   equal to gamma(1+a)*exp(-exponential/a); with fast gamma
   and fast exponential this is a win.

poisson(L<10): Numpy and octave single both use the direct
   method of repeated multiplication of uniforms.  Octave*
   does a linear lookup in the poisson CDF (large overhead
   but good payoff).  The higher the lambda shape parameter,
   the more numbers needed for repeated multiplication.

poisson(L>12): Octave single uses the Poisson Rejection
   Method from Numerical Recipes.  Numpy uses the
   Transformed Poisson Rejection Method (Hoermann 1992).
   R cites:
   * Ahrens, J.H. and Dieter, U. (1982).  Computer generation of
   * Poisson deviates from modified normal distributions.
   * ACM Trans. Math. Software 8, 163-179.
   Octave repeat uses Zechner's pprsc from Stadloeber's winrand:
   * H. Zechner (1994): Efficient sampling from continuous and
   * discrete unimodal distributions, Doctoral Dissertation,
   * 156 pp., Technical University Graz, Austria.

poisson(L>1e8): Octave switches over to a normal generator
   /* normal approximation: from Phys. Rev. D (1994) v50 p1284 */
   ret = floor(normal*sqrt(L) + L + 0.5);
   if (ret < 0.0) ret = 0; /* will probably never happen */

   The cutoff of L <= 1e8 for the normal approximation is based on:
     > L=1e8; x=floor(linspace(0,2*L,1000));
     > max(abs(normal_pdf(x,L,L)-poisson_pdf(x,L)))
     ans =  1.1376e-28
   For L=1e7, the max is around 1e-9, which is within the step size
   of uniform.  For L>1e10 the pprsc function breaks down.

   Octave repeat is not listed because of a bug; the corrected
   version will be approximately the speed of normal, or about 0.4


Notes on measurements:


numpy 1.0rc1

Timer('import numpy as N; N.random.ZZZ(shape,size=1000000)').timeit(10)/10

for ZZZ in uniform standard_normal standard_exponential standard_gamma poisson

octave* 2.1.73 (octave-forge) repeat

  tic; ZZZ(shape,1000000,1); toc

  for ZZZ in rand randn rande randg randp

octave 2.1.73 (octave-forge)

  L=shape*ones(1000000,1); tic; randZZZ(L); toc

  for ZZZ in randg randp

R* 2.2.1

  gc(); system.time(x<-ZZZ(1000000))[3]

  for ZZZ in runif rnorm rexp rgamma(shape=shape) rpois(lambda=shape)

R 2.2.1

  L<-runif(1000000)/10+shape;
  gc(); system.time(x<-ZZZ(1000000,lambda=L))[3]

  for ZZZ in rgamma(shape=shape) rpois(lambda=shape)

GSL 0.9.0

  #include <gsl/gsl_rng.h>
  #include <gsl/gsl_randist.h>

  int main(int argc, char *argv[])
  {
    const gsl_rng_type *T;
    gsl_rng *r;
    int i;
    double *x;

    gsl_rng_env_setup();
    T = gsl_rng_mt19937;
    r = gsl_rng_alloc (T);
    x = malloc(1000000*sizeof(*x));
    for (i=0; i < 1000000; i++) {
     x[i] = gsl_ran_ZZZ(r,shape);
    }
    return 0;
  }

  for ZZZ in uniform gaussian(1.0) exponential(1.0) gamma(shape,1.0)
        poisson(shape)





reply via email to

[Prev in Thread] Current Thread [Next in Thread]