octave-maintainers
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Low hanging fruit - Accelerated random distribution functions


From: David Bateman
Subject: Re: Low hanging fruit - Accelerated random distribution functions
Date: Fri, 23 Feb 2007 14:27:03 +0100
User-agent: Thunderbird 1.5.0.7 (X11/20060921)

Paul Kienzle wrote:
>
> This needs a fast discrete_rnd, which could be implemented
> using lookup as [untested!]
>
> rnd = discrete_rnd(p,v,n)
>     edges = cumsum(p(1:end-1))/sum(p);
>    rnd = v(lookup(edges,rand(n,1))+1);
>
> The current code is approximately:
>
>   edges = cumsum(p)/sum(p);
>   u = rand(1,n);
>   rnd = v (1 + sum( edges*ones(1,n) <= ones(m,1)*u));
>
> Given n numbers generated from m possibilities:
>  current code takes O(m*n) space and O(m*n) time.
>  lookup takes O(m+n) space and O((m+n)*log(m+n)) time.
> so lookup saves both space and time.
>
> Note that the documentation says p is a probability
> whereas it is actually a proportion.

Humm, it seems also that the matlab unid* functions are closely related
to the discrete_* function. The difference is that matlab assumes that
there are equal proportions of each value in the vector v. Matlab also
accepts that v is a scalar and then assumes values of 1:v are in the
distribution. Should we modify discrete_* to have this behavior? Should
we have an extended interface that allows p to be set? Or should we keep
discrete_* functions as is and write matlab compatiable wraps to them
for the unid* functions? Assuming that keep discrete_*, consider the
attached patch. I think the unidrnd.m functions are compatible, but
might have missed some subtly.

I also tested the speed and compliance of your code to the original
distribution and I get good argument and speed like

octave:2> function rnd = discrete_rnd2(v, p, varargin), edges =
cumsum(p(1:end-1))/sum(p);rnd = v(lookup(edges,rand(varargin{:}))+1);
endfunction
octave:3> tic; b2 = discrete_rnd(1:100, [1:50,50:-1:1], 1,1e5); toc
Elapsed time is 0.994423 seconds.
octave:4> tic; b2 = discrete_rnd2(1:100, [1:50,50:-1:1], 1,1e5); toc
Elapsed time is 0.204079 seconds.
octave:5> tic; b2 = discrete_rnd2(1:10, [1:5,5:-1:1], 1,1e5); toc
Elapsed time is 0.202477 seconds.
octave:6> tic; b0 = discrete_rnd(1:10, [1:5,5:-1:1], 1,1e5); toc
Elapsed time is 0.177124 seconds.
octave:7> tic; b2 = discrete_rnd2(1:10, [1:5,5:-1:1], 1,1e6); toc
Elapsed time is 2.180717 seconds.
octave:8> tic; b0 = discrete_rnd(1:10, [1:5,5:-1:1], 1,1e6); toc
Elapsed time is 1.346832 seconds.
octave:9> tic; b2 = discrete_rnd2(1:2, [1,1], 1,1e4); toc
Elapsed time is 0.061501 seconds.
octave:10> tic; b0 = discrete_rnd(1:2, [1,1], 1,1e4); toc
Elapsed time is 0.052948 seconds.
octave:11> tic; b2 = discrete_rnd2(1:2, [1,1], 1,1e5); toc
Elapsed time is 0.204903 seconds.
octave:12> tic; b0 = discrete_rnd(1:2, [1,1], 1,1e5); toc
Elapsed time is 0.109550 seconds.
octave:13> tic; b2 = discrete_rnd2(1:2, [1,1], 1,1e6); toc
Elapsed time is 2.192253 seconds.
octave:14> tic; b0 = discrete_rnd(1:2, [1,1], 1,1e6); toc
Elapsed time is 0.731766 seconds.

So for length(v) large there is a clear win for the new code. For
length(v) ~ 10 and smaller the original code is faster, though not by a
huge amount (only about a factor of 2), given that there is also memory
savings, I'd be inclined to think it is better to go with the new code..

Regards
David

-- 
David Bateman                                address@hidden
Motorola Labs - Paris                        +33 1 69 35 48 04 (Ph) 
Parc Les Algorithmes, Commune de St Aubin    +33 6 72 01 06 33 (Mob) 
91193 Gif-Sur-Yvette FRANCE                  +33 1 69 35 77 01 (Fax) 

The information contained in this communication has been classified as: 

[x] General Business Information 
[ ] Motorola Internal Use Only 
[ ] Motorola Confidential Proprietary

*** ./scripts/statistics/distributions/unidcdf.m.orig47 2007-02-23 
12:46:34.381451380 +0100
--- ./scripts/statistics/distributions/unidcdf.m        2007-02-23 
12:43:13.585784764 +0100
***************
*** 0 ****
--- 1,40 ----
+ ## Copyright (C) 2007  David Bateman
+ ##
+ ## This file is part of Octave.
+ ##
+ ## Octave is free software; you can redistribute it and/or modify it
+ ## under the terms of the GNU General Public License as published by
+ ## the Free Software Foundation; either version 2, or (at your option)
+ ## any later version.
+ ##
+ ## Octave is distributed in the hope that it will be useful, but
+ ## WITHOUT ANY WARRANTY; without even the implied warranty of
+ ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ ## General Public License for more details.
+ ##
+ ## You should have received a copy of the GNU General Public License
+ ## along with Octave; see the file COPYING.  If not, write to the Free
+ ## Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA
+ ## 02110-1301, USA.
+ 
+ ## -*- texinfo -*-
+ ## @deftypefn {Function File} {} unidcdf (@var{x}, @var{v})
+ ## For each element of @var{x}, compute the cumulative distribution
+ ## function (CDF) at @var{x} of a univariate discrete distribution which
+ ## assumes the values in @var{v} with equal probability.
+ ## @end deftypefn
+ 
+ function cdf = unidcdf (x, v)
+ 
+   if (nargin != 2)
+     print_usage ();
+   endif
+ 
+   if (isscalar(v))
+     v = [1:v].';
+   else
+     v = v(:);
+   endif
+ 
+   cdf = discrete_cdf (x, v, ones(size(v)));
+ endfunction
*** ./scripts/statistics/distributions/unidrnd.m.orig47 2007-02-23 
12:46:34.381451380 +0100
--- ./scripts/statistics/distributions/unidrnd.m        2007-02-23 
12:43:04.224263096 +0100
***************
*** 0 ****
--- 1,45 ----
+ ## Copyright (C) 2007  David Bateman
+ ##
+ ## This file is part of Octave.
+ ##
+ ## Octave is free software; you can redistribute it and/or modify it
+ ## under the terms of the GNU General Public License as published by
+ ## the Free Software Foundation; either version 2, or (at your option)
+ ## any later version.
+ ##
+ ## Octave is distributed in the hope that it will be useful, but
+ ## WITHOUT ANY WARRANTY; without even the implied warranty of
+ ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ ## General Public License for more details.
+ ##
+ ## You should have received a copy of the GNU General Public License
+ ## along with Octave; see the file COPYING.  If not, write to the Free
+ ## Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA
+ ## 02110-1301, USA.
+ 
+ ## -*- texinfo -*-
+ ## @deftypefn {Function File} {} unidrnd (@var{v}, @var{r}, @var{c})
+ ## @deftypefnx {Function File} {} unidrnd (@var{v}, @var{sz})
+ ## Generate a row vector containing a random sample of values from
+ ## the univariate distribution which assumes the values in @var{v} with
+ ## eqal probability. If @var{v} is a scalar, it is promoted to 
@code{1:@var{v}}.
+ ##
+ ## If @var{r} and @var{c} are given create a matrix with @var{r} rows and
+ ## @var{c} columns. Or if @var{sz} is a vector, create a matrix of size
+ ## @var{sz}.
+ ## @end deftypefn
+ 
+ function rnd = unidrnd (v, varargin)
+ 
+   if (nargin != 2)
+     print_usage ();
+   endif
+ 
+   if (isscalar(v))
+     v = [1:v].';
+   else
+     v = v(:);
+   endif
+ 
+   rnd = discrete_rnd (v, ones(size(v)), varargin{:});
+ endfunction
*** ./scripts/statistics/distributions/unidpdf.m.orig47 2007-02-23 
12:46:34.381451380 +0100
--- ./scripts/statistics/distributions/unidpdf.m        2007-02-23 
12:43:49.510946234 +0100
***************
*** 0 ****
--- 1,40 ----
+ ## Copyright (C) 2007  David Bateman
+ ##
+ ## This file is part of Octave.
+ ##
+ ## Octave is free software; you can redistribute it and/or modify it
+ ## under the terms of the GNU General Public License as published by
+ ## the Free Software Foundation; either version 2, or (at your option)
+ ## any later version.
+ ##
+ ## Octave is distributed in the hope that it will be useful, but
+ ## WITHOUT ANY WARRANTY; without even the implied warranty of
+ ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ ## General Public License for more details.
+ ##
+ ## You should have received a copy of the GNU General Public License
+ ## along with Octave; see the file COPYING.  If not, write to the Free
+ ## Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA
+ ## 02110-1301, USA.
+ 
+ ## -*- texinfo -*-
+ ## @deftypefn {Function File} {} unidpdf (@var{x}, @var{v})
+ ## For each element of @var{x}, compute the probability density function
+ ## (pDF) at @var{x} of a univariate discrete distribution which assumes
+ ## the values in @var{v} with equal probability.
+ ## @end deftypefn
+ 
+ function pdf = unidpdf (x, v)
+ 
+   if (nargin != 2)
+     print_usage ();
+   endif
+ 
+   if (isscalar(v))
+     v = [1:v].';
+   else
+     v = v(:);
+   endif
+ 
+   pdf = discrete_pdf (x, v, ones(size(v)));
+ endfunction
*** ./scripts/statistics/distributions/unidinv.m.orig47 2007-02-23 
12:46:34.381451380 +0100
--- ./scripts/statistics/distributions/unidinv.m        2007-02-23 
12:43:41.653348752 +0100
***************
*** 0 ****
--- 1,40 ----
+ ## Copyright (C) 2007  David Bateman
+ ##
+ ## This file is part of Octave.
+ ##
+ ## Octave is free software; you can redistribute it and/or modify it
+ ## under the terms of the GNU General Public License as published by
+ ## the Free Software Foundation; either version 2, or (at your option)
+ ## any later version.
+ ##
+ ## Octave is distributed in the hope that it will be useful, but
+ ## WITHOUT ANY WARRANTY; without even the implied warranty of
+ ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ ## General Public License for more details.
+ ##
+ ## You should have received a copy of the GNU General Public License
+ ## along with Octave; see the file COPYING.  If not, write to the Free
+ ## Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA
+ ## 02110-1301, USA.
+ 
+ ## -*- texinfo -*-
+ ## @deftypefn {Function File} {} unidinv (@var{x}, @var{v})
+ ## For each component of @var{x}, compute the quantile (the inverse of
+ ## the CDF) at @var{x} of the univariate distribution which assumes the
+ ## values in @var{v} with equal probability
+ ## @end deftypefn
+ 
+ function inv = unidinv (x, v)
+ 
+   if (nargin != 2)
+     print_usage ();
+   endif
+ 
+   if (isscalar(v))
+     v = [1:v].';
+   else
+     v = v(:);
+   endif
+ 
+   inv = discrete_inv (x, v, ones(size(v)));
+ endfunction
*** ./scripts/statistics/distributions/discrete_rnd.m.orig47    2007-02-23 
12:46:45.663866735 +0100
--- ./scripts/statistics/distributions/discrete_rnd.m   2007-02-23 
13:53:27.589019216 +0100
***************
*** 74,97 ****
      error ("discrete_rnd: p must be a nonzero, nonnegative vector");
    endif
  
!   n = prod (sz);
!   m = length (v);
!   u = rand (1, n);
!   s = reshape (cumsum (p / sum (p)), m, 1);
! 
!   ## The following loop is a space/time tradeoff in favor of space,
!   ## since the dataset may be large.
!   ##
!   ## Vectorized code is:
!   ##
!   rnd = v (1 + sum ((s * ones (1, n)) <= ((ones (m, 1) * u))));
!   rnd = reshape (rnd, sz);
!   ##
!   ## Non-vectorized code is:
!   ##
!   ##  rnd = zeros (sz);
!   ##  for q=1:n
!   ##    rnd (q) = v (sum (s <= u (q)) + 1);
!   ##  endfor
! 
  endfunction
--- 74,78 ----
      error ("discrete_rnd: p must be a nonzero, nonnegative vector");
    endif
  
!   rnd = v (lookup (cumsum (p (1 : end-1)) / sum(p), rand (sz)) + 1); 
  endfunction

reply via email to

[Prev in Thread] Current Thread [Next in Thread]