octave-bug-tracker
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[Octave-bug-tracker] [bug #54619] randi() is biased


From: Rik
Subject: [Octave-bug-tracker] [bug #54619] randi() is biased
Date: Mon, 10 Sep 2018 13:01:10 -0400 (EDT)
User-agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:61.0) Gecko/20100101 Firefox/61.0

Follow-up Comment #21, bug #54619 (project octave):

For testing, I used a script tst_randi.m which is shown below, and attached.


#a = randi (6004799503160661, 1e6, 1);
#a = randi (9,1e6,1);
#a = randi_rej (6004799503160661);
#a = randi_rej (9);
#a = randi_new (6004799503160661);
a = randi_new (9);

b = a(a>median(a));
c = hist(mod(b,2),[0 1])
c ./ sum (c)


It generates one million values using a particular method and then uses
Michael's test even/odd test.  It prints out both the absolute counts and the
percentages (which should be 50%:50%).

As a warmup, running tst_randi with the existing algorithm (randi) produces


octave:13> tst_randi
c =

   166430   333570

ans =

   0.33286   0.66714


So, there is a clear bias in the results using the existing algorithm.  For
some performance benchmarkings,


octave:14> tic; x = randi (6004799503160661, 1e6, 1); toc
Elapsed time is 0.0255809 seconds.
octave:15> tic; x = randi (6004799503160661, 1e6, 1); toc
Elapsed time is 0.0356832 seconds.
octave:16> tic; x = randi (6004799503160661, 1e6, 1); toc
Elapsed time is 0.0255952 seconds.
octave:17> tic; x = randi (6004799503160661, 1e6, 1); toc
Elapsed time is 0.0247469 seconds.
octave:18> tic; x = randi (6004799503160661, 1e6, 1); toc
Elapsed time is 0.0325689 seconds.


So, roughly, 30 milliseconds.  Using the rejection method,


octave:2> tic; x = randi_rej (6004799503160661, 1e6, 1); toc
Elapsed time is 0.0560811 seconds.
octave:3> tic; x = randi_rej (6004799503160661, 1e6, 1); toc
Elapsed time is 0.0398862 seconds.
octave:4> tic; x = randi_rej (6004799503160661, 1e6, 1); toc
Elapsed time is 0.040313 seconds.
octave:5> tic; x = randi_rej (6004799503160661, 1e6, 1); toc
Elapsed time is 0.040343 seconds.
octave:6> tic; x = randi_rej (6004799503160661, 1e6, 1); toc
Elapsed time is 0.046854 seconds.


Very roughly, 45 milliseconds, or about 50% slower.  Quoting from the paper,


The number of random integers consumed by this algorithm follows a geometric
distribution, with a success probability p = 1 − (2^L mod s)/2^L . On
average, we need 1/p random words. This average is less than two, irrespective
of the value of s. The algorithm always requires the computation of two
remainders.


The worst case is when 2^L mod s is large which is at (2^L / 2) + 1.  For our
case, this is (2^53 / 2) + 1 = 4503599627370497.  Testing this idea proves to
be correct.


octave:2> tic; x = randi_rej (4503599627370497); toc
Elapsed time is 0.0646839 seconds.
octave:3> tic; x = randi_rej (4503599627370497); toc
Elapsed time is 0.102659 seconds.
octave:4> tic; x = randi_rej (4503599627370497); toc
Elapsed time is 0.0820141 seconds.
octave:5> tic; x = randi_rej (4503599627370497); toc
Elapsed time is 0.0970309 seconds.
octave:6> tic; x = randi_rej (4503599627370497); toc
Elapsed time is 0.0697341 seconds.
octave:7> tic; x = randi_rej (4503599627370497); toc
Elapsed time is 0.0705478 seconds.


Maybe this averages 80 milliseconds.  But of course, the real reason to use
thismethod is because it is unbiased.  Running tst_randi shows


octave:8> tst_randi
c =

   250205   249795

ans =

   0.50041   0.49959


Also, it is unlikely that users are choosing pathological values for imax
frequently.



(file #44968)
    _______________________________________________________

Additional Item Attachment:

File name: tst_randi.m                    Size:0 KB


    _______________________________________________________

Reply to this item at:

  <https://savannah.gnu.org/bugs/?54619>

_______________________________________________
  Message sent via Savannah
  https://savannah.gnu.org/




reply via email to

[Prev in Thread] Current Thread [Next in Thread]