octave-bug-tracker
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[Octave-bug-tracker] [bug #54619] randi() is biased


From: Michael Leitner
Subject: [Octave-bug-tracker] [bug #54619] randi() is biased
Date: Mon, 10 Sep 2018 17:51:20 -0400 (EDT)
User-agent: Mozilla/5.0 (X11; Linux i686; rv:47.0) Gecko/20100101 Firefox/47.0

Follow-up Comment #24, bug #54619 (project octave):

Rik, I have to comment on a number of points in your series of comments.

First, I would say that there are just two main variants of a single algorithm
for deriving unbiased random integers in an arbitrary range from a source of
unbiased random integers in a given range. Both use rejection: For instance,
when you have a generator for integers in [0,65535] (i.e., 16 bit) and you
want to generate integers in [0,9], you first generate a source integer, which
you reject and regenerate iteratively if it should fall within [65530,65535].
So now you have an unbiased integer r within [0,65529], and you get your
output by doing either mod(r,10) or floor(r/6553) (when you are doing integer
arithmetic, you can drop the floor). What is discussed in the arxiv-manuscript
is how you can do that while keeping the number of integer divisions low. So
actually all the implementations have unbounded worst-case behaviour, because
they all use rejection. 

I think you misunderstood my solution. Yes, there is a loop, but it does not
loop over the elements. Instead, I generate first more numbers that I expect
to need, do everything vectorized, and only when at the end I see that the
number of generated numbers is not large enough because I had to reject too
many, I try again. If have the comment "should practically always be true" in
the code, because I generate use an excess number that is 10 times the
expected standard deviation of the number of rejections. In octave,
normcdf(10) evaluates to exactly 1, while 1-normcdf(8) is 1e-15. Thus the
probability for the loop having to be evaluated a second time is probably
about 1e-20. That's what I mean by "practically always". 

Yes, a compiled function will always be faster than a vectorized m-file,
mainly because of the fact that with vectorization, you have to repeatedly
retrieve and store back your intermediate vectors to main memory, while in a
loop you have only to store the final value, everything else happens in
registers. Further, when you directly use randi53(), you obviate the cast to
double and subsequent division that happen in rand(), and you do not have to
undo the division. I think that these are the main reasons why your code is
faster, not so much that the arxiv-algorithms are better (because they are
actually the same).

So how fast is my code? I see


> tic;a = randi(6004799503160661,1000000,1);toc
Elapsed time is 0.245796 seconds.
> tic;a = randi_(6004799503160661,1000000,1);toc
Elapsed time is 0.547995 seconds.
> tic;a = randi_(10,1000000,1);toc
Elapsed time is 0.398885 seconds.


About a factor of 2.2 slower than the simple biased m-file when one in three
values has to be rejected, dropping to a factor 1.6 when practically nothing
is rejected. The slowing-down with rejection cannot be avoided, and a factor
of 1.6 for an m-file is not bad, I think. So as long as it is about addressing
the bias, and as long as nobody finds a bug in my implementation, I still
think it can be used.

Thinking about what to do when you really want to beat Matlab: I think you
meant that rand should keep returning doubles within [0,1). Yes, that goes
without saying. For this you need randi53. But what would be nice to expose
would be something that indeed returns a uint64 with the full range of
[0,2^64-1], because this is in most likelihood what the Mersenne twister
internally returns. And further, for beating Matlab with something "new" I
would propose to change the underlying PRNG to something of the PCG family
http://www.pcg-random.org/, which are both among the fastest and "randomest"
PRNGs available today. I see that Melissa O'Neill even has a very recent post
about just the problem we are discussing here:
http://www.pcg-random.org/posts/bounded-rands.html



    _______________________________________________________

Reply to this item at:

  <https://savannah.gnu.org/bugs/?54619>

_______________________________________________
  Message sent via Savannah
  https://savannah.gnu.org/




reply via email to

[Prev in Thread] Current Thread [Next in Thread]