octave-maintainers
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Ziggurat code


From: David Bateman
Subject: Re: Ziggurat code
Date: Mon, 10 Oct 2016 20:10:25 +0200

Thanks for the heads up of your use of the code and the compliment. 

As for the USE_X86_32 define, I believe this is a hang over from Paul Kienzle's 
original Mersenne Twister code. Paul went out of his way to write his code in 
such a way as it could be built stand alone and this define was part of that.

The code seem to have organically developed since both when it has in Source 
Forge and after in the core of Octave. So as you say best just to get rid of 
the define block

D.

> Le 10 oct. 2016 à 01:52, Allin Cottrell <address@hidden> a écrit :
> 
> I'm a developer of the GPL'd econometrics program gretl, and a co-worker and 
> I were recently benchmarking our generation of random normals. We saw that 
> Octave was noticeably faster at this, and I investigated. It transpired that 
> the speed difference was down to David Bateman's very nice implementation of 
> the Marsaglia-Tsang Ziggurat method.
> 
> We had our own implementation but David's was faster, without any sacrifice 
> of quality (good coverage of the real line, passes the
> l'Ecuyer "Crush" test), so we have, gratefully and with attribution,
> "stolen" David's version for gretl.
> 
> Maybe I can give a little something back. There appears to a slight mix-up in 
> the use of *X86_32 macros in the source file in question, namely 
> randmtzig.cc. Near the top of this file there appears the following 
> preprocessor stanza:
> 
> #if ! defined (USE_X86_32)
> #  if defined (i386) || defined (HAVE_X86_32)
> #    define USE_X86_32 1
> #  else
> #    define USE_X86_32 0
> #  endif
> #endif
> 
> Here the symbol HAVE_X86_32 is coming from "elsewhere" (I presume, config.h 
> or the compiler command-line), and the symbol USE_X86_32 is set conditional 
> on that plus the compiler symbol i386. One would suppose, therefore, that the 
> code below should branch conditional on USE_X86_32, but that's not the case: 
> USE_X86_32 is not referenced at all; everything below depends on HAVE_X86_32.
> 
> It seems that either (a) the code below should branch on USE_X86_32 rather 
> than HAVE_X86_32, or (b) if HAVE_X86_32 is reckoned to carry all the relevant 
> information, the aforementioned preprocessor code should be removed, for 
> clarity.
> 
> Related, there's a comment which suggests some uncertainty, "Check whether 
> -DUSE_X86_32=0 is faster on 64-bit x86 architectures." I can say something 
> about that. I've tested the two variants of the Ziggurat code in randmtzig.cc 
> on two 64-bit systems. In each case the integer RNG is version 1.4.1 of SFMT, 
> using SSE2; the only difference is in the Ziggurat code.
> 
> The experiment involved generating 200 million random normals, repeated 5 
> times. In the tables below the columns headed "A" and "B"
> hold timings in seconds for the default 64-bit code-path and the x86 32-bit 
> optimized code-path respectively. The columns to the right of A and B are 
> elapsed-time indices relative to 100 = best on platform.
> 
> Fedora 23 64-bit, core i7-2600 @3.40GHz, gcc 5.3.1
> 
>  A      rel     B      rel
> 3.3994 100.00  3.9320 115.67
> 3.4058 100.19  3.8608 113.57
> 3.4045 100.15  3.9076 114.95
> 3.5408 100.16  3.9141 115.14
> 3.4064 100.21  3.8545 113.39
> 
> Arch 64-bit, core i7-920 @2.67GHz, gcc 6.2.1
> 
>  A      rel     B      rel
> 3.9583 100.00  4.3021 108.69
> 3.9867 100.72  4.3653 110.28
> 3.9781 100.50  4.3308 109.41
> 4.0371 101.99  4.3953 111.04
> 3.9899 100.80  4.4652 112.81
> 
> There's not a huge difference between the Ziggurat variants, but nonetheless 
> it's clear, on this sample, that you don't want to use the 32-bit-optimized 
> code on a 64-bit system. On the more recent
> i7-2600 that incurs a time penalty of about 15 percent; on the older
> i7-920 the penalty is around 10 percent.
> 
> -- 
> Allin Cottrell
> Department of Economics
> Wake Forest University
> 




reply via email to

[Prev in Thread] Current Thread [Next in Thread]