From: Matthias Sitte
Subject: Re: [Help-gsl] Usage of GSL random in parallel code [OpenMP]
Date: Mon, 05 May 2014 11:17:49 -0500
User-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:24.0) Gecko/20100101 Thunderbird/24.4.0
Hi everybody,

a couple of notes from my experience with random numbers and parallel codes:

1) You create a lot of GSL RNGs. Why? You only need to create and initialize as many RNGs as you have client threads, and then you reuse them (without re-initialization) for all ensemble members.
With N CPUs per node and M threads per CPU (don't use hyperthreading), that totals N*M threads. Each thread then runs a loop over its own subset of DIM_ENSEMBLE/(N*M) ensemble members, and that's it.
2) In your implementation the master thread allocates all of the GSL RNGs. That's a "NUMA sin": the generators end up in memory close to the master thread, while the other threads may sit much farther away from that memory than the master thread does.
Example: the master thread (#0) runs on CPU #1 with RAM banks 1-4, but thread #1 runs on CPU #2 with RAM banks 5-8. Every access thread #1 makes to its GSL RNG then has to cross the inter-socket interconnect (QPI, HyperTransport, or whatever they call it), and that slows the whole RNG down. With MPI, thread #1 could even sit on an entirely different machine, perhaps connected by an InfiniBand link, which is slower still.
So, bottom line: each thread should allocate and initialize its GSL RNG on its own. In your code, that means the allocation must take place inside the #pragma omp parallel region, not outside it.
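A minimal sketch of what I mean (this is not your code: run_ensemble, NUM_ENSEMBLE and the fixed base seed 1234 are placeholders for illustration, and the seeding should really be done as described in point 3):

#include <gsl/gsl_rng.h>
#include <omp.h>

void run_ensemble(double *d, long NUM_ENSEMBLE)
{
  #pragma omp parallel
  {
    /* Each thread allocates and seeds its own generator, so with a
       first-touch policy its state lands on the thread's own NUMA node. */
    gsl_rng *rng = gsl_rng_alloc(gsl_rng_mt19937);
    gsl_rng_set(rng, 1234 + omp_get_thread_num()); /* placeholder seed, see point 3 */

    /* The ensemble loop is split across the threads; every member
       reuses the thread's generator without re-initialization. */
    #pragma omp for
    for (long i = 0; i < NUM_ENSEMBLE; i++)
      d[i] = gsl_rng_uniform(rng);

    gsl_rng_free(rng);
  }
}

This way you only ever hold one generator per thread in memory, no matter how big the ensemble is.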
3) You have to seed the GSL RNGs properly. There are different ways to do that; I like using microseconds obtained from a time function. Beware that if you seed using only the thread number, successive runs of the same program will always produce the same result. If you mix in a time value, there is some "random" element to the seed, so you are (much) less likely to get the same result when running the program twice.
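For example, a sketch of such a seed (gettimeofday is just one possible time function, and the mixing constant 97 is arbitrary):

#include <sys/time.h>
#include <omp.h>

/* Build a per-thread seed from the current time in microseconds plus the
   thread number, so threads started in the same microsecond still differ. */
static unsigned long make_seed(void)
{
  struct timeval tv;
  gettimeofday(&tv, NULL);
  return (unsigned long)tv.tv_sec * 1000000UL
       + (unsigned long)tv.tv_usec
       + 97UL * (unsigned long)omp_get_thread_num();
}

Each thread would then call gsl_rng_set(rng, make_seed()) right after allocating its generator.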
Furthermore, if you use MPI, you could also let the master process (rank 0) determine a "master seed" (e.g. from an environment variable), pass it on to the clients via MPI_Bcast, and have each client derive its own seed from the master seed.
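Roughly like this (a sketch only: the environment variable name MASTER_SEED and the per-rank offset are assumptions, not part of any posted code):

#include <mpi.h>
#include <stdlib.h>

/* Rank 0 reads a master seed (e.g. from the environment) and broadcasts it;
   every rank then derives its own seed from the master seed and its rank. */
static unsigned long get_seed(void)
{
  int rank;
  unsigned long master_seed = 0;

  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  if (rank == 0) {
    const char *s = getenv("MASTER_SEED");            /* assumed variable name */
    master_seed = s ? strtoul(s, NULL, 10) : 12345UL;
  }
  MPI_Bcast(&master_seed, 1, MPI_UNSIGNED_LONG, 0, MPI_COMM_WORLD);

  return master_seed + 1000UL * (unsigned long)rank;  /* simple per-rank offset */
}

Inside the OpenMP region each thread can then still add its thread number (as in point 3), so that every (rank, thread) pair gets a distinct seed.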
Hope that helps.

Matthias Sitte

On 5/4/14, 8:14 AM, Klaus Huthmacher wrote:
Dear Altro,

I would simply use a wrapper around a C++ std::vector that holds as many C++11 random number generators (engines) as you use threads in your code. When a random number is requested, the engine at the i-th index of the vector is used, where i is the thread id. Please see the appended minimal example mb.cpp with the wrapper rng-omp.{h,cpp}.

Best wishes and keep us informed,
-- Klaus.

Dear All,

I wish to use the GSL random number generator effectively together with OpenMP on an HPC cluster. The simulations I have to do are quite simple: just let an "ensemble" of systems evolve under the same dynamics. Parallelizing such a code with OpenMP "seems" straightforward. As the GSL does not support parallel processing, I suppose that we must use a different random number generator for each element of the ensemble; indeed, I don't know whether this is a good procedure. The ensemble could be very big, and it could simply be impossible to allocate memory for all the generators associated with the systems of the ensemble.

So my question is: is this a correct way to use the GSL random generator with OpenMP? An idea of how the allocation could be done is given in the code below. Does it make any sense? If not, what is the correct way?

Best wishes,
Al.

#include <stdlib.h>
#include <stdio.h>
#include <gsl/gsl_rng.h>
#include <gsl/gsl_randist.h>
#include <omp.h>

int main (void)
{
  long i;
  long DIM_ENSEMBLE = 10000000000L;
  double *d;
  gsl_rng **r;

  d = malloc(DIM_ENSEMBLE * sizeof(double));
  r = malloc(DIM_ENSEMBLE * sizeof(gsl_rng *));

  /* One generator per ensemble member, seeded with the member index. */
  for (i = 0; i < DIM_ENSEMBLE; i++) {
    r[i] = gsl_rng_alloc(gsl_rng_mt19937);
    // r[i] = gsl_rng_alloc(gsl_rng_taus);
    gsl_rng_set(r[i], i);
  }

  size_t n = gsl_rng_size(r[0]); /* state size in bytes (taus: 24, mt19937: about 5000) */

  #pragma omp parallel for default(none) shared(r, d, DIM_ENSEMBLE)
  for (i = 0; i < DIM_ENSEMBLE; i++) {
    d[i] = gsl_rng_uniform(r[i]);
  }

  #pragma omp parallel for default(none) shared(d, DIM_ENSEMBLE)
  for (i = 0; i < DIM_ENSEMBLE; i++) {
    printf("%ld %f\n", i, d[i]);
  }

  printf("The size in MB of the vector r is %lu MB\n", n * DIM_ENSEMBLE / 1000000);

  free(d);
  for (i = 0; i < DIM_ENSEMBLE; i++) {
    gsl_rng_free(r[i]);
  }
  free(r);

  return 0;
}