
Re: Mpi and cuda

From: Jean-Noël Grad
Subject: Re: Mpi and cuda
Date: Wed, 23 Sep 2020 17:45:52 +0200
User-agent: Mozilla/5.0 (X11; Linux i686; rv:68.0) Gecko/20100101 Thunderbird/68.12.0

Dear Martin,

I think this memory allocation on worker nodes is a bug. Only the head node communicates with the GPU during a simulation. I've opened a ticket in our issue tracker. You can see the discussion and progress here:
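
Until the fix lands, a possible stop-gap might be to hide the GPU from every
rank except rank 0 before espressomd is imported. This is an untested sketch:
OMPI_COMM_WORLD_RANK is the environment variable that Open MPI's mpirun sets
for each process, and whether the worker ranks tolerate a hidden GPU is
exactly what the bug report will have to clarify.

import os

# Untested stop-gap: hide the GPU from the worker ranks so they cannot
# allocate device memory. Open MPI's mpirun sets OMPI_COMM_WORLD_RANK
# for every process it launches.
if os.environ.get("OMPI_COMM_WORLD_RANK", "0") != "0":
    os.environ["CUDA_VISIBLE_DEVICES"] = ""

import espressomd  # import only after adjusting the environment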

Thanks for reporting this issue to us.

Jean-Noël Grad

On 9/23/20 3:02 PM, Martin Kaiser wrote:
Dear Rudolf,

Do I understand correctly that I should check whether it is worthwhile to use
MPI to speed up my integration? If so: from a simple measurement of using
LB_GPU and MPI at the same time, I get a speed-up of around a factor of 2 on
4 cores, so it would be worth it for me to make that work.
I will look for a rather simple solution to free the GPU memory when the job
finishes or breaks; that's for now my only concern, as the simulation results
seem fine otherwise.
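
Something along these lines is what I have in mind, as a minimal, untested
sketch (assuming mpi4py can be used alongside espressomd; the handler name is
just illustrative):

import signal

from mpi4py import MPI  # assumption: usable alongside espressomd

def abort_all_ranks(signum, frame):
    # Tear down every MPI rank; once the processes exit, the CUDA
    # driver releases the device memory they were holding.
    MPI.COMM_WORLD.Abort(1)

# Catch Ctrl-C and kill signals so a cancelled or crashed job does not
# leave stale allocations on the GPU.
for sig in (signal.SIGINT, signal.SIGTERM):
    signal.signal(sig, abort_all_ranks)
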
Thanks for the answer!

Best, Martin

On 22.09.2020, at 16:05, Rudolf Weeber <> wrote:

Hi Martin,

all the GPU stuff runs from the head node. The other nodes probably still
load the driver; that's why you see them in the nvidia-smi output.
The GPU work overlaps in time with the CPU work, but some extra communication
is needed to gather the full system on the head node and send it to the GPU.
Before using LB GPU with an MPI-parallel simulation, it might be worthwhile
to put timings around the integration:

import time

steps = 1000  # number of integration steps to time
tick = time.time()
system.integrator.run(steps)  # your existing integration call
tock = time.time()
print("Time per step (s):", (tock - tick) / steps)

Regards, Rudolf

On Tue, Sep 22, 2020 at 03:31:33PM +0200, Martin Kaiser wrote:
Hello everybody,

I have a technical question about using the Open MPI and CUDA implementations
at the same time.
If I start my GPU-accelerated ESPResSo script under MPI, with a standard
command like this:

mpirun -n 4 espresso;

then 4 instances of the same job are started on my GPU, of which only one is
actually doing any work on the GPU. If I monitor the usage with "nvidia-smi",
I get something like this:

GPU   GI   CI        PID   Type   Process name                  GPU Memory
1   N/A  N/A     26365      C   /usr/bin/python3                  207MiB
1   N/A  N/A     26366      C   /usr/bin/python3                  129MiB
1   N/A  N/A     26367      C   /usr/bin/python3                  129MiB
1   N/A  N/A     26368      C   /usr/bin/python3                  129MiB

Additionally, if I kill this job, not all of the instances on the GPU are
aborted, meaning the memory on the card is not freed.
Is there something I am doing wrong in how I compile or call ESPResSo? Or is
it that the MPI implementation is not "CUDA-aware" and is instantiating
copies of the same job on the GPU?

Thanks for the help,

Dr. Rudolf Weeber
Institute for Computational Physics
Universität Stuttgart
Allmandring 3
70569 Stuttgart
Phone: +49(0)711/685-67717
