Future of GPU support in Espresso

From: Rudolf Weeber
Subject: Future of GPU support in Espresso
Date: Tue, 12 May 2020 10:32:56 +0200
User-agent: Mutt/1.9.4 (2018-02-28)

Dear Espresso users,

When taking stock of how we, the Espresso core team, spend our development 
time, it became clear that maintaining the GPU support in Espresso makes up a 
disproportionately large fraction.
Last month alone, the core team spent more than 50 hours dealing with new 
Linux distributions, compiler versions, and library dependencies related to GPU 
support. All of this went into build and test infrastructure; no improvement in 
functionality or performance was achieved for Espresso.
Not for the first time, we asked ourselves whether we should drop GPU support 
from Espresso and spend the development time on other aspects of Espresso 
where we see a larger benefit.
Before making a decision, we would like to discuss this with you, the users.

The affected methods for which a CPU alternative would have to be used are:
* GPU lattice Boltzmann and electrokinetics
* GPU charge P3M method
* GPU dipolar Barnes-Hut and direct summation method

To help us make an informed decision, please let us know if you are currently 
using any of these methods and roughly what kind of systems you are looking at:
* number of particles
* volume fraction
* active methods (electrostatics, magnetostatics, lattice Boltzmann, 
electrokinetics, virtual sites, ...)
* how many time steps in a simulation
* how many simulations
* what is the relative importance, to you, of the time to solution for a single 
simulation compared to the entire set of simulations in a project? (Note: 
dropping GPU support will likely increase the time to solution of a single 
simulation. On the other hand, compute time on GPUs is often not as readily 
available as compute time on pure CPU systems. It may therefore be possible to 
run more simulations in parallel if GPUs are not required.)
* what GPUs do you have access to, and how many?

We hope that gathering answers to these questions will let us figure out how to 
proceed.
Below, please find some more technical notes.

Regards, Rudolf

Details on the high maintenance effort of GPU support:
* GPUs are not readily available in public continuous integration testing 
services. Therefore, GPU testing has to be performed on infrastructure we 
operate ourselves, both in terms of hardware and software.
* Nvidia places tight restrictions on which versions of their software can be 
used with which GCC and Clang compiler versions.
* The components involved have subtle differences of opinion on what 
constitutes correct C++.
* Nvidia's compiler requirements are not necessarily met by the default 
compilers typically installed with Linux distros such as Ubuntu.
* Several of these issues have to be dealt with every time a new Ubuntu version 
is released. (We use Ubuntu for testing GPU support.)

Notes on lattice Boltzmann (LB):
* The GPU LB, along with the other GPU methods, uses single precision. This 
limits its accuracy. E.g., mass is not exactly conserved in our GPU LB, due 
to rounding issues. It is unclear how big an issue that is.
* The CPU LB implementation uses double precision.
* Switching the GPU LB to double precision would render it mostly unusable on 
cheaper (<= 500 Euro) gaming cards of the kind often used in desktops. These 
have very poor double-precision performance. Cards with good double-precision 
performance cost 5-10 times that amount.
* The time to solution for the CPU LB (double precision) is currently 2x-3x 
that of the GPU LB (single precision) for a Lennard-Jones + LB system with 10% 
volume fraction and an LB lattice constant comparable to the Lennard-Jones 
sigma. This ratio is expected to improve as we switch from our custom LB 
implementation to the one provided by the Walberla package.
* The hardware configuration of modern compute clusters with GPUs is not well 
suited for Espresso simulations with GPU LB. The ratio of CPU cores to GPUs is 
typically 10:1 to 20:1. For systems with fewer than 100k particles, Espresso 
will use neither the GPU nor the CPU cores efficiently.
* Compute capacity without GPUs is much more readily available. It is also 
cheaper, unless one can fully load the GPU, which is typically not the case for 
soft matter simulations.
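
As a generic illustration of the rounding issue mentioned above (plain NumPy, 
not Espresso code): a quantity that is conserved exactly in exact arithmetic 
slowly drifts when updated in single precision, while the double-precision 
drift stays many orders of magnitude smaller. The lattice and update rule here 
are toy assumptions, not the actual LB scheme.

```python
import numpy as np

def conservation_drift(dtype, sites=1000, steps=10000):
    """Repeatedly redistribute 'mass' between neighboring lattice sites.

    Each update moves flux[i] out of site i and into its neighbor, so the
    total is conserved exactly in exact arithmetic; any drift in the total
    is purely floating-point rounding error.
    """
    rng = np.random.default_rng(0)
    rho = np.ones(sites, dtype=dtype)
    total0 = rho.sum(dtype=np.float64)
    for _ in range(steps):
        flux = (0.01 * rng.standard_normal(sites)).astype(dtype)
        rho += flux - np.roll(flux, 1)  # what leaves one site enters a neighbor
    return float(abs(rho.sum(dtype=np.float64) - total0))

print(conservation_drift(np.float32))  # visible drift
print(conservation_drift(np.float64))  # many orders of magnitude smaller
```

Whether a drift of this size matters depends, as noted above, on the 
observables and the length of the simulation.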

Notes on electrokinetics:
* Currently, only a single-precision GPU implementation is available in 
Espresso.
* Independently of the decision on GPU support, lattice Boltzmann and 
electrokinetics will be provided by the Walberla package in the future. In a 
first step, this will be a well-optimized CPU version in double precision.

Notes on electrostatics:
The CPU-based P3M method can be used instead of the GPU-based one.

Notes on magnetostatics:
* Due to the 1/r^3 decay and the random summation order, the use of single 
precision in the GPU code is a relevant limitation to accuracy.
* The double-precision dipolar direct summation will be MPI-parallelized on the 
CPU, allowing for better time to solution for larger systems.
* For systems with more than 10k particles and open boundaries, the dipolar 
P2NFFT method from the ScaFaCoS library can be used.
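
As a generic illustration of why summation order matters in single precision 
(plain NumPy, not Espresso code): summing the same set of 1/r^3-like terms in 
a different order changes the float32 result measurably, while the float64 
result is essentially order-independent. The distance distribution is an 
assumption for the sketch.

```python
import numpy as np

def sequential_sum(values, dtype):
    """Naive left-to-right summation with an accumulator of the given width."""
    acc = dtype(0.0)
    for v in values:
        acc = dtype(acc + v)
    return float(acc)

rng = np.random.default_rng(42)
r = rng.uniform(0.5, 50.0, 20000)  # pair distances spanning two decades
terms = 1.0 / r**3                 # dipolar-like 1/r^3 contributions
ref = float(np.sort(terms).sum())  # accurate double-precision reference

# Same terms, summed smallest-first vs. largest-first, in both precisions:
asc, desc = np.sort(terms), np.sort(terms)[::-1]
spread32 = abs(sequential_sum(asc.astype(np.float32), np.float32)
               - sequential_sum(desc.astype(np.float32), np.float32)) / ref
spread64 = abs(sequential_sum(asc, np.float64)
               - sequential_sum(desc, np.float64)) / ref
print(spread32)  # order visibly changes the float32 result
print(spread64)  # float64 is essentially order-independent
```

In the GPU code the summation order additionally varies from run to run, so 
this spread shows up as run-to-run scatter rather than a fixed offset.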

Dr. Rudolf Weeber
Institute for Computational Physics
Universität Stuttgart
Allmandring 3
70569 Stuttgart
Phone: +49(0)711/685-67717
Email: address@hidden
