Re: [ESPResSo-devel] Cluster Hardware Requirements


From: Axel Arnold
Subject: Re: [ESPResSo-devel] Cluster Hardware Requirements
Date: Wed, 11 Jul 2012 17:26:46 +0200
User-agent: KMail/4.7.2 (Linux/3.1.10-1.9-default; KDE/4.7.2; x86_64; ; )

On Tuesday 10 July 2012 17:11:22 Mingyang Hu wrote:
> Dear all,
> 
> I hope to get some suggestion from some of you who have experience in
> running Espresso on different types of hardwares.
> 
> The new AMD Opterons (12 or 16 core/chip) seem to have a very good balance
> between number of cores and the amount of cache. Has anyone who's using
> this CPU ever met some problem when running Espresso or other MD programs?
> How's the scalability of Espresso running on ~50ish cores?

Hi!

If the interconnect is reasonably good, Espresso shows quite good weak scaling at
about 2000 particles per core. On a BlueGene/P, for example, with a very good
interconnect but slow processors, simulations could go down to 500 particles per
core. On a Cray XE6, someone at our institute was running simple polymer melts on
128 processors with 1000 particles per core.
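
Just to put that rule of thumb in concrete terms, here is a back-of-envelope
sketch (the system size and the 2000 particles/core are only the guideline
above, not a benchmark):

    /* Back-of-envelope core count from the weak-scaling rule of thumb
       above; purely illustrative, not a measurement. */
    #include <stdio.h>

    int main(void) {
        long n_particles = 100000;   /* example system size             */
        long per_core    = 2000;     /* rule of thumb on a cluster with
                                        a decent interconnect           */
        long n_cores = (n_particles + per_core - 1) / per_core;
        printf("%ld particles -> about %ld cores\n", n_particles, n_cores);
        return 0;
    }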

> Also, I have a more general question regarding the cache as a guidance for
> future simulations. As far as I understand, one advantage of doing MD
> simulation is that during calculation, the amount of information stored on
> a local processor is moderate so that one can possibly fit the data into
> the cache of the processors. So I wonder how much cache do we normally need
> for typical situations where we simulate 1-100 thousands of particles (e.g.
> to hold the r,v,f, lists and so on)? How many particles can 1MB/core L2+L3
> cache support?

It depends a bit on how many features are switched on; rotation, for example, costs
32 bytes just for the quaternions. With just a few standard features, a particle
takes around 140 bytes (p+v+f = 9*8 = 72 bytes, plus old position, type, id,
charge, ...). That means that 1 MB of cache is good for about 5000 particles, since
the cell structures and other infrastructure also need to be cached. On the other
hand, the loops are organized such that a particle typically needs to be loaded at
most once per time step, so there is no dramatic performance drop when you have
more particles than fit into the cache.
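
To spell out the arithmetic, here is a small sketch (the 140 bytes and the 1 MB
cache are just the estimates from above, not sizeof() of the actual particle
struct):

    /* Rough cache-occupancy estimate; the per-particle size is the
       estimate from the text, not the real ESPResSo particle struct. */
    #include <stdio.h>

    int main(void) {
        size_t per_particle = 140;    /* p+v+f = 9 doubles = 72 bytes,
                                         plus old position, type, id,
                                         charge, ...                   */
        size_t cache = 1024 * 1024;   /* 1 MB of L2+L3 per core        */
        printf("raw upper bound: ~%zu particles\n", cache / per_particle);
        /* prints ~7000; in practice about 5000, since cell lists and
           other infrastructure also have to fit into the cache */
        return 0;
    }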

The bigger concern is actually that with too few particles on a core, the ghost
layers become quite large relative to the local domain, causing a lot of
communication overhead. That is why having too few particles per core is more
problematic than having too many.
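
To get a feeling for the numbers, here is a back-of-envelope sketch of the
surface-to-volume effect (density, cutoff and the cubic-domain assumption are
made up for illustration; the ghost layer thickness would really be the cutoff
plus the skin):

    /* Illustrative ghost-layer estimate: cubic local domain of edge L,
       ghost shell of thickness r_cut around it. Compile with -lm.
       All numbers are example values, not taken from a real run. */
    #include <math.h>
    #include <stdio.h>

    int main(void) {
        double density = 0.85;   /* particles per sigma^3 (example)   */
        double r_cut   = 1.2;    /* interaction cutoff (example)      */
        long per_core[] = {500, 2000, 10000};
        for (int i = 0; i < 3; ++i) {
            double L = cbrt(per_core[i] / density);           /* local box edge */
            double shell = pow(L + 2 * r_cut, 3) - pow(L, 3); /* ghost shell    */
            double ghosts = shell * density;
            printf("%6ld particles/core -> ~%5.0f ghosts (%3.0f%% of local)\n",
                   per_core[i], ghosts, 100.0 * ghosts / per_core[i]);
        }
        return 0;
    }

The fewer particles a core holds, the larger the ghost fraction becomes, which
is exactly the communication overhead described above.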

Axel

-- 
JP Dr. Axel Arnold
ICP, Universität Stuttgart
Pfaffenwaldring 27
70569 Stuttgart, Germany
Email: address@hidden
Tel: +49 711 685 67609


