Re: [ESPResSo-users] different simulation scenario on desktop or cluster

From: Axel Arnold
Subject: Re: [ESPResSo-users] different simulation scenario on desktop or cluster
Date: Thu, 08 Aug 2013 17:14:11 +0200

Dear Arash,

do I understand correctly that the warmup works even on 12 cores? Since the warmup uses the same integrator and only changes the force capping, the problem most likely lies in the setup and shows up once the warmup caps are switched off. You might need to output the kinetic energy very often after switching off capping, maybe even every time step, since it usually blows up very fast.

Also, you can use the Tcl catch command to catch the error, and write out the energies and particle positions at the moment they go through the roof.
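
The two suggestions above (record the kinetic energy after every step once capping is off, and catch the integrator error so the state can be dumped) can be sketched as follows. This is a Python illustration of the pattern only, not ESPResSo's Tcl interface: `step`, `kinetic_energy`, and `dump_state` are hypothetical callables standing in for whatever the simulation script provides, and `e_max` is an assumed threshold.

```python
def run_with_diagnostics(step, kinetic_energy, dump_state, n_steps, e_max):
    """Advance one step at a time and record the kinetic energy after each step.

    Stops and calls dump_state as soon as the integrator raises an error
    or the kinetic energy exceeds e_max. All three callables are
    hypothetical stand-ins for the real simulation script.
    Returns (energy history, index of the failing step or None).
    """
    history = []
    for i in range(n_steps):
        try:
            step()  # one integration step
        except RuntimeError as err:
            dump_state(i, f"integrator error: {err}")
            return history, i
        e_kin = kinetic_energy()
        history.append(e_kin)
        if e_kin > e_max:
            dump_state(i, f"kinetic energy {e_kin} exceeds {e_max}")
            return history, i
    return history, None
```

Stepping one step at a time is slow, so it probably only makes sense for the short window right after the caps are removed.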


On 08/08/2013 04:43 PM, Arash Azari wrote:

Thank you very much indeed for the detailed and very useful explanation and
simulation on the cluster with single core runs without any error
messages. Strangely enough, it works with 3 cores as well. I am going
to increase the number of cores on single node up to 12 again.
checked the kinetic energy and it is in a proper range (both the 6
cores desktop and single core cluster outputs are in agreement).
I have used the individual force cap for warmup in this simulation
and I set the cap radius for each interaction. I have tried a very
long warmup as well; something around 1 million steps, but still I get
the same error.
I will post the latest updates as soon as I have finished the test simulations.
Thank you very much,


Arash Azari

From: Axel Arnold <address@hidden>
To: Arash Azari <address@hidden>
Cc: "address@hidden" <address@hidden>
Sent: Wednesday, August 7, 2013 11:01 PM
Subject: Re: [ESPResSo-users] different simulation scenario on desktop or 


Bonded interactions are only computed for particles that are
       located on the same CPU. If you increase the number of cores, the
       range over which a bond can be computed gets shorter. However,
       any reasonable bond is much shorter than your box dimensions, so
       bond broken errors definitely point to excessive forces. That
       it "works" on your desktop probably just means that on that
       machine the long bonds can still be accommodated, but it is very
       likely that your simulation is still unphysical. In particular, I
       doubt that the problem is due to MPI or the machine; it is rather a
       problem of your setup. An easy check would be to run with just 6
       cores or even one core on the cluster, just as on your desktop.
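
A rough way to see why more cores shrink the available range is the sketch below. It assumes a simple regular domain decomposition in which each core owns an equal sub-box; that is an assumption for illustration, since the actual node grid and ghost-layer handling are not described in this thread. The box size and node grids in the usage note are made-up values.

```python
def local_box(box, node_grid):
    """Edge lengths of the sub-box owned by one core under a regular grid split."""
    return tuple(length / n for length, n in zip(box, node_grid))


def smallest_local_edge(box, node_grid):
    """Shortest edge of the per-core sub-box, a crude upper scale for bond lengths."""
    return min(local_box(box, node_grid))
```

For a hypothetical cubic box of edge 12, a single core (`node_grid=(1, 1, 1)`) owns the whole box, while 12 cores arranged as `(3, 2, 2)` each own a sub-box of `(4.0, 6.0, 6.0)`, so an overstretched bond that still fits on one core may no longer fit on twelve.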

To check the physics of your simulation, just write out the
       energies, and check in particular that the kinetic energy
       fluctuates around 1/2 N k_BT, where N is the number of degrees of
       freedom. Also, there should be no strong drift in any of the energies.
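
The kinetic energy check can be sketched as a small helper that compares the time average of the recorded kinetic energies with 1/2 N k_BT. The function names and the 10% relative tolerance are assumptions chosen for illustration, not values from this thread.

```python
from statistics import mean


def expected_kinetic_energy(n_dof, kT):
    """Equipartition value: 1/2 k_BT per degree of freedom."""
    return 0.5 * n_dof * kT


def check_kinetic_energy(samples, n_dof, kT, rel_tol=0.1):
    """Compare the mean of the sampled kinetic energies with the equipartition value.

    rel_tol is an assumed tolerance; returns (ok, average, target).
    """
    target = expected_kinetic_energy(n_dof, kT)
    avg = mean(samples)
    return abs(avg - target) <= rel_tol * target, avg, target
```

With 1000 particles in three dimensions (N = 3000) and k_BT = 1, the target is 1500 in simulation units; averages far above that after the caps are removed would point to excessive forces.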

Under most circumstances, it is very likely that you actually need
       a warmup phase. However, when combined with walls, capping the
       wall forces is usually not a good idea, since particles are then
       not hindered from penetrating the hard core of the wall
       constraint. Therefore, you should use the individual force cap
       feature, and only set a cap radius for the particle-particle
       interactions, see the User's guide for details on "inter forcecap


On 07.08.13 12:36, Arash Azari wrote:


I have a very strange situation and I cannot find any proper solution for it; I 
highly appreciate any recommendations.
Here is the problem:
I have a simulation system (polymer and ions with repulsive wall) and when I 
run this simulation on my desktop everything is fine (run on single CPU with 6 
cores) regardless of skin parameter (0.4) and the warm up steps; it works even 
with a very short warm up.

When I try to run this simulation on a cluster, it crashes a few steps after 
the warm up, sometimes with a bond broken error and sometimes with a wall 
constraint violation error. I tried different combinations of nodes and cores, 
and it crashes even on a single node with 2 CPUs (12 cores). I have increased 
the skin parameter up to 2.0 and used a very long warm up, and I still get the 
same error messages a few steps after the warm up.
I should mention that I did not cap the interactions between the particles 
(polymer or ion) and the walls during the warm up.
I am not sure whether this is caused by the MPI settings on the cluster, 
but the cluster administrators are not at all helpful when asked about the 
system settings and configuration.
I attached the config.log file if it helps.
Thank you very much,

Best regards,
Arash Azari

JP Dr. Axel Arnold
ICP, Universität Stuttgart
Pfaffenwaldring 27
70569 Stuttgart, Germany
Email: address@hidden
Tel: +49 711 685 67609
