Re: [ESPResSo-users] problems on runing parallel Espresso.

espressomd-users

From:	Axel Arnold
Subject:	Re: [ESPResSo-users] problems on runing parallel Espresso.
Date:	Wed, 15 Dec 2010 11:34:54 +0100
User-agent:	KMail/1.10.3 (Linux/2.6.27.54-0.1-default; KDE/4.1.3; x86_64; ; )

On Monday 13 December 2010 11:39:43 Yanping Fan, Liza (Dr) wrote:

> background_errors 0 {079 bond broken between particles 294, 295 and 296
> (particles not stored on the same node)} 6 {079 bond broken between
> particles 225, 226 and 227 (particles not stored on the same node)} 7 {079
> bond broken between particles 331, 332 and 333 (particles not stored on the
> same node)}
>
> The bond broke because the particles are stored on different nodes. My
> simulation box is 400A*400A*400A, and all my equilibrium bond lengths are
> between 20 and 40A. It has been suggested that I increase the parameter
> "skin" for my Verlet lists (originally I set it to 0.5). I was told that with
> a skin of 0.5 there is quite a high chance that two bonded particles end up
> on two different processors.

The skin won't help, since it is only a performance-tuning parameter for the
Verlet lists. As a side effect, a larger skin also makes the broken-bond error
less likely, but a bond only really breaks if the particles are far outside
their equilibrium distance. If the bonded particles are not on the same node,
they are more than 200A apart, assuming that you have 8 processors in use
(presumably a 2x2x2 processor grid on your 400A box, so each node holds a
domain 200A wide). On a single node this won't make the simulation fail, since
all particles are on the same processor, but physically it is still wrong.
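To make the 200A figure concrete, here is a small Python sketch of the arithmetic, assuming the processes are arranged in the most cubic processor grid that divides the process count (2x2x2 for 8 processes). `cubic_node_grid` and `local_box` are hypothetical helpers written for this illustration, not ESPResSo functions, and the node grid ESPResSo actually picks may differ.

```python
def cubic_node_grid(n_procs):
    """Hypothetical helper: factor n_procs into (nx, ny, nz), choosing the
    grid with the smallest largest factor, then the smallest spread."""
    best_key, best_grid = None, None
    for nx in range(1, n_procs + 1):
        if n_procs % nx:
            continue
        rest = n_procs // nx
        for ny in range(1, rest + 1):
            if rest % ny:
                continue
            nz = rest // ny
            # Sort descending so grid[0] is the largest, grid[2] the smallest.
            grid = tuple(sorted((nx, ny, nz), reverse=True))
            key = (grid[0], grid[0] - grid[2])
            if best_key is None or key < best_key:
                best_key, best_grid = key, grid
    return best_grid


def local_box(box_l, n_procs):
    """Side lengths of the domain each process owns, plus the grid used."""
    grid = cubic_node_grid(n_procs)
    return tuple(length / n for length, n in zip(box_l, grid)), grid


if __name__ == "__main__":
    for p in (1, 2, 4, 8):
        dims, grid = local_box((400.0, 400.0, 400.0), p)
        # 8 processes -> grid (2, 2, 2), domains of (200.0, 200.0, 200.0)
        print(p, grid, dims)
```

For a cubic box, which axis gets the larger factor is arbitrary; the point is only that with 8 processes each node owns a 200A slab in every direction.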

> I increased the "skin" to 30; the above error message disappeared, but
> another error occurred:
> -----------------------------------------------------------------------------
> One of the processes started by mpirun has exited with a nonzero exit
> code.  This typically indicates that the process finished in error. If your
> process did not finish in error, be sure to include a "return 0" or
> "exit(0)" in your C code before exiting the application.
>
> PID 13898 failed on node n0 (192.168.2.160) due to signal 11.
>
> With the skin set to 30, runs on 2, 4 and 8 CPUs all fail with a
> "Segmentation fault" error. Please see the error message and log file
> attached. I suspect the problem may be due to ESPResSo distributing the
> particles over different nodes/processes.

No, the problem is that with such a large skin you can only have rather small
real-space cutoffs. The electrostatics then tries to compensate for the
real-space error with too large a mesh, and doesn't get enough memory. A skin
of at most 5 is a much better idea. Also note that you may be requesting too
high a precision from P3M, in which case it also uses too much memory.
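A rough Python sketch of why a larger P3M mesh runs out of memory so quickly: a 3D mesh with M points per dimension holds M**3 values, so doubling M multiplies the memory by 8. It assumes 8-byte doubles per mesh point; the `n_arrays` factor is a placeholder, since the real P3M code keeps several meshes and FFT buffers whose exact count is not given here, and on a parallel run the mesh is split over the nodes.

```python
def mesh_bytes(mesh, bytes_per_value=8, n_arrays=1):
    """Bytes for n_arrays real-valued 3D meshes of mesh**3 points each.

    n_arrays is a placeholder assumption, not ESPResSo's actual count.
    """
    return n_arrays * bytes_per_value * mesh ** 3


if __name__ == "__main__":
    for m in (64, 128, 256, 512):
        # 64 -> 2.00 MiB, 128 -> 16.00, 256 -> 128.00, 512 -> 1024.00
        print(f"mesh {m:4d}^3: {mesh_bytes(m) / 2**20:8.2f} MiB per real array")
```

The cubic growth is the point: if a small real-space cutoff pushes the tuning from a 128^3 to a 512^3 mesh, each array grows from 16 MiB to 1 GiB.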

Many regards,
Axel

-- 
JP Dr. Axel Arnold Tel: +49 711 685 67609
ICP, Universität Stuttgart      Email: address@hidden
Pfaffenwaldring 27
70569 Stuttgart, Germany
