Re: Question about checkpoint

espressomd-users

From:	Jean-Noël Grad
Subject:	Re: Question about checkpoint
Date:	Thu, 09 Dec 2021 17:13:32 +0100
User-agent:	Roundcube Webmail/1.3.17

Dear Lili Zeng,

Thank you for sharing your observations. The checkpointing mechanism doesn't restore all global variables. Some of them are regenerated from the state of other global variables. For example, the cell structure global variable stores raw pointers to the particles, but these pointers are meaningless when reloading from a checkpoint, and so the cell structure global has to be regenerated by placing "new" particles in an empty simulation box based on the particle positions stored in the checkpoint file. This causes unintended side-effects when reloading a simulation from a checkpoint, because a cell resort is triggered when particles are placed in the simulation box. In particular, cached values of the integrator are invalidated. This can lead to an error in the calculated forces on the order of 1e-4. If the time step is 1e-2, the particle positions will be off by roughly 1e-8 at the next integration step and the trajectory will diverge.
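As a rough illustration of the arithmetic in the last two sentences, here is a toy estimate in plain Python. It is not ESPResSo code: it assumes a velocity-Verlet-style position update, x += v*dt + 0.5*(f/m)*dt**2, and a unit mass, and the function name position_error is made up for this sketch.

```python
# Toy estimate (not ESPResSo code): how a small force error shows up in the
# position after a single integration step.
# Assumptions: velocity-Verlet-style update and unit mass.

def position_error(force_error, time_step, mass=1.0):
    """Position offset caused by a force error over one step: 0.5*(df/m)*dt**2."""
    return 0.5 * force_error / mass * time_step**2

# Values quoted in the text: force error ~1e-4, time step 1e-2.
err = position_error(force_error=1e-4, time_step=1e-2)
print(f"position error after one step: {err:.1e}")
```

With these numbers the offset is 5e-9, i.e. of the order of 1e-8 in units of length, and it feeds into every later step.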

Please find attached a minimal working example to replicate this behavior. It simulates a single particle for 4 time steps using the Langevin thermostat, with a "perturbation" at the third time step. This perturbation can be a reload from a checkpoint (--perturbation load), a cell resort (--perturbation resort) or an invalidation of the cached integrator values (--perturbation recalc). The reference trajectory is obtained without perturbation (--perturbation continue), which also saves a checkpoint before the third time step. Here are the results for ESPResSo 4.2-dev:


$ ./pypresso checkpoint_mwe.py --perturbation continue

rng_counter = 1, p.pos=array([ 0.00084648, -0.00009886, 0.00006749]) p.f=array([ 15.00509416, 4.49980631, -23.50091465])
rng_counter = 2, p.pos=array([ 0.00319346, 0.00025227, -0.00221511]) p.f=array([ 24.24592812, 4.85133902, -17.03844439])
rng_counter = 3, p.pos=array([ 0.00796504, 0.00108853, -0.00620156]) p.f=array([ 14.41245656, 9.68804012, 22.38072167])
rng_counter = 4, p.pos=array([ 0.01417787, 0.00289359, -0.00794993]) p.f=array([-14.30846688, -4.63255625, 24.10474287])

$ ./pypresso checkpoint_mwe.py --perturbation load
rng_counter = 1, unknown
rng_counter = 2, unknown

rng_counter = 3, p.pos=array([ 0.00795898, 0.00108732, -0.00619730]) p.f=array([ 14.41306271, 9.68816141, 22.38029571])
rng_counter = 4, p.pos=array([ 0.01416581, 0.00289118, -0.00794146]) p.f=array([-14.30786679, -4.63243618, 24.10432117])

$ ./pypresso checkpoint_mwe.py --perturbation resort

rng_counter = 1, p.pos=array([ 0.00084648, -0.00009886, 0.00006749]) p.f=array([ 15.00509416, 4.49980631, -23.50091465])
rng_counter = 2, p.pos=array([ 0.00319346, 0.00025227, -0.00221511]) p.f=array([ 24.24592812, 4.85133902, -17.03844439])
rng_counter = 3, p.pos=array([ 0.00795898, 0.00108732, -0.00619730]) p.f=array([ 14.41306271, 9.68816141, 22.38029571])
rng_counter = 4, p.pos=array([ 0.01416581, 0.00289118, -0.00794146]) p.f=array([-14.30786679, -4.63243618, 24.10432117])

$ ./pypresso checkpoint_mwe.py --perturbation recalc

rng_counter = 1, p.pos=array([ 0.00084648, -0.00009886, 0.00006749]) p.f=array([ 15.00509416, 4.49980631, -23.50091465])
rng_counter = 2, p.pos=array([ 0.00319346, 0.00025227, -0.00221511]) p.f=array([ 24.24592812, 4.85133902, -17.03844439])
rng_counter = 3, p.pos=array([ 0.00795898, 0.00108732, -0.00619730]) p.f=array([ 14.41306271, 9.68816141, 22.38029571])
rng_counter = 4, p.pos=array([ 0.01416220, 0.00288876, -0.00794705]) p.f=array([-14.30750646, -4.63219397, 24.10488068])

According to these results, the trajectory after a reload is similar to the trajectory one would obtain by forcing a particle resort, and diverges from the reference by roughly 1e-5 in units of length at the third time step. This might not be the only contributing factor for the drift you observed, but characterizing which other components of ESPResSo are affected by the reload is not a trivial task, since many features of ESPResSo behave differently depending on which other features are currently active.
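To make the comparison concrete, the following plain-Python sketch takes the positions printed above for steps 3 and 4 and computes the largest per-component deviation of each perturbed run from the reference run. The numbers are copied from the output; the dictionary layout and the helper max_abs_diff are only illustrative and are not part of the attached script.

```python
# Positions copied from the printed output (rng_counter = 3 and 4 only).
pos = {
    "continue": [(0.00796504, 0.00108853, -0.00620156),
                 (0.01417787, 0.00289359, -0.00794993)],
    "load":     [(0.00795898, 0.00108732, -0.00619730),
                 (0.01416581, 0.00289118, -0.00794146)],
    "resort":   [(0.00795898, 0.00108732, -0.00619730),
                 (0.01416581, 0.00289118, -0.00794146)],
    "recalc":   [(0.00795898, 0.00108732, -0.00619730),
                 (0.01416220, 0.00288876, -0.00794705)],
}

def max_abs_diff(a, b):
    """Largest absolute per-component difference between two positions."""
    return max(abs(x - y) for x, y in zip(a, b))

# Deviation of each perturbed run from the unperturbed reference.
deviation = {
    mode: [max_abs_diff(p, q) for p, q in zip(pos["continue"], pos[mode])]
    for mode in ("load", "resort", "recalc")
}
for mode, diffs in deviation.items():
    print(mode, [f"{d:.1e}" for d in diffs])
```

The load and resort runs print identical positions, and all three perturbed runs deviate from the reference by about 6e-6 at step 3 and by 1e-5 to 2e-5 at step 4.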

Unfortunately, the checkpointing feature of ESPResSo is still at an experimental stage and is not well maintained. We have checkpointing tests to guarantee that the state of particles, thermostats, integrators and numerical solvers is correctly reloaded. However, the cell system isn't properly reloaded at the moment, and therefore isn't tested. I would not recommend enabling checkpointing if you need reproducible trajectories.


Best,
JN

On 2021-12-07 20:21, Lili Zeng wrote:

Hi,

 I'm a PhD student at McGill University in Montreal, Canada, studying
polymer physics, and I'm a user of ESPResSo (version 4.1). I have a
question about checkpoints. I noticed that when I run a simulation
(say, with 2000 timesteps) in one shot, the final positions of my
particles are different from when I run the first half of an identical
simulation (so 1000 timesteps), then save under checkpoint, restore
the simulation using checkpoint, and run to completion (another 1000
timesteps). This is true even for the simplest systems, e.g. one single
particle suspended in space with no interactions. In this case, the
difference in the final particle position is very small (on order of
10^(-6) for a particle of size 1 in a system box of size 50), but it
still exists. I wanted to ask whether this is normal, and whether
there is any way to get identical results using checkpoint compared to
when not using checkpoint.

 Thank you!

 Lili

checkpoint_mwe.py
Description: Text Data
