Re: [ESPResSo] Self-avoiding random walk SAW

From: Mikheil Azatov
Subject: Re: [ESPResSo] Self-avoiding random walk SAW
Date: Fri, 13 Aug 2010 16:07:04 -0400


I think I figured out why I got a segmentation fault when running in parallel. It seems that "part->bl.e" does not exist if the particle is a ghost particle, so I got a segmentation fault when I tried to access part->bl.e[0], ... I added a check for whether the particle is a ghost before that part, and it seems to work fine now. The program runs without errors, but there is still a problem: it stops in the middle of the simulation, as if there were an infinite loop somewhere. At some integration step it simply stops advancing, although both CPUs appear to be running at 100%. Has anyone had this type of problem, or is it just something in my code?
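A minimal sketch of that kind of guard, for anyone hitting the same crash. The struct layouts and the is_ghost flag below are simplified stand-ins made up for illustration, not ESPResSo's actual definitions; how a ghost is really identified depends on the ESPResSo version and build options.

```c
#include <stddef.h>

/* Simplified stand-ins for illustration only (assumed layout). */
typedef struct {
    int *e;   /* bond entries; may be unallocated on ghost copies */
    int  n;   /* number of entries the list claims to hold */
} IntList;

typedef struct {
    int     identity;
    int     is_ghost;  /* hypothetical flag marking a ghost copy */
    IntList bl;        /* bond list */
} Particle;

/* Returns 1 if p->bl.e[0 .. p->bl.n-1] may be read, 0 otherwise. */
int bond_list_readable(const Particle *p)
{
    if (p == NULL || p->is_ghost)
        return 0;                         /* ghosts carry no usable bond list */
    return p->bl.e != NULL && p->bl.n > 0;
}
```

Calling bond_list_readable(part) before touching part->bl.e[0] skips ghosts and also catches the case from the earlier mail, where bl.n was 3 but the entries themselves could not be read.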

Thanks, Mike

On Thu, Aug 12, 2010 at 4:32 PM, Mikheil Azatov <address@hidden> wrote:
Hi, thanks again,

One error down! It was actually the order of the variables in the array that mattered, and that problem is gone now. Instead I get a segmentation fault, which is less fun to track down. Here is what I found after tracking where it happens: I have a part of the code where I check whether the bond between two particles already exists, and it looks like the error happens there. Below is that part of the code with a lot of added fprintf's. As a result I get the following:

j=0  inside while ,  p2 id = 901
p2->bl.n = 3
[bender:17893] *** Process received signal ***
[bender:17893] Signal: Segmentation fault (11)

So basically it says that p2->bl.n = 3, but when I try to access p2->bl.e[0] I get a segmentation fault ...

Part of that code:

    int exist = 0;
    while (j < p2->bl.n) {
        fprintf(stderr, "j=%d  inside while ,  p2 id = %d\n", j, p2->p.identity);
        fprintf(stderr, "p2->bl.n = %d\n", p2->bl.n);
        fprintf(stderr, "p2->bl.e[0]=%d   ", p2->bl.e[0]);
        fprintf(stderr, "p2->bl.e[1]=%d   ", p2->bl.e[1]);
        int type_num = p2->bl.e[j++];
        fprintf(stderr, " type_num=%d  ", type_num);
        int type = bonded_ia_params[type_num].type; // array with the bond partners and their count
        fprintf(stderr, "  type=%d  ", type);
        int n_partners = bonded_ia_params[type_num].num;
        fprintf(stderr, "n_partners=%d  ", n_partners);
        j += n_partners;
        if (type_num == bond_id_angle) { exist = 1; } // checking if the bond already exists
        fprintf(stderr, "exist= %d\n", exist);
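
The same walk over the bond list can be sketched as a standalone function with the bounds checks made explicit. This is only an illustration: IntList and BondedParams are simplified stand-ins, and bond_type_in_list, n_params and wanted are hypothetical names. The list layout it assumes is the one the loop above relies on, namely a bond type id followed by that type's partner ids.

```c
#include <stddef.h>

/* Simplified stand-ins for illustration only (assumed layout). */
typedef struct { int *e; int n; } IntList;
typedef struct { int type; int num; } BondedParams; /* num = partners per bond */

/* Returns 1 if a bond of type `wanted` is stored in the list,
   0 if it is not, and -1 if the list is missing or malformed. */
int bond_type_in_list(const IntList *bl, const BondedParams *params,
                      int n_params, int wanted)
{
    int j = 0;

    if (bl == NULL || (bl->n > 0 && bl->e == NULL))
        return -1;                       /* n claims entries, but none exist */

    while (j < bl->n) {
        int type_num = bl->e[j++];
        if (type_num < 0 || type_num >= n_params)
            return -1;                   /* would index past the params array */
        if (type_num == wanted)
            return 1;
        j += params[type_num].num;       /* skip this bond's partner ids */
        if (j > bl->n)
            return -1;                   /* entry is truncated */
    }
    return 0;
}
```

Returning -1 instead of dereferencing turns the "bl.n = 3 but bl.e cannot be read" situation into a value that can be printed and traced rather than a crash.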

Best wishes,

On Thu, Aug 12, 2010 at 3:39 AM, Axel Arnold <address@hidden> wrote:
On Wednesday 11 August 2010 23:18:01 you wrote:
> Hi,
> I haven't used the debugger yet. I tried printing out stuff in a lot of
> different places in the code to see where the error occurs and why. And I
> think I found something. Two out of my global variables that are defined in
> Espresso file using setmd are equal to 0. It's even more weird. Before the
> error occurs sometimes they are 0 and sometimes they are not. And when this
> switch from being equal to 0 to not being equal to 0 occurs the output
> looks very weird (i.e. it's very messed up, some words or some letters of
> what I'm printing would be missing...). I define all of them exactly the
> same way.

Do you use mpi_bcast_parameter(FIELD_***) in all the variable callbacks, with
the correct variable name? Are the codes unique, and do they correspond to the
order of the array in global.c? If not, the values are only set on the master
node, and the other nodes will get random values for the variables.
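
As a rough illustration of the ordering requirement, here is a small self-contained check. Everything in it is hypothetical: the FIELD_SAW_* codes, the Datafield layout with an explicit code member, and first_misordered_field are made up for this sketch, and the real table in global.c is laid out differently. The point is only that each code must equal the position of its entry in the array.

```c
#include <stddef.h>

/* Hypothetical field codes; each must equal its entry's index in fields[]. */
enum { FIELD_SAW_A = 0, FIELD_SAW_B = 1, N_FIELDS = 2 };

/* Simplified stand-in for a table entry (assumed layout). */
typedef struct {
    int         code;  /* FIELD_* constant this entry is meant for */
    const char *name;  /* name used from the script level */
} Datafield;

static const Datafield fields[] = {
    { FIELD_SAW_A, "saw_a" },
    { FIELD_SAW_B, "saw_b" },
};

/* Returns the index of the first entry whose code differs from its
   position in the table, or -1 if codes and order agree. */
int first_misordered_field(const Datafield *table, int n)
{
    for (int i = 0; i < n; i++) {
        if (table[i].code != i)
            return i;
    }
    return -1;
}
```

A check like first_misordered_field(fields, N_FIELDS) at startup would flag a swapped or duplicated code before any value is broadcast under the wrong index.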


JP Dr. Axel Arnold Tel: +49 711 685 67609
ICP, Universität Stuttgart      Email: address@hidden
Pfaffenwaldring 27
70569 Stuttgart, Germany
