swarm-support
Re: Seg fault after 2 days of simulation


From: donalson
Subject: Re: Seg fault after 2 days of simulation
Date: Mon, 20 Sep 1999 14:24:07 -0700

Jan,

   Back when I was developing my HADES model (see Ecology, Dec. 1999) I had a
core dump that only showed up very far into a long run.  This was pre-Swarm
and was pure "C" code.  (Well, the code wasn't very pure, but you get the
idea.)  I had a problem that only occurred once memory started to recycle.
Without going into horrible detail, I had a structure that served for both
predator and prey, depending on how a flag was set.  Once free memory was
effectively used up, an agent could die and then be immediately resurrected
as the other type.  Because I used the same structure for both predator and
prey, all variables in the structure would line up.  If I didn't zero out all
the pointer references within the structure before I freed it, the
resurrected agent could come back with an intact pointer reference to another
active agent, where it wasn't supposed to have one.  (This is called a
dangling pointer.)  In my case, the dump occurred about 5 hours into the run,
but the dangling pointer that started the chain of events would sometimes
occur an hour or more before.  I don't have a feeling for how memory is
recycled in Swarm, so I don't know if this can happen to you.
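
For illustration only, here is a minimal sketch of the habit described above:
clear every pointer in a record before it goes back for reuse.  This is not
the HADES code.  The names Agent, target, agent_release and agent_acquire are
hypothetical, and an explicit free list stands in for malloc/free recycling so
that the reuse is deterministic and easy to see.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical record shared by both species, selected by a flag. */
typedef enum { PREY, PREDATOR } AgentKind;

typedef struct Agent {
    AgentKind kind;           /* the flag: predator or prey */
    struct Agent *target;     /* reference to another live agent, if any */
    struct Agent *next_free;  /* link used only while on the free list */
} Agent;

static Agent *free_list = NULL;

/* Return a dead agent to the pool, clearing its references first so a
 * later reuse cannot inherit a pointer to some other active agent. */
void agent_release(Agent *a)
{
    a->target = NULL;
    a->next_free = free_list;
    free_list = a;
}

/* Reuse a pooled record if one exists, otherwise allocate a new one. */
Agent *agent_acquire(AgentKind kind)
{
    Agent *a = free_list;
    if (a != NULL) {
        free_list = a->next_free;   /* recycled record */
    } else {
        a = malloc(sizeof *a);
        if (a == NULL)
            return NULL;
        a->target = NULL;           /* fresh memory is not zeroed by malloc */
    }
    a->kind = kind;
    a->next_free = NULL;
    assert(a->target == NULL);      /* a reborn agent starts with no references */
    return a;
}
```

Setting the pointers again in agent_acquire, rather than relying only on the
release path, would be the more defensive choice, since it also covers records
that were freed somewhere else without being cleared.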

You have my most sincere condolences.  I spent 3 weeks tracing back from the
core dump to the original dangling pointer while figuring out the logic
above.  I have gotten similar dumps when one of my programs has corrupted
memory.  There must be some type of lookup table residing in memory that, if
you accidentally overwrite it, leaves the system unable to reference standard
calls.
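
One way to catch an accidental overwrite closer to where it happens is to put
known marker values on either side of each allocation and check them
periodically.  The following is only a sketch under assumptions: guarded_alloc,
guarded_check, guarded_free and the marker value are all hypothetical names
made up here, not part of Swarm or libc, and debugging allocators do this far
more thoroughly.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define GUARD_WORD 0xDEADBEEFu  /* arbitrary marker value (assumption) */

/* Bookkeeping placed in front of the caller's memory. */
typedef struct {
    size_t size;     /* number of bytes the caller asked for */
    uint32_t front;  /* marker just before the caller's memory */
} GuardHeader;

/* Allocate size bytes with a marker before and after them. */
void *guarded_alloc(size_t size)
{
    unsigned char *raw = malloc(sizeof(GuardHeader) + size + sizeof(uint32_t));
    if (raw == NULL)
        return NULL;
    GuardHeader *h = (GuardHeader *)raw;
    h->size = size;
    h->front = GUARD_WORD;
    uint32_t back = GUARD_WORD;
    /* memcpy because the trailing marker may not be aligned */
    memcpy(raw + sizeof(GuardHeader) + size, &back, sizeof back);
    return raw + sizeof(GuardHeader);
}

/* Return 1 if both markers are intact, 0 if either was overwritten. */
int guarded_check(const void *p)
{
    assert(p != NULL);
    const unsigned char *user = p;
    const GuardHeader *h = (const GuardHeader *)(user - sizeof(GuardHeader));
    uint32_t back;
    memcpy(&back, user + h->size, sizeof back);
    return h->front == GUARD_WORD && back == GUARD_WORD;
}

/* Release a block obtained from guarded_alloc. */
void guarded_free(void *p)
{
    if (p != NULL)
        free((unsigned char *)p - sizeof(GuardHeader));
}
```

Calling guarded_check on the live blocks once per time step would narrow an
overwrite down to the step in which it happened.  Note that the returned
pointer is only aligned to a multiple of sizeof(GuardHeader), which may not be
enough for every type on every platform.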

Good Luck,

D3

Jan Kreft wrote:

> Dear all,
>
> I've been making many long simulations during the last weeks and some
> of them crashed with a seg fault after a long time. Since I have to
> wait two days to reach the same time again in the sim, I'd like to
> know what I should do to track this error down.
>
> I'm really clueless as to what causes this crash, the same messages
> have been called in the program thousands of times before. Why does it
> go wrong after 56884 steps when it worked all the time before?
>
> Here is the last line of sim output with the output from gdb:
>
> diffusion at 56884 made 13 diffSteps totalling 5368215 steps
>
> Program received signal SIGSEGV, Segmentation fault.
> 0x404134d8 in chunk_free (ar_ptr=0x4046d300, p=0x9c7bf90) at malloc.c:2942
> malloc.c:2942: No such file or directory.
> Current language:  auto; currently c
> (gdb) bt
> #0  0x404134d8 in chunk_free (ar_ptr=0x4046d300, p=0x9c7bf90) at malloc.c:2942
> #1  0x40412c3e in malloc_extend_top (ar_ptr=0x4046d300, nb=72) at malloc.c:2464
> #2  0x40413253 in chunk_alloc (ar_ptr=0x4046d300, nb=72) at malloc.c:2793
> #3  0x40412d15 in __libc_malloc (bytes=64) at malloc.c:2561
> #4  0x4019f672 in xmalloc (size=64) at xmalloc.c:13
> #5  0x4018b409 in -[Zone(c) _allocIVars:] (self=0x81dc470, _cmd=0x40197c10, aClass=0x817d800) at Zone.m:28
> #6  0x4017ddaa in +[CreateDrop(s) _createBegin:] (self=0x817d800, _cmd=0x817d840, aZone=0x81dc470) at Create.m:32
> #7  0x80523aa in +[Mcr create] (self=0x817d800, _cmd=0x817d898) at Mcr.m:16
> #8  0x8052d5a in -[Mcr intersectMcr:] (self=0x8282d98, _cmd=0x817d398, m=0x90c71c0) at Mcr.m:169
> #9  0x8050e58 in -[Quad growMcr] (self=0x90c7188, _cmd=0x817d2e8) at Quad.m:325
> #10 0x805066c in -[Quad addNode:] (self=0x8282d60, _cmd=0x818ac70, p=0x90c7188) at Quad.m:142
> #11 0x809d94f in -[Bacillus reproduce] (self=0x9af8cc0, _cmd=0x818ac20) at Bacillus.m:756
> #12 0x809d088 in -[Bacillus checkVolume] (self=0x9af8cc0, _cmd=0x818acb0) at Bacillus.m:662
> #13 0x809dad8 in -[Bacillus step] (self=0x9af8cc0, _cmd=0x817d558) at Bacillus.m:780
> #14 0x80521f5 in -[Quad substep] (self=0x96ccfb0, _cmd=0x817d568) at Quad.m:650
> #15 0x8052257 in -[Quad substep] (self=0x9984580, _cmd=0x817d568) at Quad.m:655
> #16 0x8052257 in -[Quad substep] (self=0x9396d18, _cmd=0x817d568) at Quad.m:655
> #17 0x8052257 in -[Quad substep] (self=0x94c1320, _cmd=0x817d568) at Quad.m:655
> #18 0x8052257 in -[Quad substep] (self=0x963c8d0, _cmd=0x817d568) at Quad.m:655
> #19 0x8052257 in -[Quad substep] (self=0x97797e8, _cmd=0x817d570) at Quad.m:655
> #20 0x80522b3 in -[Quad step] (self=0x8282d60, _cmd=0x817d580) at Quad.m:665
> #21 0x805231f in -[Quad stepOn:] (self=0x8282d60, _cmd=0x818beb8, roundth=1) at Quad.m:672
> #22 0x401817ac in -[Object(s) _perform:with:] (self=0x8282d60, _cmd=0x400b8070, aSel=0x818beb8, anObject1=0x1) at DefObject.m:521
> #23 0x400a86e7 in -[ActionTo(1) __performAction::] (self=0x845fde0, _cmd=0x400bbb28, anActivity=0x9bb82a0) at Action.m:277
> #24 0x400af75b in -[Activity(c) __run:] (self=0x9bb82a0, _cmd=0x400bbb10) at XActivity.m:185
> #25 0x400af65d in -[Activity(c) __run:] (self=0x84625c8, _cmd=0x400bbb10) at XActivity.m:143
> #26 0x400af65d in -[Activity(c) __run:] (self=0x8461bf0, _cmd=0x400bbb10) at XActivity.m:143
> #27 0x400af65d in -[Activity(c) __run:] (self=0x8461000, _cmd=0x400bbaf8) at XActivity.m:143
> #28 0x400af539 in -[Activity(c) _run] (self=0x8461000, _cmd=0x8183910) at XActivity.m:72
> #29 0x80792a5 in -[GeckoControlSwarm go] (self=0x8259b08, _cmd=0x817ced8) at GeckoControlSwarm.m:315
> #30 0x805013f in main (argc=4, argv=0xbffff7c4) at main.m:42
> (gdb)
>
> And here is the relevant code:
>
> #import <limits.h>
> #import <math.h>
> #import "Mcr.h"
> #import "Stat.h"
>
> // Defining the methods for an Mcr.
> @implementation Mcr
>
> //creating a null Mcr.
> +create
> {
>     id result;
>     result = [[Mcr createBegin: globalZone] createEnd];
>     [result setNullMcr];
>     return result;
> }
>
> -(void) drop
> {
>     [super drop];
> }
>
> -setNullMcr
> {
>     lx = INT_MAX;
>     ux = INT_MIN;
>     ly = INT_MAX;
>     uy = INT_MIN;
>     lz = INT_MAX;
>     uz = INT_MIN;
>     return self;
> }
>
> ...........
>
> What should I do to track this down further?
>
> Many thanks, Jan.
>
>                   ==================================
>    Swarm-Support is for discussion of the technical details of the day
>    to day usage of Swarm.  For list administration needs (esp.
>    [un]subscribing), please send a message to <address@hidden>
>    with "help" in the body of the message.

--
*********************************************************************
* Doug Donalson                 Office: (805) 893-2962
* Ecology, Evolution,           Home:   (805) 961-4447
* and Marine Biology            email address@hidden
* UC Santa Barbara
* Santa Barbara Ca. 93106
*********************************************************************
*
*   The most exciting phrase to hear in science, the one that
*   heralds new discoveries, is not "EUREKA" (I have found it) but
*   "That's funny ...?"
*
*       Isaac Asimov
*
*********************************************************************


