Re: Debugging hints wanted
Tue, 01 Jul 2008 12:14:11 +0200
Hi Ludovic, thanks for your reply.
On Mon, 2008-06-30 at 21:42 +0200, Ludovic Courtès wrote:
> Roland Orre <address@hidden> writes:
> > I need hints on how to find occasional segmentation faults
> > and missed GC references. This relates to 64 bit machines.
> Is it x86-64, IA64, or something else?
Right now I'm trying to get it working on x86-64 (Opteron), so that
I can then run it on a large-memory IA64 (Itanium2) machine.
> The Git repository (the future 1.8.6) contains an important bug fix for
> IA64. I think there were x86-64-related fixes during the 1.8.x series, too.
> Thus, I'd suggest using the latest Guile on these platforms.
That's a good hint. I'll check out the code and see if I can locate
the changes. The problem is that I've considered switching for a few
years, but since the array API changed in 1.8 it would imply a major
rework, possibly causing other issues, as the old array API is used in
hundreds of places in my code, and there may be other API changes.
> > My modules have worked perfectly fine on 32 bit machines but
> > on 64 bits I occasionally get something like
> > #<freed cell 0x2...; GC missed a reference> if I run that
> > code fast, which indicates a threading problem (I do not use
> > threads in this case, but seems like guile does). This does
> > not occur if I run guile through gdb. This happens not too often
> > but it seems to be related to string->symbol symbol->string.
> Is it reproducible?
This is not really reproducible. If I execute the lines quickly by
loading them as a file, it occurs with about 60% probability.
If I execute the lines in that file one by one, it does not
occur. As for working around it, I can see that the complaint may
come at, e.g., a string->symbol conversion. If I simply replace
that with the identity function, i.e. (lambda (x) x), then it doesn't
happen, but this probably relates to the bigger issue below.
> > My bigger problem though is frequently occurring
> > segmentation faults or otherwise corrupt pointers.
> > If I then run the code in gdb I can get
> > Program received signal SIGSEGV, Segmentation fault.
> > [Switching to Thread 0x2ae316e4f070 (LWP 6699)]
> > 0x00002ae314b9d091 in scm_gc_mark_dependencies (p=0x97c) at
> > gc-mark.c:441
> > 441 if (SCM_GC_MARK_P (ptr))
> > Current language: auto; currently c
> Likewise, is it reproducible? Can you show the full backtrace (it
> should show where 0x97c comes from)?
When it happens, it is fully reproducible as shown. Most often
I get a segmentation fault like this one. I have attached a full
gdb backtrace from it. It can be reproduced over and over,
with only the base addresses differing.
Sometimes I instead get a pointer to some internal structure, for
instance a pointer to the procedure of a loop in the middle of a list
of numbers, which is kind of illogical, as that internal
structure should not be freed.
> Hope this helps,