octave-maintainers

## Re: moving toward a 3.0 release

- From: John W. Eaton
- Subject: Re: moving toward a 3.0 release
- Date: Wed, 27 Sep 2006 17:04:25 -0400

```
On 27-Sep-2006, David Bateman wrote:

| I had hoped to get it ready for 2.9.9 but the segfault I'm having is
| proving rather persistent. If you want to try to help diagnose the
| problem, take the version of eigs attached (some small bug fixes
| relative to the last version) and try
|
| segtest(10000)
|
| function aerr = segtest(iter)
|   %% This will seg-fault octave consistently, but not matlab.
|   n=20;
|   k=4;
|   A = sparse([3:n,1:n,1:(n-2)],[1:(n-2),1:n,3:n],[ones(1,n-2),4*ones(1,n),-ones(1,n-2)]);
|   opts.disp = 0;
|   aerr = 0;
|   for i=1:iter
|     [v1,d1] = eigs(A, k, 'sr', opts);
|     d1 = diag(d1);
|     merr = 0;
|     for i=1:k
|       newerr = max(abs((A - d1(i)*speye(n))*v1(:,i)));
|       if (newerr > merr)
|         merr = newerr;
|       end
|     end
|     fprintf('Max Err: %g\n', merr);
|     if (merr > aerr)
|       aerr = merr;
|     end
|   end
| end
|
| I can get it to seg-fault about once every 20000 by enlarging some of
| the dneupd and daupd work arrays above the recommended sizes, but can't
| eliminate it. valgrind seems to indicate that it's the variables "v",
| "dr" and "di" allocated with OCTAVE_LOCAL_BUFFER that are causing the
| problems. Looking at arpack++, they add one to the recommended values
| and that seems to make the dominant error the one due to the variable v.
| BTW, FreeMat also seems to have the same issue, and I can crash it in
| much the same way.
|
| One difference I see with arpack++ relative to octave is that arpack++
| uses the new/delete c++ operators on the double, etc types, rather than
| the std::vector class as the OCTAVE_LOCAL_BUFFER code currently does.
| Though, why that should make a difference, I don't know. I'll try and
| see if it helps..

I ran the example and it also crashed for me, but I don't think that I
can effectively debug this since I know nothing about arpack, and your
function that uses it is fairly large, so it is difficult for me to
know whether the calls to the arpack routines are correct (have
correctly dimensioned arrays, etc.).

I see a crash that looks like this:

*** glibc detected *** malloc(): memory corruption: 0x00000000017bd7c0 ***

Program received signal SIGABRT, Aborted.
[Switching to Thread 46994007574704 (LWP 7885)]
0x00002abda4db907b in raise () from /lib/libc.so.6
(gdb) where
#0  0x00002abda4db907b in raise () from /lib/libc.so.6
#1  0x00002abda4dba84e in abort () from /lib/libc.so.6
#2  0x00002abda4def639 in __fsetlocking () from /lib/libc.so.6
#3  0x00002abda4df6892 in free () from /lib/libc.so.6
#4  0x00002abda4df81ad in malloc () from /lib/libc.so.6
#5  0x00002abda4b38e1d in operator new () from /usr/lib/libstdc++.so.6
#6  0x00002abda4b38f29 in operator new[] () from /usr/lib/libstdc++.so.6
#7  0x00002abda5e406cc in ArrayRep (this=0x168b400, n=6)
at /usr/include/octave-2.9.8/octave/Array.h:70
#8  0x00002abda5e40f38 in Array (this=0x7fffffd04500, n=6)
at /usr/include/octave-2.9.8/octave/Array.h:187
#9  0x00002abda5e40fad in MArray (this=0x7fffffd04500, n=6)
at /usr/include/octave-2.9.8/octave/MArray.h:50
#10 0x00002abda5e40fdd in ComplexColumnVector (this=0x7fffffd04500, n=6)
at /usr/include/octave-2.9.8/octave/CColVector.h:41
#11 0x00002abda5e38f15 in Feigs (address@hidden, nargout=2) at eigs.cc:1285

That line of eigs.cc is the constructor for eig_val, after the call to
dneupd.

F77_FUNC (dneupd, DNEUPD)
  (rvec, F77_CONST_CHAR_ARG2 ("A", 1),
   sel, dr, di, z, n, sigmar, sigmai, workev,
   F77_CONST_CHAR_ARG2 (&bmat, 1), n,
   F77_CONST_CHAR_ARG2 ((typ.c_str ()), 2),
   k, tol, presid, p, v, n, iparam,
   ipntr, workd, workl, lwork, info2
   F77_CHAR_ARG_LEN(1) F77_CHAR_ARG_LEN(1)
   F77_CHAR_ARG_LEN(2));

if (f77_exception_encountered)
  {
    error ("eigs: unrecoverable exception encountered in dneupd");
    goto eigs_err;
  }

ComplexColumnVector eig_val (k+1);

Are all the arrays (not just the work arrays) that are passed to
dneupd the correct size?  Are you sure they are not corrupted in some
way even before the call to dneupd?  It is possible that there is a
buffer overwriting problem that happens even before that call.

If I had to debug this, I think my strategy would be to eliminate
Octave from the equation and find out whether I could duplicate the
crash using a stripped down Fortran-only program.  If the crash could
be duplicated there, then the bug is either in arpack or my
understanding of how the code is supposed to be used.  If the crash
does not happen with the simpler case, then I'm not sure how I would
isolate the error given the current structure of the code.

Since it seems that calls to these functions are relatively complex,
it would be nice to have another layer around the Fortran at the
liboctave level so that if someone wanted to use this functionality in
C++ they could do it more easily.  That is secondary to finding
out the cause of the crash, but it might help to be able to call this
code directly from a C++ program without all of Octave in the way.

jwe

```
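
The message above asks whether something writes past the end of an array before or during the call to dneupd. One way to check that without valgrind is to pad each array with guard bands and inspect them after the call. What follows is only a minimal C++ sketch of that idea, not code from the thread: `GuardedBuffer`, `write_values`, and `write_stays_in_bounds` are hypothetical names that exist in neither Octave nor ARPACK, and `write_values` is a stand-in for the external Fortran routine.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of Octave, liboctave, or ARPACK):
// a buffer of doubles with sentinel-filled guard bands on both sides
// of the usable region.
class GuardedBuffer
{
public:
  explicit GuardedBuffer (std::size_t n, std::size_t guard = 16)
    : m_n (n), m_guard (guard), m_store (n + 2 * guard, sentinel ())
  {
    assert (guard > 0);
    // Zero the usable region; the guard bands keep the sentinel value.
    for (std::size_t i = 0; i < n; i++)
      m_store[guard + i] = 0.0;
  }

  // Pointer to hand to the external routine.
  double *data () { return m_store.data () + m_guard; }

  std::size_t size () const { return m_n; }

  // True if every guard element still holds the sentinel value.
  bool intact () const
  {
    for (std::size_t i = 0; i < m_guard; i++)
      {
        if (m_store[i] != sentinel ())
          return false;
        if (m_store[m_guard + m_n + i] != sentinel ())
          return false;
      }
    return true;
  }

private:
  // An arbitrary finite value that real results are unlikely to equal.
  static double sentinel () { return -1.2345678e300; }

  std::size_t m_n;
  std::size_t m_guard;
  std::vector<double> m_store;
};

// Stand-in for an external routine that writes `count` values into `w`.
void write_values (double *w, std::size_t count)
{
  for (std::size_t i = 0; i < count; i++)
    w[i] = static_cast<double> (i);
}

// Returns true if writing `count` values into a buffer sized for `n`
// leaves the guard bands untouched. `count` must not exceed n + 16 here,
// otherwise the write would run past the guard band itself.
bool write_stays_in_bounds (std::size_t n, std::size_t count)
{
  GuardedBuffer buf (n);
  write_values (buf.data (), count);
  return buf.intact ();
}
```

With this sketch, `write_stays_in_bounds (4, 4)` returns true and `write_stays_in_bounds (4, 5)` returns false, because the fifth value lands in the trailing guard band. A guard band only catches overruns that stay within its width and that change the sentinel value, so it complements a tool like valgrind rather than replacing it.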
