Re: segfaults building documentation when machine under load

octave-maintainers

From:	Daniel J Sebald
Subject:	Re: segfaults building documentation when machine under load
Date:	Sat, 23 May 2020 00:23:59 -0400
User-agent:	Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.9.0

On 5/22/20 4:52 PM, John W. Eaton wrote:

On 5/19/20 4:11 PM, Dmitri A. Sergatskov wrote:
On Tue, May 19, 2020 at 4:02 PM John W. Eaton <address@hidden<mailto:address@hidden>> wrote:
    On 5/19/20 3:26 PM, Dmitri A. Sergatskov wrote:

     >     Should we switch to bug-tracker?
     >     I was able to get a crash when I bumped the jobs to 200.
     >     bt is attached. The relevant part seems to be:

    If I use a large number of jobs, I see

        error: imwrite: invalid empty image
        error: called from
            __imwrite__ at line 40 column 5
            imwrite at line 125 column 5
            print at line 755 column 13
            interpimages at line 72 column 5

    but no segfaults.

    It does look like a threading issue.


I used a simplified test by Andreas:
parallel -N0 -q octave --norc --silent --no-history --eval 'figure(1,"visible", "off");' ::: {1..200}
Thanks.
After much confusion, I think I arrived at a solution. I pushed the following changeset to stable and merged with default:
   http://hg.savannah.gnu.org/hgweb/octave/rev/00a9a49c7670

These most recent changes appear to improve the situation for the test case shown above. I'm no longer able to cause a segfault with the following parallel execution:
parallel -j 50 -N0 -q octave --norc --silent --no-history --eval 'figure (1, "visible", "off");' ::: {1..1000}
Here's the summary from the changeset commit message:

----
This change is a further attempt to avoid segfaults when shutting down the interpreter and exiting the GUI event loop. The latest approach is to have the interpreter signal that it is finished with "normal" command execution (REPL, command line script, or --eval option code), then let the GUI thread process any remaining functions in its event loop(s) then signal back to the interpreter that it is OK to shutdown. Once the shutdown has happened (which may involve further calls to the GUI thread while executing atexit functions or finish.m or other shutdown code), the interpreter signals back to the GUI that shutdown is complete. At that point, the GUI can delete the interpreter object and exit.
----
Before this change, the GUI could still be processing events (displaying the figure window, for example) while the interpreter was being deleted. Obviously, that causes trouble.
Although we recognized this problem before, none of the previous solutions have really worked. See the commit message for https://hg.savannah.gnu.org/hgweb/octave/rev/cdb681adc85a, for example, where I noted that
... the crash described in bug report #56952 appeared to be happening when the Qt event loop was calling QtHandles::qt_graphics_toolkit::create_object when the interpreter was being deleted and the gh_manager object was already invalid, ...
I noticed this again and finally realized that we could probably use the Qt event queue to ensure that pending graphics events are allowed to finish before shutting down the interpreter. It seems to work for all the tests I've tried so far, including creating a figure in the finish.m script or using "atexit ('sombrero')".
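
For illustration only, the three-step handshake described in the commit message can be sketched in Python. This is not code from the changeset and it does not use Qt: the queue stands in for the GUI event queue, and every name here (run_handshake, FINISHED, SHUTDOWN_COMPLETE, ok_to_shutdown) is hypothetical. The point it tries to show is that, because the queue is first-in first-out, the GUI only sees the "finished" message after every earlier graphics event has already run.

```python
import queue
import threading

# Hypothetical sentinels, not Octave identifiers.
FINISHED = object()           # interpreter is done with "normal" execution
SHUTDOWN_COMPLETE = object()  # interpreter has finished its shutdown code


def run_handshake():
    """Run one interpreter/GUI shutdown handshake and return the event log."""
    log = []
    gui_events = queue.Queue()          # stand-in for the GUI event queue
    ok_to_shutdown = threading.Event()  # set by the GUI thread

    def interpreter():
        # "Normal" execution posts a graphics event to the GUI thread.
        gui_events.put(lambda: log.append("gui: create figure"))
        # Step 1: announce that normal execution is finished.
        gui_events.put(FINISHED)
        # Step 2: wait until the GUI says shutdown may proceed.
        ok_to_shutdown.wait()
        log.append("interpreter: running shutdown code")
        # Shutdown code (atexit, finish.m) may still call into the GUI.
        gui_events.put(lambda: log.append("gui: figure from atexit"))
        # Step 3: report that shutdown is complete.
        gui_events.put(SHUTDOWN_COMPLETE)

    def gui():
        while True:
            event = gui_events.get()
            if event is FINISHED:
                # FIFO order: everything queued earlier has already run.
                log.append("gui: pending events done, ok to shut down")
                ok_to_shutdown.set()
            elif event is SHUTDOWN_COMPLETE:
                # Only now is it safe to delete the interpreter and exit.
                log.append("gui: deleting interpreter and exiting")
                return
            else:
                event()

    threads = [threading.Thread(target=gui),
               threading.Thread(target=interpreter)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log


if __name__ == "__main__":
    for line in run_handshake():
        print(line)
```

In this sketch the GUI never deletes the interpreter while a graphics event is still pending, which is the failure described above; the real implementation presumably relies on Qt signals and slots rather than a hand-written loop.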

Some time ago a group of us looked at the problem of exiting the GUI when the worker core is busy:


https://savannah.gnu.org/bugs/?44485

I had put some effort into a nice system whereby a QTimer waits for the core to finish, and after a certain amount of time it would signal that a dialog box should appear asking if the user wants to force an exit. Of course, if the core then quits while the user hasn't answered the dialog yet, the dialog box should disappear. It all had to do with saving files in the editor and closing the editor and so on.
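
A rough Python sketch of that wait-then-ask sequence, for illustration only. It is not the unfinished patch and it does not use QTimer; the function name wait_for_core, the two events, and the returned strings are all hypothetical, and the dialog is reduced to a flag that the user's "force exit" answer would set.

```python
import threading
import time


def wait_for_core(core_finished, force_requested, grace_seconds,
                  poll_seconds=0.01):
    """Decide how the GUI exits and return a short description of the outcome.

    core_finished   -- threading.Event set when the worker core is done
    force_requested -- threading.Event set if the user answers "force exit"
    grace_seconds   -- how long to wait before the dialog would appear
    """
    # Grace period: give the core a chance to finish on its own.
    if core_finished.wait(grace_seconds):
        return "clean exit"
    # Grace period expired: the dialog would be shown at this point.
    while True:
        if core_finished.is_set():
            # Core finished while the dialog was open: dismiss it.
            return "dialog dismissed, clean exit"
        if force_requested.is_set():
            return "forced exit"
        time.sleep(poll_seconds)


if __name__ == "__main__":
    core = threading.Event()
    force = threading.Event()
    # Pretend the core finishes half a second after exit was requested.
    timer = threading.Timer(0.5, core.set)
    timer.start()
    print(wait_for_core(core, force, grace_seconds=0.1))
    timer.join()
```

The core-finished check comes first in the loop, so if both things happen at nearly the same moment the clean exit wins. The sketch leaves out the harder sequencing questions, such as unsaved editor files.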

However, I never completed the patch because I could never get the sequencing just right. There was always something like "What if the user does this?", or "What if the core finishes at this point?". This shutdown signal might be just the thing to make it work. I'll revisit that bug when I can.

Dan
