octave-maintainers

Re: segfaults building documentation when machine under load


From: Daniel J Sebald
Subject: Re: segfaults building documentation when machine under load
Date: Sat, 23 May 2020 00:23:59 -0400
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.9.0

On 5/22/20 4:52 PM, John W. Eaton wrote:
On 5/19/20 4:11 PM, Dmitri A. Sergatskov wrote:


On Tue, May 19, 2020 at 4:02 PM John W. Eaton wrote:

    On 5/19/20 3:26 PM, Dmitri A. Sergatskov wrote:

     >     Should we switch to bug-tracker?
     >     I was able to get a crash when I bumped the jobs to 200.
     >     bt is attached. The relevant part seems to be:

    If I use a large number of jobs, I see

        error: imwrite: invalid empty image
        error: called from
            __imwrite__ at line 40 column 5
            imwrite at line 125 column 5
            print at line 755 column 13
            interpimages at line 72 column 5

    but no segfaults.

    It does look like a threading issue.


I used a simplified test by Andreas:

parallel -N0 -q octave --norc --silent --no-history --eval 'figure (1,"visible", "off");' ::: {1..200}

Thanks.

After much confusion, I think I arrived at a solution.  I pushed the following changeset to stable and merged it with default:

   http://hg.savannah.gnu.org/hgweb/octave/rev/00a9a49c7670

These most recent changes appear to improve the situation for the test case shown above.  I'm no longer able to cause a segfault with the following parallel execution:

    parallel -j 50 -N0 -q octave --norc --silent --no-history --eval 'figure (1, "visible", "off");' ::: {1..1000}
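For anyone without GNU parallel at hand, the same kind of stress run can be approximated from Python. This is only a rough sketch, not what was actually used for the test above: the helper names (`run_once`, `stress`, `count_crashes`) are made up for illustration, and it assumes a POSIX system where a negative exit status means the child was killed by a signal (for example, SIGSEGV).

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

# Same Octave invocation as in the parallel command above.
OCTAVE_CMD = ["octave", "--norc", "--silent", "--no-history",
              "--eval", 'figure (1, "visible", "off");']


def run_once(cmd):
    """Run one command and return its exit status.

    On POSIX, a negative status -N means the child was terminated by signal N.
    """
    result = subprocess.run(cmd, stdout=subprocess.DEVNULL,
                            stderr=subprocess.DEVNULL)
    return result.returncode


def stress(cmd, runs, jobs):
    """Run `cmd` `runs` times with at most `jobs` copies at once.

    Returns a dict mapping each exit status to how often it occurred.
    """
    tally = {}
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        for status in pool.map(run_once, [cmd] * runs):
            tally[status] = tally.get(status, 0) + 1
    return tally


def count_crashes(tally):
    """Count runs that died from a signal (negative status, POSIX only)."""
    return sum(n for status, n in tally.items() if status < 0)
```

Calling `count_crashes(stress(OCTAVE_CMD, runs=1000, jobs=50))` would roughly correspond to the command above; a result of zero means no run was killed by a signal.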

Here's the summary from the changeset commit message:

----
This change is a further attempt to avoid segfaults when shutting down the interpreter and exiting the GUI event loop.  The latest approach is to have the interpreter signal that it is finished with "normal" command execution (REPL, command line script, or --eval option code), then let the GUI thread process any remaining functions in its event loop(s) then signal back to the interpreter that it is OK to shut down.  Once the shutdown has happened (which may involve further calls to the GUI thread while executing atexit functions or finish.m or other shutdown code), the interpreter signals back to the GUI that shutdown is complete.  At that point, the GUI can delete the interpreter object and exit.
----

Before this change, the GUI could still be processing events (displaying the figure window, for example) while the interpreter was being deleted.  Obviously, that causes trouble.
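The ordering in that handshake can be illustrated with a toy model. This is a sketch in Python, not the actual Octave C++/Qt code: a `queue.Queue` stands in for the GUI event queue, a `threading.Event` stands in for the "OK to shut down" signal, and every name here is hypothetical. The point it shows is that posting the "finished" notice through the same FIFO queue guarantees that all earlier GUI events have been handled before the interpreter starts shutting down.

```python
import queue
import threading


def run_session():
    """Run one toy interpreter/GUI session; return the order of GUI events."""
    log = []
    gui_events = queue.Queue()          # stand-in for the GUI event queue (FIFO)
    ok_to_shutdown = threading.Event()  # GUI -> interpreter: go ahead
    state = {"running": True}

    def on_finished():
        # Everything queued before this event has already been handled.
        log.append("gui: pending events done, ok to shut down")
        ok_to_shutdown.set()

    def on_shutdown_complete():
        # Only now is it safe to destroy the interpreter and leave the loop.
        log.append("gui: delete interpreter and exit")
        state["running"] = False

    def interpreter():
        # "Normal" execution may post work to the GUI thread.
        gui_events.put(lambda: log.append("gui: create figure"))
        # Step 1: tell the GUI that normal execution is finished.
        gui_events.put(on_finished)
        # Step 2: wait until the GUI says it is OK to shut down.
        ok_to_shutdown.wait()
        # Shutdown code (atexit functions, finish.m) may still use the GUI.
        gui_events.put(lambda: log.append("gui: figure from atexit"))
        # Step 3: tell the GUI that shutdown is complete.
        gui_events.put(on_shutdown_complete)

    worker = threading.Thread(target=interpreter)
    worker.start()
    while state["running"]:             # the GUI event loop
        gui_events.get()()
    worker.join()
    return log
```

Because every log entry is written from the GUI loop, the order is deterministic: the figure event is handled first, then the go-ahead, then the atexit figure, and the interpreter is deleted last.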

Although we recognized this problem before, none of the previous solutions have really worked.  See the commit message for https://hg.savannah.gnu.org/hgweb/octave/rev/cdb681adc85a, for example, where I noted that

  ... the crash described in bug report #56952 appeared to be happening when the Qt event loop was calling QtHandles::qt_graphics_toolkit::create_object when the interpreter was being deleted and the gh_manager object was already invalid, ...

I noticed this again and finally realized that we could probably use the Qt event queue to ensure that pending graphics events are allowed to finish before shutting down the interpreter.  It seems to work for all the tests I've tried so far, including creating a figure in the finish.m script or using "atexit ('sombrero')".

Some time ago a group of us looked at the problem of exiting the GUI when the worker core is busy:

https://savannah.gnu.org/bugs/?44485

I had put some effort into a nice system whereby a QTimer waits for the core to finish, and after a certain amount of time it would trigger a dialog box asking whether the user wants to force an exit. Of course, if the core then quits before the user has answered the dialog, the dialog box should disappear. It all had to do with saving files in the editor and closing the editor and so on.

However, I never completed the patch because I could never get the sequencing just right. There was always something like "What if the user does this?", or "What if the core finishes at this point?". This shutdown signal might be just the thing to make it work. I'll revisit that bug when I can.
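To make the sequencing question concrete, here is a rough sketch of the wait-then-prompt logic in Python rather than Qt. It is not the unfinished patch, only an illustration under assumptions: `core_done` is a hypothetical `threading.Event` that the core sets when it finishes, `grace` plays the role of the timer interval, and `ask_user` is a hypothetical callback that shows the dialog and returns True if the user chose to force the exit.

```python
import threading


def wait_for_core(core_done, grace, ask_user):
    """Wait for the core to finish, prompting the user after each grace period.

    Returns "clean" if the core finished on its own, "forced" if the user
    chose to force the exit while the core was still busy.
    """
    while not core_done.wait(timeout=grace):
        # The core is still busy after the grace period: show the prompt.
        # ask_user gets core_done so a real dialog could close itself
        # as soon as the core finishes.
        if ask_user(core_done) and not core_done.is_set():
            return "forced"
        # Either the user declined or the core finished in the meantime;
        # loop again, and the wait returns at once if the core is done.
    return "clean"
```

The re-check of `core_done` after the prompt covers the "what if the core finishes at this point?" case: a late answer from the user never overrides a clean finish.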

Dan


