qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [RFC PATCH 00/40] Sneak peek of virtio and dataplane ch


From: Paolo Bonzini
Subject: Re: [Qemu-devel] [RFC PATCH 00/40] Sneak peek of virtio and dataplane changes for 2.6
Date: Wed, 16 Dec 2015 18:42:55 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.4.0


On 16/12/2015 13:54, Christian Borntraeger wrote:
> Just some quick remarks before I leave into vacation:
> 
> Performance seems to be better than the initial version. I have some
> hangs from time to time when shutting down (also with your previous
> version)

Yes, I've seen them too.  To reproduce, just start some "dd" and
repeatedly stop/cont.  It hangs after 3-4 tries typically.

It's like this:

        I/O thread                      main thread
        -----------------------------------------------------
                                        aio_context_acquire
                                        bdrv_drain
        bh->scheduled = 0
        call bottom half
           aio_context_acquire
           <hang>


The bottom half is hung in the I/O thread, and will never be processed
by the main thread because bh->scheduled = 0.

My current idea is that *all* callers of aio_poll have to call it
without the AioContext acquired, including bdrv_drain.  It's not a
problem now that Fam has fixed almost all bdrv_drain users to use
bdrv_drained_begin/end instead.

The nastiness here is two-fold:

1) doing

        aio_context_release
        aio_poll
        aio_context_acquire

in bdrv_drain is not very safe because the mutex is recursive.  (Have I
ever mentioned I don't like recursive mutexes?...)

2) there is still a race in bdrv_drain

                    bdrv_flush_io_queue(bs);
                    if (bdrv_requests_pending(bs)) {
                        busy = true;
                        aio_context_release(aio_context);
                        aio_poll(aio_context, busy);
                        aio_context_acquire(aio_context);

if we release before aio_poll, the last request can be completed
before aio_poll is invoked, and aio_poll will block forever.  In Linux
terms, bdrv_request_pending would be prepare_to_wait() and aio_poll
would be schedule().  But, we don't have prepare_to_wait().  I have a
hackish solution here, but I don't like it really.

Paolo



reply via email to

[Prev in Thread] Current Thread [Next in Thread]