
Re: [Qemu-devel] [PATCH v2 00/13] dataplane: use block layer


From: Stefan Hajnoczi
Subject: Re: [Qemu-devel] [PATCH v2 00/13] dataplane: use block layer
Date: Wed, 17 Jul 2013 17:22:16 +0800
User-agent: Mutt/1.5.21 (2010-09-15)

On Mon, Jul 15, 2013 at 07:05:50PM +0200, Paolo Bonzini wrote:
> > On 15/07/2013 16:42, Stefan Hajnoczi wrote:
> > v2:
> >  * Rebased onto qemu.git/master
> >  * Added comment explaining how the dataplane thread is restarted after
> >    draining [pbonzini]
> > 
> > This series adds image format, QMP 'transaction', and QMP 'block_resize'
> > support to dataplane.  This is done by replacing custom Linux AIO code
> > with QEMU block layer APIs.
> > 
> > In order for block layer APIs to be safe, bdrv_drain_all() is modified to
> > be aware of device emulation threads (like dataplane).
> > BlockDevOps->drain_threads_cb() is introduced as a way to stop device
> > emulation threads temporarily.  Device emulation threads may resume after
> > this main loop iteration completes, which is enough to allow code running
> > under the QEMU global mutex a chance to use the block device temporarily.
> > 
> > bdrv_get_aio_context() is dropped in favor of a thread-local AioContext
> > pointer.  This allows qemu_bh_new(), qemu_aio_wait(), and friends to
> > automatically use the right AioContext.  When we introduce a
> > BlockDriverState lock in the future, it will be possible to use block
> > devices from multiple threads arbitrarily (right now we only allow it if
> > bdrv_drain_all() is used).
> > 
> > Three open issues:
> > 
> >  * Timers only work in the main loop, so I/O throttling is ignored and
> >    QED is unsafe with x-data-plane=on.  I am tackling this in a separate
> >    series that makes QEMUTimer thread-safe and can then re-enable I/O
> >    throttling.
> > 
> >  * Performance analysis in progress.
> > 
> >  * Need to test with NBD, Gluster, and other non-file protocols.
> 
> Since this is for 1.6, I am still not sure this is the right approach
> (especially a block device lock scares me a bit...).

I agree that the block device lock looks difficult.  When I started in
that direction it quickly became clear that it would take a lot of effort
to get right.  It is hard because of BDS relationships and the fact that
I/O requests live across main loop iterations (they must be re-entered,
either as a callback or a coroutine).
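
To illustrate what I mean by requests living across iterations, this is
roughly the usual pattern (not from this series, just a sketch; the struct
and function names are made up):

/* Sketch: a coroutine request submits the AIO, yields, and is re-entered
 * from the completion callback on some later main loop iteration.  The
 * CoReadState struct is invented for this example.
 */
typedef struct CoReadState {
    Coroutine *co;
    int ret;
} CoReadState;

static void example_read_cb(void *opaque, int ret)
{
    CoReadState *s = opaque;

    s->ret = ret;
    qemu_coroutine_enter(s->co, NULL);  /* re-entered on a later iteration */
}

static int coroutine_fn example_co_read(BlockDriverState *bs,
                                        int64_t sector_num,
                                        QEMUIOVector *qiov, int nb_sectors)
{
    CoReadState s = { .co = qemu_coroutine_self() };

    bdrv_aio_readv(bs, sector_num, qiov, nb_sectors, example_read_cb, &s);
    qemu_coroutine_yield();             /* the request outlives this return */
    return s.ret;
}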

But that approach is not implemented in this patch series...

> Quoting a private mail from Kevin (I guess he doesn't mind), the
> possible approaches include:
> 
> * Stopping the dataplane event loop while another user accesses the
> BlockDriverState
> 
> * Make other users use Bottom Halves to do a context switch into the
> dataplane event loop and do their operation from there
> 
> * Add locks to the BlockDriverState and just do the accesses from
> different threads
> 
> and I believe a mix of these would be the way to go.
>
> Stopping the dataplane event loop is not really necessary for
> correctness; that only requires stopping the ioeventfd, and indeed this
> series already includes patches for this.  So you definitely have my
> "approval" (in quotes because you're the maintainer...) for patches 1-2-3.

The TLS AioContext approach is compatible with the future improvements
that you mentioned.  Remember that BlockDevOps->drain_threads_cb() is
orthogonal to TLS AioContext.  We can still get at the AioContext for a
BDS, if necessary.
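
For what it's worth, from the device side the hook from this series looks
roughly like this (only drain_threads_cb itself comes from the series; the
virtio-blk names are illustrative):

/* Sketch: a dataplane device registers drain_threads_cb so that
 * bdrv_drain_all() can ask it to quiesce its emulation thread.
 */
static void virtio_blk_drain_threads_cb(void *opaque)
{
    VirtIOBlockDataPlane *s = opaque;

    /* Stop the ioeventfd and quiesce the dataplane thread.  The thread may
     * be restarted after the current main loop iteration completes.
     */
    virtio_blk_data_plane_stop(s);
}

static const BlockDevOps virtio_blk_dev_ops = {
    .drain_threads_cb = virtio_blk_drain_threads_cb,
    /* ... other callbacks ... */
};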

The point of TLS AioContext is to avoid explicitly passing AioContext
everywhere.  Since BHs are sometimes created deep down the call stack, it
would be very tedious to start passing an AioContext around.  qemu_bh_*()
should just do the right thing.
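
Roughly, it is no more than a thread-local pointer that the BH helpers
consult.  A minimal sketch (only aio_bh_new() and qemu_get_aio_context()
are existing APIs, the helper names here are made up):

static __thread AioContext *thread_aio_context;

void set_thread_aio_context(AioContext *ctx)
{
    /* Called once by each dataplane thread when it starts. */
    thread_aio_context = ctx;
}

static AioContext *get_thread_aio_context(void)
{
    /* Threads that never set a context fall back to the main loop's. */
    return thread_aio_context ? thread_aio_context : qemu_get_aio_context();
}

QEMUBH *qemu_bh_new(QEMUBHFunc *cb, void *opaque)
{
    /* Callers keep the old signature; the BH simply lands in whichever
     * AioContext the calling thread belongs to.
     */
    return aio_bh_new(get_thread_aio_context(), cb, opaque);
}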

BTW, besides stopping the ioeventfd it is also necessary to either
complete pending I/O requests (drain) or at least postpone callbacks for
pending I/O requests while another thread is accessing the BDS.

(The problem with postponing callbacks is that block drivers sometimes
have dependencies between requests, for example allocating write requests
that touch the same metadata.  Simply postponing callbacks leads to
deadlock when waiting for dependencies.  This is where locks save the day,
at the cost of complexity, because they allow multiple threads to make
progress safely.)

> However, I'm not sure it is feasible to have a single AioContext per
> thread.  Consider a backup job whose target is not running in the same
> AioContext as the source; for optimization it might use bdrv_aio_*
> instead of coroutine-based I/O.
> 
> So, once you have stopped the ioeventfd to correctly support
> bdrv_drain_all(), I would like to see code that makes other threads use
> bottom halves to do a context switch.  Ping Fan's patches for a
> thread-safe BH list are useful for this.

If I understand correctly, this means a block job wraps its I/O calls so
that they are scheduled in a remote AioContext/thread when necessary.
That is needed when a block job touches multiple BDSes that belong to
different threads.
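
Something along these lines, I imagine (hand-wavy sketch; aio_bh_new(),
qemu_bh_schedule() and bdrv_aio_readv() exist today, the RemoteOp wrapper
is made up for illustration):

/* Dispatch one read into the AioContext that owns the target BDS.  The
 * caller allocates RemoteOp with g_new() and the completion callback runs
 * in the target context's thread.
 */
typedef struct RemoteOp {
    BlockDriverState *bs;
    int64_t sector_num;
    QEMUIOVector *qiov;
    int nb_sectors;
    BlockDriverCompletionFunc *cb;
    void *opaque;
    QEMUBH *bh;
} RemoteOp;

static void remote_readv_bh(void *opaque)
{
    RemoteOp *op = opaque;

    /* We are now in the target AioContext's thread, so issuing the AIO
     * directly is safe.
     */
    bdrv_aio_readv(op->bs, op->sector_num, op->qiov, op->nb_sectors,
                   op->cb, op->opaque);
    qemu_bh_delete(op->bh);
    g_free(op);
}

static void readv_in_context(AioContext *ctx, RemoteOp *op)
{
    op->bh = aio_bh_new(ctx, remote_readv_bh, op);
    qemu_bh_schedule(op->bh);
}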

> I think there are two approaches to implementing synchronous
> operations:
> 
> * A lock, just like Kevin mentioned (though I would place it on the
> AioContext, not the BlockDriverState).  Then you can call aio_wait under
> the lock in bdrv_read and friends.

This sounds clever.  So block jobs don't need to explicitly wrap I/O
calls; it happens automatically in block.c:bdrv_read() and friends.
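
Something like this, I guess (sketch only: aio_context_acquire()/release()
are the proposed lock and do not exist yet; bdrv_aio_readv() and aio_poll()
are existing APIs; the SyncReadCo struct is invented for the example):

typedef struct SyncReadCo {
    bool done;
    int ret;
} SyncReadCo;

static void sync_read_cb(void *opaque, int ret)
{
    SyncReadCo *co = opaque;

    co->ret = ret;
    co->done = true;
}

static int bdrv_read_in_ctx(BlockDriverState *bs, AioContext *ctx,
                            int64_t sector_num, QEMUIOVector *qiov,
                            int nb_sectors)
{
    SyncReadCo co = { .done = false };

    aio_context_acquire(ctx);                   /* proposed lock */
    bdrv_aio_readv(bs, sector_num, qiov, nb_sectors, sync_read_cb, &co);

    /* Pump the owning context's event loop until the request completes,
     * like qemu_aio_wait() does for the main context today.  Holding the
     * lock keeps the dataplane thread off the BDS in the meantime.
     */
    while (!co.done) {
        aio_poll(ctx, true);
    }
    aio_context_release(ctx);

    return co.ret;
}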

> * A "complementary" bottom half that is scheduled in the
> qemu_aio_context.  This second bottom half terminates processing in the
> dataplane thread and restarts the thread starting the synchronous
> operation.  I'm not sure how easy it would be to write infrastructure
> for this, but basically it'd mean adding a third path ("in coroutine
> context, but in the wrong AioContext") to the current "if
> (qemu_in_coroutine())" tests that block.c has.

I'm not sure I understand this approach, but maybe it is something like
moving a coroutine from one AioContext to another and back again.
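
Very roughly, perhaps a pair of bottom halves that re-enter the coroutine
in the other context (sketch only, not working infrastructure;
qemu_coroutine_self(), qemu_coroutine_enter() and aio_bh_new() are existing
APIs, the rest is made up):

typedef struct CoMove {
    Coroutine *co;
    QEMUBH *bh;
} CoMove;

static void co_move_bh(void *opaque)
{
    CoMove *m = opaque;

    qemu_bh_delete(m->bh);
    qemu_coroutine_enter(m->co, NULL); /* resumes in this thread/context */
}

/* Called from inside the coroutine that wants to hop to 'target'; the
 * return trip would schedule the same kind of BH in the original context.
 */
static void coroutine_fn co_move_to(AioContext *target, CoMove *m)
{
    m->co = qemu_coroutine_self();
    m->bh = aio_bh_new(target, co_move_bh, m);
    qemu_bh_schedule(m->bh);
    qemu_coroutine_yield();            /* wakes up in the target context */
}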

FWIW I'm also looking at how much (little) effort it would take to
disable dataplane temporarily while block jobs are running.  Since we're
already discussing switching between threads, it's not clear that we gain
much performance by keeping dataplane enabled during block jobs, and it
clearly adds complexity.

Stefan


