Re: [Qemu-devel] [RFC] Replace posix-aio with custom thread pool


From: Andrea Arcangeli
Subject: Re: [Qemu-devel] [RFC] Replace posix-aio with custom thread pool
Date: Fri, 12 Dec 2008 18:52:13 +0100

On Fri, Dec 12, 2008 at 11:25:55AM -0600, Anthony Liguori wrote:
> Hrm, that's more complex than I was expecting.  I was thinking the bdrv aio 
> infrastructure would always take an iovec.  Any details about the 
> underlying host's ability to handle the iovec would be insulated.

You can't remove the restart/memory-capped mechanism from the dma api:
we have to handle dma to non-RAM, which can require bouncing the whole
buffer, so we're forced to keep a safe linearization at the dma api
layer anyway. So there's no need to reinvent the same
restart-partial-transfer logic in the aio layer too. Just set the
define and teach the aio logic to use pread/pwrite when iovcnt == 1 and
you're done ;).
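
A minimal sketch of that dispatch, assuming a hypothetical CONFIG_PREADV
build-time define and a simplified worker helper (names are made up for
illustration, not the actual qemu code):

    #define _GNU_SOURCE
    #include <assert.h>
    #include <sys/types.h>
    #include <sys/uio.h>
    #include <unistd.h>

    #ifdef CONFIG_PREADV            /* hypothetical "the define" from above */
    static ssize_t aio_do_readv(int fd, struct iovec *iov, int iovcnt,
                                off_t offset)
    {
        /* Host has vectored positional I/O: one syscall handles it all. */
        return preadv(fd, iov, iovcnt, offset);
    }
    #else
    static ssize_t aio_do_readv(int fd, struct iovec *iov, int iovcnt,
                                off_t offset)
    {
        /*
         * Without preadv the dma api has already linearized the transfer,
         * so only a single-element iovec can reach this point.
         */
        assert(iovcnt == 1);
        return pread(fd, iov[0].iov_base, iov[0].iov_len, offset);
    }
    #endif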

So what I'm suggesting is simpler than what you were expecting, not
more complex. It would be more complex to replicate the restart-bounce
logic in the aio layer too.

Old drivers using bdrv_aio_read/write will keep working; new drivers
using the dma api can also use bdrv_aio_readv/writev, and the
linearization will happen inside the dma api if the aio layer lacks
preadv/pwritev support.
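
Roughly, "linearization inside the dma api" means gathering a
multi-element iovec into one bounce buffer so the aio layer only ever
sees a contiguous buffer. A sketch with invented names (the real dma api
additionally caps and restarts large transfers, which is omitted here):

    #include <stdlib.h>
    #include <string.h>
    #include <sys/uio.h>

    /* Gather iov[0..iovcnt-1] into one contiguous buffer (write case).
     * The caller would submit it via bdrv_aio_write() and free it from
     * the completion callback. */
    static void *bounce_linearize(const struct iovec *iov, int iovcnt,
                                  size_t *lenp)
    {
        size_t len = 0, off = 0;
        void *buf;
        int i;

        for (i = 0; i < iovcnt; i++)
            len += iov[i].iov_len;
        buf = malloc(len);
        if (!buf)
            return NULL;
        for (i = 0; i < iovcnt; i++) {
            memcpy((char *)buf + off, iov[i].iov_base, iov[i].iov_len);
            off += iov[i].iov_len;
        }
        *lenp = len;
        return buf;
    }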

> If we artificially cap at say 50MB, then you do something like:
>
> while (buffer == NULL) {
>   buffer = try_to_bounce(offset, iov, iovcnt, &size);
>   if (buffer == NULL && errno == ENOMEM) {
>      pthread_wait_cond(more memory);
>   }
> }

What I meant is that you'll never get ENOMEM. The task will be
instantly killed during the memcpy... To hope for any meaningful
behavior from the above you'd need to set overcommit = 1; otherwise two
qemu instances allocating 50M at the same time and then doing the
memcpy at the same time is enough to get one of them killed with -9.

> This lets us expose a preadv/pwritev function that actually works.  The 
> expectation is that bouncing will outperform just doing pread/pwrite of 
> each vector.  Of course, you could get smart and if try_to_bounce fail, 
> fall back to pread/pwrite each vector.  Likewise, you can fast-path the 
> case of a single iovec to avoid bouncing entirely.

Yes, pread/pwrite can't perform if O_DIRECT is enabled. If O_DIRECT is
disabled they could perform at remotely reasonable levels depending on
the host-exception cost vs memcpy cost, but we'd rather bounce to be
sure: testing the dma api with a 512-byte bounce buffer (so maximizing
the number of syscalls because of the flood of restarts) slows the I/O
to a crawl even with buffering enabled. The syscall overhead is clearly
very significant; basically memcpy is faster than a syscall for 512
bytes here.

But just let the dma api do the iovec thing. If you want to provide an
abstraction that also works when the dma api decides to send down an
iovcnt > 1, you could simply implement the fallback, but I don't think
it's worth it: it should never happen that you get an iovcnt > 1 when
preadv/pwritev aren't available. You'd be writing code whose only
effect would be to hide a performance bug -> not worth it.
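
For reference, the fallback being argued against would look roughly
like this (a sketch, one pread per iovec element; with the dma api
linearizing whenever preadv/pwritev is missing, this path should never
be reached):

    #include <sys/types.h>
    #include <sys/uio.h>
    #include <unistd.h>

    static ssize_t readv_fallback(int fd, const struct iovec *iov,
                                  int iovcnt, off_t offset)
    {
        ssize_t total = 0;
        int i;

        for (i = 0; i < iovcnt; i++) {
            ssize_t n = pread(fd, iov[i].iov_base, iov[i].iov_len,
                              offset + total);
            if (n < 0)
                return total ? total : n;   /* error, unless partial data */
            total += n;
            if ((size_t)n < iov[i].iov_len)
                break;                      /* short read, stop here */
        }
        return total;
    }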



