Re: [Qemu-devel] [PATCH v1 00/17] dataplane: optimization and multi virtqueue support


From: Stefan Hajnoczi
Subject: Re: [Qemu-devel] [PATCH v1 00/17] dataplane: optimization and multi virtqueue support
Date: Thu, 14 Aug 2014 10:39:21 +0100
User-agent: Mutt/1.5.23 (2014-03-12)

On Wed, Aug 13, 2014 at 09:49:23PM +0800, Ming Lei wrote:
> On Wed, Aug 13, 2014 at 9:16 PM, Paolo Bonzini <address@hidden> wrote:
> > On 13/08/2014 at 11:54, Kevin Wolf wrote:
> >> On 12.08.2014 at 21:08, Paolo Bonzini wrote:
> >>> On 12/08/2014 at 10:12, Ming Lei wrote:
> >>>>>> The below patch is basically the minimal change to bypass
> >>>>>> coroutines.  Of course the block.c part is not acceptable as is
> >>>>>> (the change to refresh_total_sectors is broken, the others are
> >>>>>> just ugly), but it is a start.  Please run it with your fio
> >>>>>> workloads, or write an aio-based version of a qemu-img/qemu-io
> >>>>>> *I/O* benchmark.
> >>>> Could you explain why the new change was introduced?
> >>>
> >>> It provides a fast path for bdrv_aio_readv/writev whenever there is
> >>> nothing to do after the driver routine returns.  In this case there is
> >>> no need to wrap the AIOCB returned by the driver routine.
> >>>
> >>> It doesn't go all the way; in particular, it doesn't completely
> >>> reverse the roles of bdrv_co_readv/writev vs. bdrv_aio_readv/writev.
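
(As a minimal stand-alone sketch of that fast path, with simplified
stand-in types and helpers rather than QEMU's actual block-layer API: if
the block layer has nothing left to do after the driver routine returns,
the driver's AIOCB can be handed to the caller unwrapped.)

    #include <stdbool.h>

    typedef void CompletionFunc(void *opaque, int ret);

    /* Stand-in for the AIOCB handle a driver returns. */
    typedef struct AIOCB {
        CompletionFunc *cb;
        void *opaque;
    } AIOCB;

    typedef struct BlockDriverState {
        /* Driver entry point: submits the I/O and returns its own AIOCB. */
        AIOCB *(*drv_aio_readv)(struct BlockDriverState *bs, long sector,
                                int nb_sectors, CompletionFunc *cb,
                                void *opaque);
        /* True when the block layer must run its own completion work,
         * e.g. throttling, copy-on-read, or accounting. */
        bool needs_postprocessing;
    } BlockDriverState;

    /* Slow path: wraps the driver's AIOCB so the block layer can run its
     * completion work before invoking the caller's callback (elided). */
    AIOCB *wrap_and_submit(BlockDriverState *bs, long sector, int nb_sectors,
                           CompletionFunc *cb, void *opaque);

    AIOCB *aio_readv(BlockDriverState *bs, long sector, int nb_sectors,
                     CompletionFunc *cb, void *opaque)
    {
        if (!bs->needs_postprocessing) {
            /* Fast path: nothing left to do after the driver routine
             * returns, so return its AIOCB unwrapped; the driver invokes
             * cb/opaque directly on completion. */
            return bs->drv_aio_readv(bs, sector, nb_sectors, cb, opaque);
        }
        return wrap_and_submit(bs, sector, nb_sectors, cb, opaque);
    }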
> >>
> >> That's actually why I think it's an option. Remember that, like you say
> >> below, we're optimising for an extreme case here, and I certainly don't
> >> want to hurt the common case for it. I can't imagine a way of reversing
> >> the roles without multiplying the cost for the coroutine path.
> >
> > I'm not that worried about it.  Perhaps it's enough to add an
> > !qemu_in_coroutine() check to the AIO fast path, and let the driver
> > provide optimized coroutine paths, like the ones in your patches that
> > allocate AIOCBs on the stack.
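
(A minimal stand-alone sketch of that dispatch, again with simplified
stand-ins rather than QEMU's actual API: take the AIO fast path only when
not already inside a coroutine, and let the coroutine path keep its
request state on the stack, which is safe because the coroutine's frame
stays alive across the yield until the I/O completes.)

    #include <stdbool.h>
    #include <stdlib.h>

    typedef struct Req {
        long sector;
        int nb_sectors;
    } Req;

    /* Stand-ins for qemu_in_coroutine() and the driver entry points. */
    bool in_coroutine(void);
    void driver_submit_async(Req *req);      /* returns before I/O completes */
    int  driver_submit_and_yield(Req *req);  /* yields until I/O completes */

    int readv(long sector, int nb_sectors)
    {
        if (!in_coroutine()) {
            /* AIO fast path: the call returns before the I/O finishes,
             * so the request must outlive this frame (heap or pool). */
            Req *req = malloc(sizeof(*req));
            req->sector = sector;
            req->nb_sectors = nb_sectors;
            driver_submit_async(req);   /* freed by the completion path */
            return 0;
        }

        /* Coroutine path: this frame survives until the I/O completes,
         * so the request can live on the coroutine's stack, with no
         * malloc/free per request. */
        Req req = { .sector = sector, .nb_sectors = nb_sectors };
        return driver_submit_and_yield(&req);
    }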
> 
> IMO, it will not be an extreme case as SSDs and high-performance storage
> become more popular; coroutines start to affect performance once IOPS
> exceeds 100K, as computed previously.
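
(For scale, with illustrative numbers rather than figures from this
thread: at 100K IOPS a single queue has a budget of 10 microseconds per
request, so about 1 microsecond spent on coroutine creation and switching
would already be roughly 10% of it; at 1M IOPS the budget shrinks to 1
microsecond and the same overhead would dominate.)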

The case you seem to care about is raw images on high-IOPS devices.  You
mentioned 1M IOPS devices in another email.

You don't seem to want QEMU's block layer features; that is why you are
trying to bypass them instead of optimizing the block layer.

That raises the question of whether you should look at PCI passthrough
instead.

Stefan
