From: Andrea Arcangeli
Subject: Re: [Qemu-devel] [RFC 1/2] pci-dma-api-v1
Date: Sun, 30 Nov 2008 18:20:38 +0100

On Fri, Nov 28, 2008 at 07:18:19PM +0000, Jamie Lokier wrote:
> Blue Swirl wrote:
> > >  I wonder how aio_readv/writev can possibly be missing in posix aio?
> > >  Unbelievable. It'd be totally trivial to add those to glibc, much
> > >  easier in fact than to pthread_create by hand, but how can we add a
> > >  dependency on a certain glibc version? Ironically it'll be more
> > >  user-friendly to add dependency on linux kernel-aio implementation
> > >  that is already available for ages and it's guaranteed to run faster
> > >  (or at least not slower).
> > 
> > There's also lio_listio that provides for vectored AIO.
> 
> I think lio_listio is the missing aio_readv/writev.
> 
> It's more versatile, and that'll be why POSIX never bothered with
> aio_readv/writev.
> 
> Doesn't explain why they didn't _start_ with aio_readv before
> inventing lio_listio, but there you go.  Unix history.

Well, I grepped earlier for readv or writev syscalls inside
glibc-2.6.1/sysdeps/pthread and there was nothing there, so lio_listio
doesn't seem to be helpful at all. If it were a _kernel_ API the
kernel could see the whole queue immediately and coalesce all
outstanding contiguous I/O into a single DMA operation, but the
userland queue here isn't visible to the kernel. So unless we can
execute the readv and writev syscalls, O_DIRECT performance with the
direct DMA API will be destroyed compared to bounce buffering: the
guest OS will submit large DMA operations that end up executed as 4k
DMA operations by the storage hardware, and the memcpy overhead we're
eliminating is minor compared to such a major I/O bottleneck with
qemu cache=off.
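
To make it concrete, the mapping we need is roughly this (sg_entry is
just an illustrative stand-in for whatever the dma api actually hands
us, not a real qemu structure):

#include <sys/uio.h>
#include <stdlib.h>

struct sg_entry { void *host_addr; size_t len; };

static struct iovec *sg_to_iovec(struct sg_entry *sg, int nb_sg)
{
    struct iovec *iov = malloc(nb_sg * sizeof(*iov));
    int i;

    for (i = 0; i < nb_sg; i++) {
        iov[i].iov_base = sg[i].host_addr;
        iov[i].iov_len  = sg[i].len;
    }
    /* with readv/writev (or a vectored aio op) this whole array goes
       down as one request; with plain aio_read/aio_write each entry
       becomes a separate, typically 4k, I/O */
    return iov;
}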

The only way we could possibly use lio_listio would be to improve
glibc so that the lio_listio op is smart enough to call readv/writev
when it finds contiguous I/O being queued, but overall this would
still be largely inefficient. If you check the dma api, I'm preparing
a struct iovec *iov ready to submit to the kernel either through the
nonexistent aio_readv/writev or through the kernel-API
IOCB_CMD_PREADV/PWRITEV (both obviously take the well defined struct
iovec as a parameter, so there's zero overhead).
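
Roughly what the kernel-AIO submission looks like with raw syscalls
(no libaio); fd, iov, iovcnt and offset are whatever the dma api
prepared, ctx comes from an earlier io_setup:

#include <linux/aio_abi.h>
#include <sys/syscall.h>
#include <sys/uio.h>
#include <unistd.h>
#include <string.h>

static int submit_preadv(aio_context_t ctx, int fd,
                         struct iovec *iov, int iovcnt, long long offset)
{
    struct iocb cb;
    struct iocb *cbs[1] = { &cb };

    memset(&cb, 0, sizeof(cb));
    cb.aio_lio_opcode = IOCB_CMD_PREADV;
    cb.aio_fildes     = fd;
    cb.aio_buf        = (unsigned long)iov; /* the iovec array itself */
    cb.aio_nbytes     = iovcnt;             /* number of iovecs, not bytes */
    cb.aio_offset     = offset;

    /* one io_submit for the whole sg list, the kernel sees a single
       vectored request */
    return syscall(__NR_io_submit, ctx, 1, cbs);
}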

So even if we improve lio_listio, it would still introduce artificial
splitting and re-coalescing overhead just because of its weird API.
It would be entirely different if lio_listio resembled the kernel
io_submit API and had a PREADV/PWRITEV type to submit iovecs, but it
only has LIO_READ/LIO_WRITE, no sign of LIO_READV/LIO_WRITEV
unfortunately :(. Admittedly it's not so common to have to use
readv/writev on contiguous I/O, but the emulated DMA with SG truly
requires it. Anything that can't handle a native iovec we can't use.
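
For comparison, this is the best lio_listio can do today, one aiocb
per iovec element, so the kernel never gets to see a single vectored
request (sketch only):

#include <aio.h>
#include <stdlib.h>
#include <sys/uio.h>

static int listio_readv(int fd, struct iovec *iov, int iovcnt, off_t offset)
{
    struct aiocb *cbs = calloc(iovcnt, sizeof(*cbs));
    struct aiocb **list = calloc(iovcnt, sizeof(*list));
    int i, ret;

    for (i = 0; i < iovcnt; i++) {
        cbs[i].aio_fildes     = fd;
        cbs[i].aio_buf        = iov[i].iov_base;
        cbs[i].aio_nbytes     = iov[i].iov_len;
        cbs[i].aio_offset     = offset;
        cbs[i].aio_lio_opcode = LIO_READ;  /* one op per segment, no LIO_READV */
        cbs[i].aio_sigevent.sigev_notify = SIGEV_NONE;
        offset += iov[i].iov_len;
        list[i] = &cbs[i];
    }
    ret = lio_listio(LIO_WAIT, list, iovcnt, NULL); /* blocks until all done */
    free(list);
    free(cbs);
    return ret;
}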

Likely we'll have to add our own pthread_create-based aio
implementation for non-linux and kernel-AIO for linux, and get rid of
librt as a whole. It's pointless to mix our own userland aio (which
will support readv/writev too) with the posix one. And if this were
just a linux project, kernel AIO would suffice. All the databases I
know of that need readv/writev with AIO and O_DIRECT, for similar
reasons to ours, already use kernel AIO.
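
The non-linux fallback would be something along these lines, a
detached worker per request (assuming a preadv wrapper is available;
where it isn't, we'd have to fall back to lseek+readv or a bounce
buffer). Names here are made up, not the final api:

#define _GNU_SOURCE
#include <pthread.h>
#include <sys/types.h>
#include <sys/uio.h>

struct aio_req {
    int fd;
    struct iovec *iov;
    int iovcnt;
    off_t offset;
    void (*complete)(struct aio_req *req, ssize_t ret);
};

static void *aio_worker(void *opaque)
{
    struct aio_req *req = opaque;
    ssize_t ret = preadv(req->fd, req->iov, req->iovcnt, req->offset);
    req->complete(req, ret);   /* would signal the qemu iothread back */
    return NULL;
}

static int aio_submit(struct aio_req *req)
{
    pthread_t th;
    pthread_attr_t attr;
    int ret;

    pthread_attr_init(&attr);
    pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
    ret = pthread_create(&th, &attr, aio_worker, req);
    pthread_attr_destroy(&attr);
    return ret;
}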



