qemu-block
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [RFC v3 7/8] blkio: implement BDRV_REQ_REGISTERED_BUF optimization


From: Stefan Hajnoczi
Subject: Re: [RFC v3 7/8] blkio: implement BDRV_REQ_REGISTERED_BUF optimization
Date: Thu, 18 Aug 2022 15:46:36 -0400

On Thu, Jul 14, 2022 at 12:13:53PM +0200, Hanna Reitz wrote:
> On 08.07.22 06:17, Stefan Hajnoczi wrote:
> > Avoid bounce buffers when QEMUIOVector elements are within previously
> > registered bdrv_register_buf() buffers.
> > 
> > The idea is that emulated storage controllers will register guest RAM
> > using bdrv_register_buf() and set the BDRV_REQ_REGISTERED_BUF on I/O
> > requests. Therefore no blkio_map_mem_region() calls are necessary in the
> > performance-critical I/O code path.
> > 
> > This optimization doesn't apply if the I/O buffer is internally
> > allocated by QEMU (e.g. qcow2 metadata). There we still take the slow
> > path because BDRV_REQ_REGISTERED_BUF is not set.
> 
> Which keeps the question relevant of how slow the slow path is, i.e. whether
> it wouldn’t make sense to keep some of the mem regions allocated there in a
> cache instead of allocating/freeing them on every I/O request.

Yes, bounce buffer reuse would be possible, but let's keep it simple for
now.

> > Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> > ---
> >   block/blkio.c | 104 ++++++++++++++++++++++++++++++++++++++++++++++++--
> >   1 file changed, 101 insertions(+), 3 deletions(-)
> > 
> > diff --git a/block/blkio.c b/block/blkio.c
> > index 7fbdbd7fae..37d593a20c 100644
> > --- a/block/blkio.c
> > +++ b/block/blkio.c
> 
> [...]
> 
> > @@ -198,6 +203,8 @@ static BlockAIOCB *blkio_aio_preadv(BlockDriverState 
> > *bs, int64_t offset,
> >           BlockCompletionFunc *cb, void *opaque)
> >   {
> >       BDRVBlkioState *s = bs->opaque;
> > +    bool needs_mem_regions =
> > +        s->needs_mem_regions && !(flags & BDRV_REQ_REGISTERED_BUF);
> 
> Is that condition sufficient?  bdrv_register_buf() has no way of returning
> an error, so it’s possible that buffers are silently not registered.  (And
> there are conditions in blkio_register_buf() where the buffer will not be
> registered, e.g. because it isn’t aligned.)
> 
> The caller knows nothing of this and will still pass
> BDRV_REQ_REGISTERED_BUF, and then we’ll assume the region is mapped but it
> won’t be.
> 
> >       struct iovec *iov = qiov->iov;
> >       int iovcnt = qiov->niov;
> >       BlkioAIOCB *acb;
> 
> [...]
> 
> > @@ -324,6 +333,80 @@ static void blkio_io_unplug(BlockDriverState *bs)
> >       }
> >   }
> > +static void blkio_register_buf(BlockDriverState *bs, void *host, size_t 
> > size)
> > +{
> > +    BDRVBlkioState *s = bs->opaque;
> > +    int ret;
> > +    struct blkio_mem_region region = (struct blkio_mem_region){
> > +        .addr = host,
> > +        .len = size,
> > +        .fd = -1,
> > +    };
> > +
> > +    if (((uintptr_t)host | size) % s->mem_region_alignment) {
> > +        error_report_once("%s: skipping unaligned buf %p with size %zu",
> > +                          __func__, host, size);
> > +        return; /* skip unaligned */
> > +    }
> 
> How big is mem-region-alignment generally?  Is it like 4k or is it going to
> be a real issue?

Yes, it's usually the page size of the MMU/IOMMU. vhost-user and VFIO
have the same requirements so I don't think anything special is
necessary.

> (Also, we could probably register a truncated region.  I know, that’ll break
> the BDRV_REQ_REGISTERED_BUF idea because the caller won’t know we’ve
> truncated it, but that’s no different than just not registering the buffer
> at all.)
> 
> > +
> > +    /* Attempt to find the fd for a MemoryRegion */
> > +    if (s->needs_mem_region_fd) {
> > +        int fd = -1;
> > +        ram_addr_t offset;
> > +        MemoryRegion *mr;
> > +
> > +        /*
> > +         * bdrv_register_buf() is called with the BQL held so mr lives at 
> > least
> > +         * until this function returns.
> > +         */
> > +        mr = memory_region_from_host(host, &offset);
> > +        if (mr) {
> > +            fd = memory_region_get_fd(mr);
> > +        }
> 
> I don’t think it’s specified that buffers registered with
> bdrv_register_buf() must be within a single memory region, is it? So can we
> somehow verify that the memory region covers the whole buffer?

You are right, there is no guarantee. However, the range will always be
within a RAMBlock at the moment because the bdrv_register_buf() calls
are driven by a RAMBlock notifier and match the boundaries of the
RAMBlocks.

I will add a check so this starts failing when that assumption is
violated.

> 
> > +        if (fd == -1) {
> > +            error_report_once("%s: skipping fd-less buf %p with size %zu",
> > +                              __func__, host, size);
> > +            return; /* skip if there is no fd */
> > +        }
> > +
> > +        region.fd = fd;
> > +        region.fd_offset = offset;
> > +    }
> > +
> > +    WITH_QEMU_LOCK_GUARD(&s->lock) {
> > +        ret = blkio_map_mem_region(s->blkio, &region);
> > +    }
> > +
> > +    if (ret < 0) {
> > +        error_report_once("Failed to add blkio mem region %p with size 
> > %zu: %s",
> > +                          host, size, blkio_get_error_msg());
> > +    }
> > +}
> > +
> > +static void blkio_unregister_buf(BlockDriverState *bs, void *host, size_t 
> > size)
> > +{
> > +    BDRVBlkioState *s = bs->opaque;
> > +    int ret;
> > +    struct blkio_mem_region region = (struct blkio_mem_region){
> > +        .addr = host,
> > +        .len = size,
> > +        .fd = -1,
> > +    };
> > +
> > +    if (((uintptr_t)host | size) % s->mem_region_alignment) {
> > +        return; /* skip unaligned */
> > +    }
> > +
> > +    WITH_QEMU_LOCK_GUARD(&s->lock) {
> > +        ret = blkio_unmap_mem_region(s->blkio, &region);
> > +    }
> 
> The documentation of libblkio says that “memory regions must be
> unmapped/freed with exactly the same `region` field values that they were
> mapped/allocated with.”  We don’t set .fd here, though.

That's a bug. The memory region will not be unmapped because libblkio's
HashSet won't match. I'll fix the QEMU code to pass the exact same
struct blkio_mem_region fields.

> 
> It’s also unclear whether it’s allowed to unmap a region that wasn’t mapped,
> but I’ll trust libblkio to detect that.

Yes, it's a nop.

> 
> > +
> > +    if (ret < 0) {
> > +        error_report_once("Failed to delete blkio mem region %p with size 
> > %zu: %s",
> > +                          host, size, blkio_get_error_msg());
> > +    }
> > +}
> > +
> >   static void blkio_parse_filename_io_uring(const char *filename, QDict 
> > *options,
> >                                             Error **errp)
> >   {
> 
> [...]
> 
> > @@ -459,7 +553,7 @@ static int blkio_file_open(BlockDriverState *bs, QDict 
> > *options, int flags,
> >           return ret;
> >       }
> > -    bs->supported_write_flags = BDRV_REQ_FUA;
> > +    bs->supported_write_flags = BDRV_REQ_FUA | BDRV_REQ_REGISTERED_BUF;
> 
> Shouldn’t we also report it as a supported read flag then?

Yes, thank you!

Stefan

Attachment: signature.asc
Description: PGP signature


reply via email to

[Prev in Thread] Current Thread [Next in Thread]