[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [RFC v3 7/8] blkio: implement BDRV_REQ_REGISTERED_BUF optimization
From: |
Stefan Hajnoczi |
Subject: |
Re: [RFC v3 7/8] blkio: implement BDRV_REQ_REGISTERED_BUF optimization |
Date: |
Thu, 18 Aug 2022 15:46:36 -0400 |
On Thu, Jul 14, 2022 at 12:13:53PM +0200, Hanna Reitz wrote:
> On 08.07.22 06:17, Stefan Hajnoczi wrote:
> > Avoid bounce buffers when QEMUIOVector elements are within previously
> > registered bdrv_register_buf() buffers.
> >
> > The idea is that emulated storage controllers will register guest RAM
> > using bdrv_register_buf() and set the BDRV_REQ_REGISTERED_BUF on I/O
> > requests. Therefore no blkio_map_mem_region() calls are necessary in the
> > performance-critical I/O code path.
> >
> > This optimization doesn't apply if the I/O buffer is internally
> > allocated by QEMU (e.g. qcow2 metadata). There we still take the slow
> > path because BDRV_REQ_REGISTERED_BUF is not set.
>
> Which keeps the question relevant of how slow the slow path is, i.e. whether
> it wouldn’t make sense to keep some of the mem regions allocated there in a
> cache instead of allocating/freeing them on every I/O request.
Yes, bounce buffer reuse would be possible, but let's keep it simple for
now.
> > Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> > ---
> > block/blkio.c | 104 ++++++++++++++++++++++++++++++++++++++++++++++++--
> > 1 file changed, 101 insertions(+), 3 deletions(-)
> >
> > diff --git a/block/blkio.c b/block/blkio.c
> > index 7fbdbd7fae..37d593a20c 100644
> > --- a/block/blkio.c
> > +++ b/block/blkio.c
>
> [...]
>
> > @@ -198,6 +203,8 @@ static BlockAIOCB *blkio_aio_preadv(BlockDriverState
> > *bs, int64_t offset,
> > BlockCompletionFunc *cb, void *opaque)
> > {
> > BDRVBlkioState *s = bs->opaque;
> > + bool needs_mem_regions =
> > + s->needs_mem_regions && !(flags & BDRV_REQ_REGISTERED_BUF);
>
> Is that condition sufficient? bdrv_register_buf() has no way of returning
> an error, so it’s possible that buffers are silently not registered. (And
> there are conditions in blkio_register_buf() where the buffer will not be
> registered, e.g. because it isn’t aligned.)
>
> The caller knows nothing of this and will still pass
> BDRV_REQ_REGISTERED_BUF, and then we’ll assume the region is mapped but it
> won’t be.
>
> > struct iovec *iov = qiov->iov;
> > int iovcnt = qiov->niov;
> > BlkioAIOCB *acb;
>
> [...]
>
> > @@ -324,6 +333,80 @@ static void blkio_io_unplug(BlockDriverState *bs)
> > }
> > }
> > +static void blkio_register_buf(BlockDriverState *bs, void *host, size_t
> > size)
> > +{
> > + BDRVBlkioState *s = bs->opaque;
> > + int ret;
> > + struct blkio_mem_region region = (struct blkio_mem_region){
> > + .addr = host,
> > + .len = size,
> > + .fd = -1,
> > + };
> > +
> > + if (((uintptr_t)host | size) % s->mem_region_alignment) {
> > + error_report_once("%s: skipping unaligned buf %p with size %zu",
> > + __func__, host, size);
> > + return; /* skip unaligned */
> > + }
>
> How big is mem-region-alignment generally? Is it like 4k or is it going to
> be a real issue?
Yes, it's usually the page size of the MMU/IOMMU. vhost-user and VFIO
have the same requirements so I don't think anything special is
necessary.
> (Also, we could probably register a truncated region. I know, that’ll break
> the BDRV_REQ_REGISTERED_BUF idea because the caller won’t know we’ve
> truncated it, but that’s no different than just not registering the buffer
> at all.)
>
> > +
> > + /* Attempt to find the fd for a MemoryRegion */
> > + if (s->needs_mem_region_fd) {
> > + int fd = -1;
> > + ram_addr_t offset;
> > + MemoryRegion *mr;
> > +
> > + /*
> > + * bdrv_register_buf() is called with the BQL held so mr lives at
> > least
> > + * until this function returns.
> > + */
> > + mr = memory_region_from_host(host, &offset);
> > + if (mr) {
> > + fd = memory_region_get_fd(mr);
> > + }
>
> I don’t think it’s specified that buffers registered with
> bdrv_register_buf() must be within a single memory region, is it? So can we
> somehow verify that the memory region covers the whole buffer?
You are right, there is no guarantee. However, the range will always be
within a RAMBlock at the moment because the bdrv_register_buf() calls
are driven by a RAMBlock notifier and match the boundaries of the
RAMBlocks.
I will add a check so this starts failing when that assumption is
violated.
>
> > + if (fd == -1) {
> > + error_report_once("%s: skipping fd-less buf %p with size %zu",
> > + __func__, host, size);
> > + return; /* skip if there is no fd */
> > + }
> > +
> > + region.fd = fd;
> > + region.fd_offset = offset;
> > + }
> > +
> > + WITH_QEMU_LOCK_GUARD(&s->lock) {
> > + ret = blkio_map_mem_region(s->blkio, ®ion);
> > + }
> > +
> > + if (ret < 0) {
> > + error_report_once("Failed to add blkio mem region %p with size
> > %zu: %s",
> > + host, size, blkio_get_error_msg());
> > + }
> > +}
> > +
> > +static void blkio_unregister_buf(BlockDriverState *bs, void *host, size_t
> > size)
> > +{
> > + BDRVBlkioState *s = bs->opaque;
> > + int ret;
> > + struct blkio_mem_region region = (struct blkio_mem_region){
> > + .addr = host,
> > + .len = size,
> > + .fd = -1,
> > + };
> > +
> > + if (((uintptr_t)host | size) % s->mem_region_alignment) {
> > + return; /* skip unaligned */
> > + }
> > +
> > + WITH_QEMU_LOCK_GUARD(&s->lock) {
> > + ret = blkio_unmap_mem_region(s->blkio, ®ion);
> > + }
>
> The documentation of libblkio says that “memory regions must be
> unmapped/freed with exactly the same `region` field values that they were
> mapped/allocated with.” We don’t set .fd here, though.
That's a bug. The memory region will not be unmapped because libblkio's
HashSet won't match. I'll fix the QEMU code to pass the exact same
struct blkio_mem_region fields.
>
> It’s also unclear whether it’s allowed to unmap a region that wasn’t mapped,
> but I’ll trust libblkio to detect that.
Yes, it's a nop.
>
> > +
> > + if (ret < 0) {
> > + error_report_once("Failed to delete blkio mem region %p with size
> > %zu: %s",
> > + host, size, blkio_get_error_msg());
> > + }
> > +}
> > +
> > static void blkio_parse_filename_io_uring(const char *filename, QDict
> > *options,
> > Error **errp)
> > {
>
> [...]
>
> > @@ -459,7 +553,7 @@ static int blkio_file_open(BlockDriverState *bs, QDict
> > *options, int flags,
> > return ret;
> > }
> > - bs->supported_write_flags = BDRV_REQ_FUA;
> > + bs->supported_write_flags = BDRV_REQ_FUA | BDRV_REQ_REGISTERED_BUF;
>
> Shouldn’t we also report it as a supported read flag then?
Yes, thank you!
Stefan
signature.asc
Description: PGP signature