Re: [PATCH 1/2] virtio: use blk_io_plug_call() in virtio_irqfd_notify()


From: Ilya Maximets
Subject: Re: [PATCH 1/2] virtio: use blk_io_plug_call() in virtio_irqfd_notify()
Date: Wed, 16 Aug 2023 15:36:32 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.10.0

On 8/15/23 14:08, Stefan Hajnoczi wrote:
> virtio-blk and virtio-scsi invoke virtio_irqfd_notify() to send Used
> Buffer Notifications from an IOThread. This involves an eventfd
> write(2) syscall. Calling this repeatedly when completing multiple I/O
> requests in a row is wasteful.

Hi, Stefan.  This is an interesting change!

There is more or less exactly the same problem with fast network backends,
and I was playing around with similar ideas in this area while working on
the af-xdp network backend recently.  Primarily, implementing the Rx BH
for the virtio-net device and taking the RCU read lock before passing
packets from the backend to the device one by one.
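
The shape I was experimenting with is roughly the following (just a
sketch; the backend state type and the backend_rx_* helpers are made-up
placeholders, only WITH_RCU_READ_LOCK_GUARD() and qemu_send_packet()
are existing QEMU APIs):

    /* Hypothetical Rx BH: take the RCU read lock once around the whole
     * batch instead of paying for lock/unlock on every packet. */
    static void backend_rx_bh(void *opaque)
    {
        MyBackendState *s = opaque;              /* placeholder type */

        WITH_RCU_READ_LOCK_GUARD() {
            /* Pass packets from the backend to the device one by one. */
            while (backend_rx_pending(s)) {      /* placeholder helper */
                qemu_send_packet(&s->nc, backend_rx_buf(s),
                                 backend_rx_len(s));
            }
        }
    }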

> 
> Use the blk_io_plug_call() API to batch together virtio_irqfd_notify()
> calls made during Linux AIO (aio=native) or io_uring (aio=io_uring)
> completion processing. Do not modify the thread pool (aio=threads) to
> avoid introducing a dependency from util/ onto the block layer.

But you're introducing a dependency from generic virtio code onto the
block layer in this patch.  This seems to break the module abstraction.

It looks like there are two options to resolve the layering issue here:

1. Move virtio_notify_irqfd() from virtio.c down to the block layer.
   The block layer is the only user, so that may be justified, but it
   doesn't seem like a particularly good solution.  (I'm actually not
   sure why block devices are the only ones using this function...)

2. Move and rename the block/plug library somewhere generic.  The plug
   library doesn't seem to have any dependencies on the block layer
   other than its name, so it should not be hard to generalize (well,
   coming up with a good name might be the hard part).  A rough sketch
   of what I mean is below.
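
For example, a generalized header could look roughly like this (all
names are made up for illustration; the semantics are the same as the
current block/plug.c):

    /* include/qemu/defer-call.h -- hypothetical generalized plug API */

    /* Begin/end a deferred-call section in the current thread.
     * Sections may nest. */
    void defer_call_begin(void);
    void defer_call_end(void);

    /*
     * Call fn(opaque) when the outermost defer_call_end() is reached.
     * Duplicate fn/opaque pairs queued within one section are coalesced
     * into a single call.  Outside a section, fn(opaque) runs
     * immediately.
     */
    void defer_call(void (*fn)(void *opaque), void *opaque);

With that, block/linux-aio.c, block/io_uring.c and virtio.c would all
use the generic API, and virtio.c would not need to include
sysemu/block-backend.h.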

In general, looking at the plug library, it seems to do what is
typically implemented via RCU frameworks: a deferred function call.
The only differences are that RCU doesn't check for duplicates and its
callbacks are global.  It should not be hard to add new functionality
to the RCU framework to address both, e.g. an rcu_call_local() for
calls that should be executed once the current thread exits its own
critical section.
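
In API terms, something like this (just a sketch, this function does
not exist today):

    /*
     * Proposed: run func(opaque) when the current thread leaves its
     * outermost RCU read-side critical section.  Duplicate func/opaque
     * pairs queued within one critical section are coalesced into a
     * single call.  Outside a critical section, func(opaque) runs
     * immediately.
     */
    void rcu_call_local(void (*func)(void *opaque), void *opaque);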

Using RCU for things that aren't RCU-protected might be considered an
abuse.  However, we might solve two issues in one shot if, instead of
entering a blk_io_plug/unplug section, we enter an RCU critical section
and run the callbacks on exit.  The first issue is the notification
batching that this patch is trying to fix; the second is the excessive
number of thread fences on RCU exit every time virtio_notify_irqfd()
and other virtio functions are invoked.  The second issue can be
avoided by using RCU_READ_LOCK_GUARD() in the completion functions.
I'm not sure whether that will improve performance, but it definitely
removes a lot of noise from perf top for network backends.  It makes
the code a bit less explicit, though; the lock guard will definitely
need a comment.  Then again, the reason for the blk_io_plug() calls is
not fully clear from the module code alone either.
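
I.e., roughly the following in the completion functions instead of the
blk_io_plug()/blk_io_unplug() pair from this patch (a sketch based on
the hunk below, not tested):

    static void luring_process_completions(LuringState *s)
    {
        struct io_uring_cqe *cqes;
        int total_bytes;

        /*
         * Covers all the completion callbacks below.  Notifications
         * queued via the hypothetical rcu_call_local() would run when
         * the guard goes out of scope, and the RCU lock/unlock inside
         * virtio_notify_irqfd() becomes a cheap nested lock instead of
         * a full exit for every request.
         */
        RCU_READ_LOCK_GUARD();

        /* ... the rest of the function stays as it is today ... */
    }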

I'm not sure what the best way forward is.  I'm trying to figure that
out myself, since a similar solution will eventually be needed for
network backends, so it might be better to have something more generic.
What do you think?

> 
> Behavior is unchanged for emulated devices that do not use blk_io_plug()
> since blk_io_plug_call() immediately invokes the callback when called
> outside a blk_io_plug()/blk_io_unplug() region.
> 
> fio rw=randread bs=4k iodepth=64 numjobs=8 IOPS increases by ~9% with a
> single IOThread and 8 vCPUs. iodepth=1 decreases by ~1% but this could
> be noise. Detailed performance data and configuration specifics are
> available here:
> https://gitlab.com/stefanha/virt-playbooks/-/tree/blk_io_plug-irqfd
> 
> This duplicates the BH that virtio-blk uses for batching. The next
> commit will remove it.

I'm likely missing something, but could you explain why it is safe to
batch unconditionally here?  The current BH code, as you mentioned in
the second patch, only batches if EVENT_IDX is not set.
It might be worth adding a few words to the commit message for people
like me, who are a bit rusty on QEMU/virtio internals. :)

Best regards, Ilya Maximets.

> 
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
>  block/io_uring.c   |  6 ++++++
>  block/linux-aio.c  |  4 ++++
>  hw/virtio/virtio.c | 10 +++++++++-
>  3 files changed, 19 insertions(+), 1 deletion(-)
> 
> diff --git a/block/io_uring.c b/block/io_uring.c
> index 69d9820928..749cf83934 100644
> --- a/block/io_uring.c
> +++ b/block/io_uring.c
> @@ -124,6 +124,9 @@ static void luring_process_completions(LuringState *s)
>  {
>      struct io_uring_cqe *cqes;
>      int total_bytes;
> +
> +    blk_io_plug();
> +
>      /*
>       * Request completion callbacks can run the nested event loop.
>       * Schedule ourselves so the nested event loop will "see" remaining
> @@ -216,7 +219,10 @@ end:
>              aio_co_wake(luringcb->co);
>          }
>      }
> +
>      qemu_bh_cancel(s->completion_bh);
> +
> +    blk_io_unplug();
>  }
>  
>  static int ioq_submit(LuringState *s)
> diff --git a/block/linux-aio.c b/block/linux-aio.c
> index 561c71a9ae..cef3d6b1c7 100644
> --- a/block/linux-aio.c
> +++ b/block/linux-aio.c
> @@ -204,6 +204,8 @@ static void qemu_laio_process_completions(LinuxAioState *s)
>  {
>      struct io_event *events;
>  
> +    blk_io_plug();
> +
>      /* Reschedule so nested event loops see currently pending completions */
>      qemu_bh_schedule(s->completion_bh);
>  
> @@ -230,6 +232,8 @@ static void qemu_laio_process_completions(LinuxAioState *s)
>       * own `for` loop.  If we are the last all counters droped to zero. */
>      s->event_max = 0;
>      s->event_idx = 0;
> +
> +    blk_io_unplug();
>  }
>  
>  static void qemu_laio_process_completions_and_submit(LinuxAioState *s)
> diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
> index 309038fd46..a691e8526b 100644
> --- a/hw/virtio/virtio.c
> +++ b/hw/virtio/virtio.c
> @@ -28,6 +28,7 @@
>  #include "hw/virtio/virtio-bus.h"
>  #include "hw/qdev-properties.h"
>  #include "hw/virtio/virtio-access.h"
> +#include "sysemu/block-backend.h"
>  #include "sysemu/dma.h"
>  #include "sysemu/runstate.h"
>  #include "virtio-qmp.h"
> @@ -2426,6 +2427,13 @@ static bool virtio_should_notify(VirtIODevice *vdev, VirtQueue *vq)
>      }
>  }
>  
> +/* Batch irqs while inside a blk_io_plug()/blk_io_unplug() section */
> +static void virtio_notify_irqfd_unplug_fn(void *opaque)
> +{
> +    EventNotifier *notifier = opaque;
> +    event_notifier_set(notifier);
> +}
> +
>  void virtio_notify_irqfd(VirtIODevice *vdev, VirtQueue *vq)
>  {
>      WITH_RCU_READ_LOCK_GUARD() {
> @@ -2452,7 +2460,7 @@ void virtio_notify_irqfd(VirtIODevice *vdev, VirtQueue *vq)
>       * to an atomic operation.
>       */
>      virtio_set_isr(vq->vdev, 0x1);
> -    event_notifier_set(&vq->guest_notifier);
> +    blk_io_plug_call(virtio_notify_irqfd_unplug_fn, &vq->guest_notifier);
>  }
>  
>  static void virtio_irq(VirtQueue *vq)



