Re: [RFC patch 0/1] block: vhost-blk backend


From: Stefano Garzarella
Subject: Re: [RFC patch 0/1] block: vhost-blk backend
Date: Thu, 28 Jul 2022 17:40:02 +0200

On Thu, Jul 28, 2022 at 7:28 AM Andrey Zhadchenko 
<andrey.zhadchenko@virtuozzo.com> wrote:
> On 7/27/22 16:06, Stefano Garzarella wrote:
> > On Tue, Jul 26, 2022 at 04:15:48PM +0200, Denis V. Lunev wrote:
> >> On 26.07.2022 15:51, Michael S. Tsirkin wrote:
> >>> On Mon, Jul 25, 2022 at 11:55:26PM +0300, Andrey Zhadchenko wrote:
> >>>> Although QEMU virtio-blk is quite fast, there is still some room for
> >>>> improvement. Disk latency can be reduced if we handle virtio-blk
> >>>> requests in the host kernel, so that we avoid a lot of syscalls and
> >>>> context switches.
> >>>>
> >>>> The biggest disadvantage of this vhost-blk flavor is that it only
> >>>> supports the raw format. Luckily, Kirill Thai proposed a device
> >>>> mapper driver for the QCOW2 format to attach files as block devices:
> >>>> https://www.spinics.net/lists/kernel/msg4292965.html
> >>> That one seems stalled. Do you plan to work on that too?
> >> We have to. The difference in the numbers, as you can see below, is
> >> quite significant. We have been waiting for this patch to be sent to
> >> keep pushing.
> >>
> >> It should be noted that maybe a talk at OSS this year could also push
> >> things a bit.
> >
> > Cool, the results are similar to what I saw when I compared vhost-blk
> > and io_uring passthrough with NVMe (Slide 7 here: [1]).
> >
> > Regarding QEMU block layer support, we recently started working on
> > libblkio [2]. Stefan also sent an RFC [3] to implement the QEMU
> > BlockDriver. Currently it supports virtio-blk devices using vhost-vdpa
> > and vhost-user. We could add support for vhost (kernel) as well;
> > however, we were thinking of leveraging vDPA to implement the in-kernel
> > software device too.
> >
> > That way we could reuse a lot of the code to support both hardware and
> > software accelerators.
> >
> > In the talk [1] I describe the idea a little bit, and a few months ago
> > I did a PoC (unsubmitted RFC) to see whether it was feasible; the
> > numbers were in line with vhost-blk.
> >
> > Do you think we could join forces and just have an in-kernel vdpa-blk
> > software device?
>
> This seems worth trying. Why duplicate the effort to do the same thing?
> Still, I would like to play a bit with your vdpa-blk PoC beforehand.

Great :-)

> Can you send it to me with some instructions on how to run it?

Yep, sure!

The PoC is available here: 
https://gitlab.com/sgarzarella/linux/-/tree/vdpa-sw-blk-poc

The tree was originally based on Linux v5.16, but I had some issues
rebuilding it with a newer gcc, so I rebased it on v5.16.20 (not tested).
The configs needed are CONFIG_VDPA_SW_BLOCK=m and CONFIG_VHOST_VDPA=m,
plus their dependencies.
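
For example, something along these lines should get you a build
(scripts/config is part of the kernel tree; CONFIG_VDPA as a dependency
and the exact option list are my assumption, so double-check anything
that Kconfig silently drops):

  # fetch the PoC branch
  git clone --branch vdpa-sw-blk-poc https://gitlab.com/sgarzarella/linux.git
  cd linux

  # start from your current config, then enable the vDPA bits as modules
  scripts/config -m VDPA -m VHOST_VDPA -m VDPA_SW_BLOCK
  make olddefconfig

  # if an option does not stick in .config, a dependency is still missing
  make -j"$(nproc)" && sudo make modules_install install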

It contains:
  - patches required for QEMU generic vhost-vdpa support
  - patches to support blk_mq_ops->poll() (to use io_uring iopoll) in
    the guest virtio-blk driver (I used the same kernel on guest and
    host)
  - some improvements for vringh (not completed, it could be a
    bottleneck)
  - vdpa-sw and vdpa-sw-blk patches (and hacks)

It is based on the vDPA simulator framework already merged upstream. The
idea is to generalize the simulator so that the code is shared between
software devices and simulators. The code needs a lot of work: I focused
mainly on getting a working virtio-blk device emulation, so the generic
part still needs more attention.
There are a couple of defines in the code to control polling.

To manage the vdpa-blk device, you need iproute2's vdpa tool, available
upstream:
  https://wiki.linuxfoundation.org/networking/iproute2

Once the device is instantiated (see the steps below), the backend (a raw
file or block device) can be set through a device attribute (not robust,
but this is just a PoC): /sys/bus/vdpa/devices/$dev_name/backend_fd

I wrote a simple python script available here: 
https://github.com/stefano-garzarella/vm-build/blob/main/vm-tools/vdpa_set_backend_fd.py
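
For reference, the rough shell equivalent of what the script does is
something like the following (this assumes the backend_fd attribute takes
a file descriptor that is valid in the writing process and grabs its own
reference to it; the Python script is the tested path):

  # open the backend, keep the fd in the shell, and hand it to the device
  exec {bfd}<> /dev/nvme0n1
  echo "$bfd" > /sys/bus/vdpa/devices/blk0/backend_fd
  exec {bfd}>&-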

For QEMU, we are working on libblkio to support both the slow path (when
the QEMU block layer is needed) and the fast path (virtqueues passed
directly to the device). For now libblkio supports only the slow path, so
to test the fast path you can use Longpeng's patches (not yet merged
upstream) with generic vhost-vdpa support:
https://lore.kernel.org/qemu-devel/20220514041107.1980-1-longpeng2@huawei.com/

Steps:
  # load vDPA block in-kernel sw device module
  modprobe vdpa_sw_blk

  # load nvme module with poll_queues set if you want to use iopoll
  modprobe nvme poll_queues=15

  # instantiate a new vdpa-blk device
  vdpa dev add mgmtdev vdpasw_blk name blk0

  # set backend (/dev/nvme0n1)
  vdpa_set_backend_fd.py -b /dev/nvme0n1 blk0

  # load vhost vDPA bus ...
  modprobe vhost_vdpa

  # ... and vhost-vdpa device will appear
  ls -l /dev/vhost-vdpa-0
  crw-------. 1 root root 510, 0 Jul 28 17:06 /dev/vhost-vdpa-0

  # start QEMU patched with generic vhost-vdpa
  qemu-system-x86_64 ... \
  -device vhost-vdpa-device-pci,vhostdev=/dev/vhost-vdpa-0
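
Inside the guest, the disk should show up as a regular virtio-blk device.
As a quick sanity check, and to exercise the iopoll path, something like
the following should work (the /dev/vdb name and the fio parameters are
just an example, adjust them to your setup):

  # on the host: confirm the vdpa device is instantiated
  vdpa dev show blk0

  # inside the guest: confirm the virtio-blk disk is visible
  lsblk -o NAME,SIZE,TYPE

  # random-read latency test using io_uring with polling (--hipri);
  # if fio complains about polling, check /sys/block/vdb/queue/io_poll
  fio --name=iopoll-test --filename=/dev/vdb --direct=1 \
      --ioengine=io_uring --hipri --rw=randread --bs=4k \
      --iodepth=32 --runtime=30 --time_based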

I haven't tested it recently, so I'm not sure it all still works, but I'll
try it in the next few days. For anything else, feel free to reach me here
or on IRC (sgarzare on #qemu).

Thanks,
Stefano



