Re: [RFC patch 0/1] block: vhost-blk backend
From: Stefan Hajnoczi
Subject: Re: [RFC patch 0/1] block: vhost-blk backend
Date: Wed, 5 Oct 2022 11:40:17 -0400
On Wed, Oct 05, 2022 at 02:50:06PM +0300, Andrey Zhadchenko wrote:
>
>
> On 10/4/22 22:00, Stefan Hajnoczi wrote:
> > On Mon, Jul 25, 2022 at 11:55:26PM +0300, Andrey Zhadchenko wrote:
> > > Although QEMU virtio-blk is quite fast, there is still some room for
> > > improvements. Disk latency can be reduced if we handle virtio-blk requests
> > > in the host kernel, avoiding a lot of syscalls and context switches.
> > >
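[For context, the vhost model being described works roughly as sketched below:
QEMU only configures the device through ioctls, after which guest kicks and
completion interrupts flow over eventfds between KVM and the kernel module,
with no per-request exit to userspace. The generic VHOST_* ioctls and structs
are from <linux/vhost.h>; the "/dev/vhost-blk" node name and the final
backend-attach step are assumptions modelled on vhost-net/vhost-scsi, not
necessarily the exact interface added by this series.]

/*
 * Rough sketch of the control path for a vhost-style block backend, to show
 * why the data path needs no per-request syscalls from QEMU.  The device
 * node name and the backend-attach step are assumptions; error handling is
 * mostly omitted for brevity.
 */
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/eventfd.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/vhost.h>

int main(void)
{
    int vhost = open("/dev/vhost-blk", O_RDWR);   /* assumed node name */
    int disk  = open("/dev/nvme0n1", O_RDWR);     /* raw host block device */
    uint64_t features;

    if (vhost < 0 || disk < 0) {
        perror("open");
        return 1;
    }

    ioctl(vhost, VHOST_SET_OWNER, NULL);          /* bind device to this process */
    ioctl(vhost, VHOST_GET_FEATURES, &features);  /* virtio feature negotiation */
    ioctl(vhost, VHOST_SET_FEATURES, &features);

    /*
     * VHOST_SET_MEM_TABLE would be issued here so the kernel module can
     * translate guest-physical vring/buffer addresses into host memory.
     */

    /* Per-queue setup: ring size plus the two eventfds that carry all
     * notifications once the device is running. */
    struct vhost_vring_state num  = { .index = 0, .num = 256 };
    struct vhost_vring_file  kick = { .index = 0, .fd = eventfd(0, 0) };
    struct vhost_vring_file  call = { .index = 0, .fd = eventfd(0, 0) };

    ioctl(vhost, VHOST_SET_VRING_NUM,  &num);
    /* VHOST_SET_VRING_ADDR with the desc/avail/used addresses goes here. */
    ioctl(vhost, VHOST_SET_VRING_KICK, &kick);    /* wired to a KVM ioeventfd */
    ioctl(vhost, VHOST_SET_VRING_CALL, &call);    /* wired to a KVM irqfd */

    /*
     * Finally the host block device fd is handed to the module with a
     * vhost-blk specific ioctl (analogous to VHOST_NET_SET_BACKEND).  From
     * then on a guest kick wakes the kernel worker via the ioeventfd, the
     * bio is submitted inside the kernel, and the completion interrupt is
     * injected via the irqfd without returning to QEMU.
     */
    close(disk);
    close(vhost);
    return 0;
}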
> > > The biggest disadvantage of this vhost-blk flavor is that it only supports
> > > the raw format. Luckily Kirill Thai proposed a device-mapper driver for the
> > > QCOW2 format to attach files as block devices:
> > > https://www.spinics.net/lists/kernel/msg4292965.html
> > >
> > > Also, by using kernel modules we can bypass the iothread limitation and
> > > finally scale block requests with CPUs for high-performance devices. This
> > > is planned to be implemented in the next version.
> > >
> > > Linux kernel module part:
> > > https://lore.kernel.org/kvm/20220725202753.298725-1-andrey.zhadchenko@virtuozzo.com/
> > >
> > > test setups and results:
> > > fio --direct=1 --rw=randread --bs=4k --ioengine=libaio --iodepth=128
> > > QEMU drive options: cache=none
> > > filesystem: xfs
> > >
> > > SSD:
> > >                | randread, IOPS | randwrite, IOPS |
> > > Host           | 95.8k          | 85.3k           |
> > > QEMU virtio    | 57.5k          | 79.4k           |
> > > QEMU vhost-blk | 95.6k          | 84.3k           |
> > >
> > > RAMDISK (vq == vcpu):
> > >                  | randread, IOPS | randwrite, IOPS |
> > > virtio, 1vcpu    | 123k           | 129k            |
> > > virtio, 2vcpu    | 253k (??)      | 250k (??)       |
> > > virtio, 4vcpu    | 158k           | 154k            |
> > > vhost-blk, 1vcpu | 110k           | 113k            |
> > > vhost-blk, 2vcpu | 247k           | 252k            |
> > > vhost-blk, 4vcpu | 576k           | 567k            |
> > >
> > > Andrey Zhadchenko (1):
> > > block: add vhost-blk backend
> > >
> > > configure | 13 ++
> > > hw/block/Kconfig | 5 +
> > > hw/block/meson.build | 1 +
> > > hw/block/vhost-blk.c | 395 ++++++++++++++++++++++++++++++++++
> > > hw/virtio/meson.build | 1 +
> > > hw/virtio/vhost-blk-pci.c | 102 +++++++++
> > > include/hw/virtio/vhost-blk.h | 44 ++++
> > > linux-headers/linux/vhost.h | 3 +
> > > 8 files changed, 564 insertions(+)
> > > create mode 100644 hw/block/vhost-blk.c
> > > create mode 100644 hw/virtio/vhost-blk-pci.c
> > > create mode 100644 include/hw/virtio/vhost-blk.h
> >
> > vhost-blk has been tried several times in the past. That doesn't mean it
> > cannot be merged this time, but past arguments should be addressed:
> >
> > - What makes it necessary to move the code into the kernel? In the past
> > the performance results were not very convincing. The fastest
> > implementations actually tend to be userspace NVMe PCI drivers that
> > bypass the kernel! Bypassing the VFS and submitting block requests
> > directly was not a huge boost. The syscall/context switch argument
> > sounds okay but the numbers didn't really show that kernel block I/O
> > is much faster than userspace block I/O.
> >
> > I've asked for more details on the QEMU command-line to understand
> > what your numbers show. Maybe something has changed since previous
> > times when vhost-blk has been tried.
> >
> > The only argument I see is QEMU's current 1 IOThread per virtio-blk
> > device limitation, which is currently being worked on. If that's the
> > only reason for vhost-blk then is it worth doing all the work of
> > getting vhost-blk shipped (kernel, QEMU, and libvirt changes)? It
> > seems like a short-term solution.
> >
> > - The security impact of bugs in kernel vhost-blk code is more serious
> > than bugs in a QEMU userspace process.
> >
> > - The management stack needs to be changed to use vhost-blk whereas
> > QEMU can be optimized without affecting other layers.
> >
> > Stefan
>
> Indeed there were several vhost-blk attempts, but from what I found in the
> mailing lists only Asias' attempt got some attention and discussion. Ramdisk
> performance results were great, but a ramdisk is more of a benchmark than a
> real use case. I didn't find out why Asias dropped his version beyond a vague
> statement that the performance results were not worth it. Storage speed is
> very important for vhost-blk performance, as there is no point in cutting CPU
> costs from 1 ms to 0.1 ms if the request needs 50 ms to complete on the
> actual disk. I think that 10 years ago NVMe was non-existent, and SATA SSDs,
> while probably a lot faster than HDDs, were still not fast enough to make
> this technology worthwhile.
Yes, it's possible that latency improvements are more noticeable now.
Thank you for posting the benchmark results. I will also run benchmarks
so we can compare vhost-blk with today's QEMU as well as multiqueue
IOThreads QEMU (for which I only have a hacky prototype) on a local NVMe
PCI SSD.
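[To put rough numbers on the quoted point about storage speed: the latencies
below are illustrative assumptions, not measurements from this thread, but
they show why shaving ~0.9 ms of per-request software overhead is barely
visible behind a 50 ms disk yet removes most of the total latency on a
~0.1 ms device.]

/* Back-of-the-envelope illustration of the latency argument, with made-up
 * numbers: 1 ms -> 0.1 ms of software overhead against a slow vs. fast disk. */
#include <stdio.h>

int main(void)
{
    const double overhead_before = 1.0, overhead_after = 0.1; /* ms, assumed */
    const double devices[] = { 50.0, 0.1 };                   /* HDD vs NVMe, ms */

    for (int i = 0; i < 2; i++) {
        double before = devices[i] + overhead_before;
        double after  = devices[i] + overhead_after;
        printf("device %5.1f ms: %.1f ms -> %.1f ms (%.0f%% faster)\n",
               devices[i], before, after,
               100.0 * (before - after) / before);
    }
    return 0;
}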
> The tests I did give me 60k randwrite IOPS for the VM and 95k for the host,
> and vhost-blk is able to close that gap even when using only 1
> thread/vq/vcpu. And unlike QEMU's current single IOThread, it can easily be
> scaled with the number of cpus/vcpus. For sure this can be solved by lifting
> the IOThread limitations, but that will probably require an even more
> invasive set of changes (and adding vhost-blk won't break old setups!).
>
> Probably the only undisputed advantage of vhost-blk is syscall reduction, and
> again the benefit really depends on storage speed, since the saving only
> matters when device latency is comparable with syscall time. I must also note
> that this may be good for high-density servers with a lot of VMs. But for now
> I do not have exact numbers showing how much time we really win per request
> on average.
>
> Overall, vhost-blk will only become more attractive as storage speeds
> increase.
>
> Also I must note that all the arguments above apply to vdpa-blk as well. And
> unlike vhost-blk, which needs its own QEMU code, vdpa-blk can be set up with
> the generic virtio-vdpa QEMU code (I am not sure whether that is merged yet,
> but still). Although vdpa-blk has its own problems for now.
Yes, I think that's why Stefano hasn't pushed for a software vdpa-blk
device yet despite having played with it and is more focused on
hardware enablement. vdpa-blk has the same issues as vhost-blk.
Stefan