Re: [RFC patch 0/1] block: vhost-blk backend

qemu-block

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [RFC patch 0/1] block: vhost-blk backend

From:	Andrey Zhadchenko
Subject:	Re: [RFC patch 0/1] block: vhost-blk backend
Date:	Wed, 5 Oct 2022 14:50:06 +0300
User-agent:	Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.1.0



On 10/4/22 22:00, Stefan Hajnoczi wrote:

On Mon, Jul 25, 2022 at 11:55:26PM +0300, Andrey Zhadchenko wrote:

Although QEMU virtio-blk is quite fast, there is still some room for
improvements. Disk latency can be reduced if we handle virito-blk requests
in host kernel so we avoid a lot of syscalls and context switches.

The biggest disadvantage of this vhost-blk flavor is raw format.
Luckily Kirill Thai proposed device mapper driver for QCOW2 format to attach
files as block devices: https://www.spinics.net/lists/kernel/msg4292965.html

Also by using kernel modules we can bypass iothread limitation and finaly scale
block requests with cpus for high-performance devices. This is planned to be
implemented in next version.

Linux kernel module part:
https://lore.kernel.org/kvm/20220725202753.298725-1-andrey.zhadchenko@virtuozzo.com/

test setups and results:
fio --direct=1 --rw=randread  --bs=4k  --ioengine=libaio --iodepth=128
QEMU drive options: cache=none
filesystem: xfs

SSD:
                | randread, IOPS  | randwrite, IOPS |
Host           |      95.8k      |      85.3k      |
QEMU virtio    |      57.5k      |      79.4k      |
QEMU vhost-blk |      95.6k      |      84.3k      |

RAMDISK (vq == vcpu):
                  | randread, IOPS | randwrite, IOPS |
virtio, 1vcpu    |      123k      |      129k       |
virtio, 2vcpu    |      253k (??) |      250k (??)  |
virtio, 4vcpu    |      158k      |      154k       |
vhost-blk, 1vcpu |      110k      |      113k       |
vhost-blk, 2vcpu |      247k      |      252k       |
vhost-blk, 4vcpu |      576k      |      567k       |

Andrey Zhadchenko (1):
   block: add vhost-blk backend

  configure                     |  13 ++
  hw/block/Kconfig              |   5 +
  hw/block/meson.build          |   1 +
  hw/block/vhost-blk.c          | 395 ++++++++++++++++++++++++++++++++++
  hw/virtio/meson.build         |   1 +
  hw/virtio/vhost-blk-pci.c     | 102 +++++++++
  include/hw/virtio/vhost-blk.h |  44 ++++
  linux-headers/linux/vhost.h   |   3 +
  8 files changed, 564 insertions(+)
  create mode 100644 hw/block/vhost-blk.c
  create mode 100644 hw/virtio/vhost-blk-pci.c
  create mode 100644 include/hw/virtio/vhost-blk.h


vhost-blk has been tried several times in the past. That doesn't mean it
cannot be merged this time, but past arguments should be addressed:

- What makes it necessary to move the code into the kernel? In the past
   the performance results were not very convincing. The fastest
   implementations actually tend to be userspace NVMe PCI drivers that
   bypass the kernel! Bypassing the VFS and submitting block requests
   directly was not a huge boost. The syscall/context switch argument
   sounds okay but the numbers didn't really show that kernel block I/O
   is much faster than userspace block I/O.

   I've asked for more details on the QEMU command-line to understand
   what your numbers show. Maybe something has changed since previous
   times when vhost-blk has been tried.

   The only argument I see is QEMU's current 1 IOThread per virtio-blk
   device limitation, which is currently being worked on. If that's the
   only reason for vhost-blk then is it worth doing all the work of
   getting vhost-blk shipped (kernel, QEMU, and libvirt changes)? It
   seems like a short-term solution.

- The security impact of bugs in kernel vhost-blk code is more serious
   than bugs in a QEMU userspace process.

- The management stack needs to be changed to use vhost-blk whereas
   QEMU can be optimized without affecting other layers.

Stefan

Indeed there was several vhost-blk attempts, but from what I found inmailing lists only the Asias attempt got some attention and discussion.Ramdisk performance results were great but ramdisk is more a benchmarkthan a real use. I didn't find out why Asias dropped his version exceptvague "He concluded performance results was not worth". The storagespeed is very important for vhost-blk performance, as there is no pointto cut cpu costs from 1ms to 0,1ms if the request need 50ms to proceedin the actual disk. I think that 10 years ago NVMI was non-existent andSSD + SATA was probably a lot faster than HDD but still not enough toutilize this technology.

The tests I did give me 60k IOPS randwrite for VM and 95k for host. Andthe vhost-blk is able to negate the difference even using only 1thread/vq/vcpu. And unlinke current QEMU single IOThread it can beeasily scaled with number of cpus/vcpus. For sure this can be solved byliftimg IOThread limitations but this will probably be even moredisastrous amount of changes (and adding vhost-blk won't break old setups!).

Probably the only undisputed advantage of vhost-blk is syscallsreduction. And again the benefit really depends on a storage speed, asit should be somehow comparable with syscalls time. Also I must notethat this may be good for high-density servers with a lot of VMs. Butfor now I did not have the exact numbers which show how much time we arereally winning for a single request at average.

Overall vhost-blk will only become better along with the increase ofstorage speed.

Also I must note that all arguments above apply to vdpa-blk. And unlikevhost-blk, which needs it's own QEMU code, vdpa-blk can be setup withgeneric virtio-vdpa QEMU code (I am not sure if it is merged yet butstill). Although vdpa-blk have it's own problems for now.

[Prev in Thread]

Current Thread

[Next in Thread]

Re: [RFC patch 0/1] block: vhost-blk backend, Stefan Hajnoczi, 2022/10/04
- Re: [RFC patch 0/1] block: vhost-blk backend, Andrey Zhadchenko, 2022/10/05
  - Re: [RFC patch 0/1] block: vhost-blk backend, Stefan Hajnoczi, 2022/10/05
- Re: [RFC patch 0/1] block: vhost-blk backend, Stefan Hajnoczi, 2022/10/04
  - Re: [RFC patch 0/1] block: vhost-blk backend, Andrey Zhadchenko, 2022/10/05
    - Re: [RFC patch 0/1] block: vhost-blk backend, Stefan Hajnoczi, 2022/10/05
- Re: [RFC patch 0/1] block: vhost-blk backend, Stefan Hajnoczi, 2022/10/04
  - Re: [RFC patch 0/1] block: vhost-blk backend, Andrey Zhadchenko <=
    - Re: [RFC patch 0/1] block: vhost-blk backend, Stefan Hajnoczi, 2022/10/05

Prev by Date: Re: [PATCH 13/26] parallels: add missing coroutine_fn annotations
Next by Date: Re: [RFC PATCH 1/1] block: add vhost-blk backend
Previous by thread: Re: [RFC patch 0/1] block: vhost-blk backend
Next by thread: Re: [RFC patch 0/1] block: vhost-blk backend
Index(es):
- Date
- Thread