[Qemu-devel] [RFC v1] Introduce a new NVMe host device type to QEMU
From: Changpeng Liu
Subject: [Qemu-devel] [RFC v1] Introduce a new NVMe host device type to QEMU
Date: Mon, 15 Jan 2018 16:01:54 +0800
The NVMe 1.3 specification (http://nvmexpress.org/resources/specifications/)
introduced a new Admin command, Doorbell Buffer Config, which is designed for
emulated NVMe controllers only; Linux kernel 4.12 added support for it. With
this feature, when the NVMe driver issues new requests to the controller, it
writes a shadow doorbell in memory instead of performing MMIO writes, so the
NVMe specification itself can serve as an efficient para-virtualization
protocol.
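The doorbell-avoidance logic can be sketched as follows. This is a simplified
version of the event-index check used by the Linux shadow-doorbell NVMe driver;
the function names here are illustrative, not taken from the patch:

```c
#include <stdint.h>
#include <stdbool.h>

/* Decide whether an MMIO doorbell write is still required.  An MMIO write
 * is needed only when the emulator's advertised event index falls between
 * the old and new tail values; the arithmetic wraps modulo 2^16. */
static bool nvme_dbbuf_need_event(uint16_t event_idx, uint16_t new_idx,
                                  uint16_t old_idx)
{
    return (uint16_t)(new_idx - event_idx - 1) < (uint16_t)(new_idx - old_idx);
}

/* Publish the new tail in the shadow doorbell and report whether the
 * driver must still ring the real (MMIO) doorbell.  In real code a
 * memory barrier is required between the store and the check. */
static bool nvme_ring_sq_doorbell(uint32_t *shadow_db,
                                  const uint32_t *event_idx,
                                  uint16_t old_tail, uint16_t new_tail)
{
    *shadow_db = new_tail;  /* visible to the emulator / slave target */
    return nvme_dbbuf_need_event((uint16_t)*event_idx, new_tail, old_tail);
}
```

In the common case the emulator's event index lags behind the tail, the check
returns false, and the guest driver skips the MMIO write entirely.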
Following the existing vhost-user-scsi idea, we can set up a slave I/O target
that serves guest I/Os directly via the NVMe I/O queues. The NVMe queue
information, such as queue size and queue address, is routed to a separate
slave I/O target via a UNIX domain socket. Taking the existing QEMU vhost-user
protocol as a reference, I designed several new socket messages to enable this
function. With this design, an emulated virtual NVMe controller is presented
to the guest, and the native NVMe driver inside the guest can be used.
---------------------------------------------------------------------------------------------------
| Unix Domain Socket Message       | Description                                                  |
---------------------------------------------------------------------------------------------------
| Get Controller Capabilities      | Controller Capabilities (CAP) register of the NVMe spec      |
---------------------------------------------------------------------------------------------------
| Get/Set Controller Configuration | Enable/disable the NVMe controller                           |
---------------------------------------------------------------------------------------------------
| Admin passthrough                | Mandatory NVMe Admin commands routed to the slave I/O target |
---------------------------------------------------------------------------------------------------
| IO passthrough                   | I/O commands issued before the shadow doorbell buffer is     |
|                                  | configured                                                   |
---------------------------------------------------------------------------------------------------
| Set memory table                 | Same as the existing vhost-user message; used for memory     |
|                                  | translation                                                  |
---------------------------------------------------------------------------------------------------
| Set Guest Notifier               | Completion queue interrupt: notifies the guest when an I/O   |
|                                  | completes                                                    |
---------------------------------------------------------------------------------------------------
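A possible wire format for these messages, modeled on the vhost-user header
(request code, flags, payload size, payload), could look like the sketch below.
The request codes and struct layout are hypothetical; the RFC does not fix
their values:

```c
#include <stdint.h>

/* Hypothetical request codes for the new socket messages; the actual
 * numbering in the patch may differ. */
enum vhost_nvme_request {
    VHOST_NVME_GET_CAP            = 1, /* Get Controller Capabilities    */
    VHOST_NVME_SET_CONFIG         = 2, /* Get/Set Controller Configuration */
    VHOST_NVME_ADMIN_PASSTHRU     = 3, /* Admin command passthrough      */
    VHOST_NVME_IO_PASSTHRU        = 4, /* I/O before shadow doorbell setup */
    VHOST_NVME_SET_MEM_TABLE      = 5, /* same as vhost-user memory table */
    VHOST_NVME_SET_GUEST_NOTIFIER = 6, /* CQ interrupt eventfd           */
};

/* Header-plus-payload message, following the vhost-user convention. */
typedef struct VhostNvmeMsg {
    uint32_t request;       /* one of enum vhost_nvme_request */
    uint32_t flags;         /* version and need-reply bits, as in vhost-user */
    uint32_t size;          /* payload size that follows the header */
    union {
        uint64_t cap;       /* controller CAP register value */
        uint32_t cc;        /* controller configuration (CC) register */
        uint8_t  cmd[64];   /* raw 64-byte NVMe command for passthrough */
    } payload;
} VhostNvmeMsg;
```

File descriptors (the memory-table fds and the guest-notifier eventfd) would be
passed as SCM_RIGHTS ancillary data alongside the message, as vhost-user does.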
With these messages, the slave I/O target can access all the NVMe I/O queues,
including submission and completion queues. After the Doorbell Buffer Config
Admin command completes, the slave I/O target can start processing the I/O
requests sent from the guest.
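Once the shadow doorbell is configured, the slave side can consume submission
queue entries by polling the guest-written tail. A minimal sketch, assuming a
simplified 64-byte command layout and illustrative names (the actual target
code in the patch is more involved):

```c
#include <stdint.h>

/* Simplified 64-byte NVMe submission queue entry: opcode first. */
typedef struct NvmeCmd {
    uint8_t  opc;
    uint8_t  rsvd[3];
    uint32_t rest[15];
} NvmeCmd;

typedef struct SubQueue {
    NvmeCmd  *base;        /* queue memory, translated from guest physical */
    uint32_t *shadow_tail; /* shadow doorbell slot written by the guest */
    uint16_t  head;        /* slave-side consumer index */
    uint16_t  size;        /* number of entries in the queue */
} SubQueue;

/* One polling pass: consume every entry the guest has published via its
 * shadow doorbell.  Returns the number of commands processed. */
static int sq_poll(SubQueue *sq, void (*process)(const NvmeCmd *))
{
    int n = 0;
    uint16_t tail = (uint16_t)*sq->shadow_tail; /* snapshot guest tail */
    while (sq->head != tail) {
        if (process) {
            process(&sq->base[sq->head]);
        }
        sq->head = (uint16_t)((sq->head + 1) % sq->size);
        n++;
    }
    return n;
}
```

On completion, the slave writes a completion queue entry and signals the guest
through the eventfd registered by Set Guest Notifier.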
For performance evaluation I implemented both the QEMU driver and a slave I/O
target, largely reusing code from the QEMU NVMe driver and the vhost-user
driver.
Optional slave I/O target (SPDK vhost target) patches:
https://review.gerrithub.io/#/c/384213/
A user-space NVMe driver is implemented in the slave I/O target so that one
NVMe controller can be shared among multiple VMs. The namespaces presented to
the guest VM are virtual namespaces, meaning the slave I/O target can back them
with any kind of storage. The guest kernel must be 4.12 or later (with Doorbell
Buffer Config support); my tests used Fedora 27 with a 4.13 kernel.
This is still an ongoing work, and there are some open issues to address:
- A lot of code is reused from the QEMU NVMe driver; we should consider
abstracting a common NVMe library.
- A lot of code is reused from the QEMU vhost-user driver. For this design we
only need the UNIX domain socket to deliver the mandatory messages; Set memory
table and Set Guest Notifier are exactly the same as in the vhost-user driver.
- Guest kernels newer than 4.12 are supported with the Doorbell Buffer Config
feature enabled inside the guest. For BIOS-stage I/O requests and older Linux
kernels without Doorbell Buffer Config support, I/O requests can still be
forwarded through socket messages, but with a huge performance drop.
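The Set memory table message lets the slave translate guest physical addresses
(for example, SQ/CQ base addresses from Create I/O Queue commands) into its own
virtual addresses. A minimal sketch of that lookup, assuming a vhost-user-style
region array with illustrative field names:

```c
#include <stdint.h>
#include <stddef.h>

/* vhost-user-style memory region: maps a range of guest physical
 * addresses onto an mmap()ed area in the slave process. */
typedef struct MemRegion {
    uint64_t guest_phys_addr;
    uint64_t size;
    uint64_t mmap_addr;  /* slave-side virtual address of the mapping */
} MemRegion;

/* Translate a guest physical address into a slave-side pointer.
 * Returns NULL if the address is outside every registered region. */
static void *gpa_to_hva(const MemRegion *regions, size_t nregions,
                        uint64_t gpa)
{
    for (size_t i = 0; i < nregions; i++) {
        const MemRegion *r = &regions[i];
        if (gpa >= r->guest_phys_addr && gpa < r->guest_phys_addr + r->size) {
            return (void *)(uintptr_t)(r->mmap_addr +
                                       (gpa - r->guest_phys_addr));
        }
    }
    return NULL;
}
```

A production implementation would also reject accesses that straddle a region
boundary and cache the most recently hit region.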
Any feedback is appreciated.
Changpeng Liu (1):
block/NVMe: introduce a new vhost NVMe host device to QEMU
hw/block/Makefile.objs | 3 +
hw/block/nvme.h | 28 ++
hw/block/vhost.c | 439 ++++++++++++++++++++++
hw/block/vhost_user.c | 588 +++++++++++++++++++++++++++++
hw/block/vhost_user_nvme.c | 902 +++++++++++++++++++++++++++++++++++++++++++++
hw/block/vhost_user_nvme.h | 38 ++
6 files changed, 1998 insertions(+)
create mode 100644 hw/block/vhost.c
create mode 100644 hw/block/vhost_user.c
create mode 100644 hw/block/vhost_user_nvme.c
create mode 100644 hw/block/vhost_user_nvme.h
--
1.9.3