From: Paolo Bonzini
Subject: [Qemu-devel] [RFC PATCH 00/40] Sneak peek of virtio and dataplane changes for 2.6
Date: Tue, 24 Nov 2015 19:00:51 +0100
This large series is basically all that I would like to get into 2.6.
It is a combination of several pieces of work on dataplane and
multithreaded block layer.
It's also a large part of why I would like someone else to look at
miscellaneous patches for a while (in case you've missed that). I
can foresee that following the reviews is going to be a huge time drain.
With it I can get ~1300 Kiops on 8 disks (which I achieve with 2 iothreads
and 5 VCPUs). The bulk of the improvement actually comes from the first
8 patches, but the rest of the series is what prepares for what's next
to come in QEMU 2.7 and later, such as a multiqueue block layer.
It's tedious to review, with some pretty large patches (3, 32, 33, 35).
That's how you attract reviewers, isn't it? I would like to get the
first virtio and the first block layer part in very soon after 2.6
development starts.
I've split it into four parts, the first two touching virtio mostly,
while the last two are for the block layer.
Because it's large, I've CCed people only on the cover letter.
This work is available at github.com/bonzini/qemu.git, branch dataplane.
A. "LEAN" VIRTQUEUEELEMENT
--------------------------
Patches 1 to 8 modify VirtQueueElement so that the space for
scatter/gather lists is allocated dynamically rather than being
fixed to 4K. VirtQueueElement becomes a sort of "superclass", and
the scatter/gather elements are placed in the same malloc block,
which is laid out like this:
    VirtQueueElement
    other fields ("subclass" fields)
    in_addr[]
    out_addr[]
    in_sg[]
    out_sg[]
This can provide a large speedup (from 1.3x to 2.3x) with many disks,
because the fixed-size VirtQueueElement is 48K large. All virtio devices
have to be changed (patch 3). I chose to do it all in a single patch
because the changes are well isolated to each device anyway.
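To illustrate the layout above, here is a minimal sketch in C of a single-allocation element. It is not the code from the series: Elem, SgEntry, BlkReq, round_up and elem_alloc are hypothetical names, SgEntry stands in for struct iovec, and the field set is reduced to what the layout needs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct iovec, to keep the sketch portable. */
typedef struct SgEntry {
    void *base;
    size_t len;
} SgEntry;

/* Simplified "superclass"; the real VirtQueueElement has more fields. */
typedef struct Elem {
    unsigned int in_num, out_num;
    uint64_t *in_addr, *out_addr;
    SgEntry *in_sg, *out_sg;
} Elem;

/* Example "subclass": Elem must be the first member. */
typedef struct BlkReq {
    Elem elem;
    int status;
} BlkReq;

/* Round ofs up to a multiple of align (align must be a power of two). */
static size_t round_up(size_t ofs, size_t align)
{
    return (ofs + align - 1) & ~(align - 1);
}

/*
 * Allocate sz bytes of header (sz >= sizeof(Elem)) followed by the four
 * arrays, all in one block. Returns NULL on allocation failure; the
 * whole element is released with a single free().
 */
static void *elem_alloc(size_t sz, unsigned int out_num, unsigned int in_num)
{
    assert(sz >= sizeof(Elem));

    size_t in_addr_ofs = round_up(sz, _Alignof(uint64_t));
    size_t out_addr_ofs = in_addr_ofs + in_num * sizeof(uint64_t);
    size_t in_sg_ofs = round_up(out_addr_ofs + out_num * sizeof(uint64_t),
                                _Alignof(SgEntry));
    size_t out_sg_ofs = in_sg_ofs + in_num * sizeof(SgEntry);
    size_t total = out_sg_ofs + out_num * sizeof(SgEntry);

    char *block = calloc(1, total);
    if (!block) {
        return NULL;
    }

    /* The header sits at offset 0, so the block can be cast to a subclass. */
    Elem *elem = (Elem *)block;
    elem->in_num = in_num;
    elem->out_num = out_num;
    elem->in_addr = (uint64_t *)(block + in_addr_ofs);
    elem->out_addr = (uint64_t *)(block + out_addr_ofs);
    elem->in_sg = (SgEntry *)(block + in_sg_ofs);
    elem->out_sg = (SgEntry *)(block + out_sg_ofs);
    return elem;
}
```

In this sketch a device would call elem_alloc(sizeof(BlkReq), out_num, in_num) and get back a BlkReq whose arrays are sized for that one request, rather than for the largest possible one.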
The main issue here is that VirtQueueElement was haphazardly shoveled
straight into the migration stream (in host endianness). :( Patch 5
straightens this out, but at the cost of breaking backwards migration
because it now writes the VirtQueueElement in big endian, consistent
with other migration streams.
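As an illustration of writing a structure one field at a time in big endian, here is a minimal sketch in C. It is not the code from patch 5: ElemHeader, its three fields, and the put/get helpers are hypothetical, and the sketch writes into a plain byte buffer rather than a migration stream.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical subset of the fields that would go on the wire. */
typedef struct ElemHeader {
    uint32_t index;
    uint32_t in_num;
    uint32_t out_num;
} ElemHeader;

enum { ELEM_HEADER_WIRE_SIZE = 12 };

/* Store v most significant byte first, whatever the host byte order. */
static void put_be32(uint8_t *buf, uint32_t v)
{
    buf[0] = (uint8_t)(v >> 24);
    buf[1] = (uint8_t)(v >> 16);
    buf[2] = (uint8_t)(v >> 8);
    buf[3] = (uint8_t)v;
}

/* Inverse of put_be32. */
static uint32_t get_be32(const uint8_t *buf)
{
    return ((uint32_t)buf[0] << 24) | ((uint32_t)buf[1] << 16) |
           ((uint32_t)buf[2] << 8) | (uint32_t)buf[3];
}

/* Write each field separately; returns the number of bytes written. */
static size_t elem_header_put(uint8_t *buf, const ElemHeader *h)
{
    put_be32(buf + 0, h->index);
    put_be32(buf + 4, h->in_num);
    put_be32(buf + 8, h->out_num);
    return ELEM_HEADER_WIRE_SIZE;
}

/* Read each field separately; returns the number of bytes consumed. */
static size_t elem_header_get(const uint8_t *buf, ElemHeader *h)
{
    h->index = get_be32(buf + 0);
    h->in_num = get_be32(buf + 4);
    h->out_num = get_be32(buf + 8);
    return ELEM_HEADER_WIRE_SIZE;
}
```

Because no struct is copied wholesale, the byte layout no longer depends on the host's endianness or on padding inside the struct.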
This is the least tested part of the series. I nevertheless put it
first because it's the one that is most complicated to rebase, and
I want to get rid of it as fast as possible. Reviewing the general
approach is welcome anyway.
Status: virtio-input, virtio-gpu and migration not tested at all
B. REMOVING VRING.C
-------------------
This is patches 9 to 16. It removes the duplicate dataplane-specific
implementation of virtio in favor of the regular one that is already
used for non-dataplane. While the dataplane implementation is slightly
more optimized, I chose to keep the other one to avoid another "touch
all virtio devices" series.
Patch 10 alone mostly brings the two on par in performance.
The remaining 7-8% can be recovered by mostly getting rid of tiny
address_space_* operations, keeping the rings always mapped. Note that
the rest of this big series does bring a little performance improvement,
and already makes up for the lost performance.
This part has a dependency on patches that are not part of this series
(and do not exist yet), which make it possible to write the dirty
bitmap outside the BQL. The dirty bitmap is not yet thread-safe because,
while it is read and written with atomic operations, it may be resized
when there is a memory hotplug operation. There are plans to fix this
using RCU.
Nevertheless, this doesn't block part C.
Status: ready, but depends on the missing dirty bitmap support
C. FINE-GRAINED AIO_POLL CRITICAL SECTIONS
------------------------------------------
This is patches 17 to 28. It starts pushing aio_context_acquire down
into aio_poll. This part is more or less independent from A and B,
and it ends with aio_poll calling aio_context_acquire/release around
every callback.
To do this, this part introduces a thread-safe variant of the common
"walking_xxx++/walking_xxx--" idiom already found in several places
in aio*.c and async.c.
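For readers unfamiliar with the idiom, here is a minimal sketch in C of a "walker count" made safe for multiple threads. It is a simplified stand-in, not QemuLockCnt: every name is hypothetical, the counter is guarded by a tiny spinlock instead of the atomic fast path, and nodes are only flagged as deleted, then freed by the last walker to leave.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct Node {
    _Atomic(struct Node *) next;
    atomic_bool deleted;       /* set instead of unlinking during a walk */
    int value;
} Node;

typedef struct List {
    _Atomic(Node *) head;
    atomic_flag lock;          /* tiny spinlock standing in for a mutex */
    unsigned int walkers;      /* protected by lock */
} List;

#define LIST_INITIALIZER { NULL, ATOMIC_FLAG_INIT, 0 }

static void list_lock(List *l)
{
    while (atomic_flag_test_and_set_explicit(&l->lock, memory_order_acquire)) {
        /* spin */
    }
}

static void list_unlock(List *l)
{
    atomic_flag_clear_explicit(&l->lock, memory_order_release);
}

/* Insert at the head; safe while other threads are walking. */
static bool list_add(List *l, int value)
{
    Node *n = malloc(sizeof(*n));
    if (!n) {
        return false;
    }
    n->value = value;
    atomic_init(&n->deleted, false);
    list_lock(l);
    atomic_init(&n->next, atomic_load_explicit(&l->head, memory_order_relaxed));
    atomic_store_explicit(&l->head, n, memory_order_release);
    list_unlock(l);
    return true;
}

/* Deletion only flags the node; it stays linked until no one is walking. */
static void node_mark_deleted(Node *n)
{
    atomic_store(&n->deleted, true);
}

/* The thread-safe counterpart of "walking_xxx++". */
static void walk_begin(List *l)
{
    list_lock(l);
    l->walkers++;
    list_unlock(l);
}

/* The counterpart of "walking_xxx--": the last walker out frees nodes. */
static void walk_end(List *l)
{
    list_lock(l);
    if (--l->walkers == 0) {
        _Atomic(Node *) *pp = &l->head;
        Node *n;
        while ((n = atomic_load(pp)) != NULL) {
            if (atomic_load(&n->deleted)) {
                atomic_store(pp, atomic_load(&n->next));
                free(n);
            } else {
                pp = &n->next;
            }
        }
    }
    list_unlock(l);
}

/* Example walker: sums live nodes without holding the lock while iterating. */
static int list_sum(List *l)
{
    int sum = 0;
    walk_begin(l);
    for (Node *n = atomic_load(&l->head); n; n = atomic_load(&n->next)) {
        if (!atomic_load(&n->deleted)) {
            sum += n->value;
        }
    }
    walk_end(l);
    return sum;
}

/* Counts linked nodes, including flagged ones; only for single-threaded checks. */
static size_t list_length(List *l)
{
    size_t len = 0;
    for (Node *n = atomic_load(&l->head); n; n = atomic_load(&n->next)) {
        len++;
    }
    return len;
}
```

Since walk_begin and the reaping step both take the lock, a new walker cannot start while nodes are being freed, and nested walks simply leave the cleanup to the outermost walk_end.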
Status: ready, except that I haven't tested quorum enough
D. FINE-GRAINED BLOCK LAYER CRITICAL SECTIONS
---------------------------------------------
This is patches 29 to 40. It explicitly acquires the AioContext in all
callbacks that need it (file descriptors, bottom halves, timers, AIO)
rather than in aio_poll. This is the first step towards breaking
AioContext into many small locks, and hence the last prerequisite for
a real multiqueue QEMU block layer.
This has the biggest patches and, unlike patch 3, they are very hard
to split further.
At the end, starting with patch 37, a few patches do some small
optimization on aio_poll that is now possible, and the last one makes
virtio-scsi dataplane _almost_ thread-safe.
Status: ready
If you've read this far and didn't get bored, you're more than qualified
as a reviewer. :)
Paolo
Paolo Bonzini (40):
9pfs: allocate pdus with g_malloc/g_free
virtio: move VirtQueueElement at the beginning of the structs
virtio: move allocation to virtqueue_pop/vring_pop
virtio: introduce qemu_get/put_virtqueue_element
virtio: read/write the VirtQueueElement a field at a time
virtio: introduce virtqueue_alloc_element
virtio: slim down allocation of VirtQueueElements
vring: slim down allocation of VirtQueueElements
vring: make vring_enable_notification return void
virtio: combine the read of a descriptor
virtio: add AioContext-specific function for host notifiers
virtio: export vring_notify as virtio_should_notify
virtio-blk: fix "disabled data plane" mode
virtio-blk: do not use vring in dataplane
virtio-scsi: do not use vring in dataplane
vring: remove
iothread: release AioContext around aio_poll
qemu-thread: introduce QemuRecMutex
aio: convert from RFifoLock to QemuRecMutex
aio: rename bh_lock to list_lock
qemu-thread: introduce QemuLockCnt
aio: make ctx->list_lock a QemuLockCnt, subsuming ctx->walking_bh
qemu-thread: optimize QemuLockCnt with futexes on Linux
aio: tweak walking in dispatch phase
aio-posix: remove walking_handlers, protecting AioHandler list with list_lock
aio-win32: remove walking_handlers, protecting AioHandler list with list_lock
aio: document locking
aio: push aio_context_acquire/release down to dispatching
quorum: use atomics for rewrite_count
quorum: split quorum_fifo_aio_cb from quorum_aio_cb
qed: introduce qed_aio_start_io and qed_aio_next_io_cb
block: explicitly acquire aiocontext in callbacks that need it
block: explicitly acquire aiocontext in bottom halves that need it
block: explicitly acquire aiocontext in timers that need it
block: explicitly acquire aiocontext in aio callbacks that need it
aio: update locking documentation
async: optimize aio_bh_poll
aio-posix: partially inline aio_dispatch into aio_poll
async: remove unnecessary inc/dec pairs
dma-helpers: avoid lock inversion with AioContext
aio-posix.c | 108 +++---
aio-win32.c | 111 +++---
async.c | 76 ++--
block/blkverify.c | 6 +-
block/curl.c | 43 ++-
block/gluster.c | 2 +
block/io.c | 7 +
block/iscsi.c | 10 +
block/linux-aio.c | 14 +-
block/mirror.c | 12 +-
block/nbd-client.c | 14 +-
block/nfs.c | 10 +
block/qed-cluster.c | 2 +
block/qed-table.c | 12 +-
block/qed.c | 112 ++++--
block/qed.h | 3 +
block/quorum.c | 60 +--
block/sheepdog.c | 29 +-
block/ssh.c | 47 ++-
block/throttle-groups.c | 2 +
block/win32-aio.c | 8 +-
dma-helpers.c | 27 +-
docs/lockcnt.txt | 342 +++++++++++++++++
docs/multiple-iothreads.txt | 95 ++++-
hw/9pfs/virtio-9p-device.c | 7 +-
hw/9pfs/virtio-9p.c | 25 +-
hw/9pfs/virtio-9p.h | 4 +-
hw/block/dataplane/virtio-blk.c | 131 +------
hw/block/dataplane/virtio-blk.h | 1 +
hw/block/virtio-blk.c | 92 ++---
hw/char/virtio-serial-bus.c | 78 ++--
hw/display/virtio-gpu.c | 25 +-
hw/input/virtio-input.c | 24 +-
hw/net/virtio-net.c | 69 ++--
hw/scsi/scsi-bus.c | 2 +
hw/scsi/scsi-disk.c | 18 +
hw/scsi/scsi-generic.c | 20 +-
hw/scsi/virtio-scsi-dataplane.c | 197 ++--------
hw/scsi/virtio-scsi.c | 82 ++--
hw/virtio/Makefile.objs | 1 -
hw/virtio/dataplane/Makefile.objs | 1 -
hw/virtio/dataplane/vring.c | 526 --------------------------
hw/virtio/virtio-balloon.c | 22 +-
hw/virtio/virtio-rng.c | 10 +-
hw/virtio/virtio.c | 323 +++++++++++-----
include/block/aio.h | 38 +-
include/hw/virtio/dataplane/vring-accessors.h | 75 ----
include/hw/virtio/dataplane/vring.h | 51 ---
include/hw/virtio/virtio-balloon.h | 2 +-
include/hw/virtio/virtio-blk.h | 9 +-
include/hw/virtio/virtio-net.h | 2 +-
include/hw/virtio/virtio-scsi.h | 36 +-
include/hw/virtio/virtio-serial.h | 2 +-
include/hw/virtio/virtio.h | 16 +-
include/qemu/futex.h | 36 ++
include/qemu/rfifolock.h | 54 ---
include/qemu/thread-posix.h | 6 +
include/qemu/thread-win32.h | 10 +
include/qemu/thread.h | 23 ++
iothread.c | 11 +-
nbd.c | 4 +
tests/.gitignore | 1 -
tests/Makefile | 2 -
tests/test-aio.c | 19 +-
tests/test-rfifolock.c | 91 -----
thread-pool.c | 14 +-
trace-events | 13 +-
util/Makefile.objs | 2 +-
util/lockcnt.c | 404 ++++++++++++++++++++
util/qemu-coroutine-sleep.c | 5 +
util/qemu-thread-posix.c | 38 +-
util/qemu-thread-win32.c | 25 ++
util/rfifolock.c | 78 ----
73 files changed, 2000 insertions(+), 1877 deletions(-)
create mode 100644 docs/lockcnt.txt
delete mode 100644 hw/virtio/dataplane/Makefile.objs
delete mode 100644 hw/virtio/dataplane/vring.c
delete mode 100644 include/hw/virtio/dataplane/vring-accessors.h
delete mode 100644 include/hw/virtio/dataplane/vring.h
create mode 100644 include/qemu/futex.h
delete mode 100644 include/qemu/rfifolock.h
delete mode 100644 tests/test-rfifolock.c
create mode 100644 util/lockcnt.c
delete mode 100644 util/rfifolock.c
--
1.8.3.1