


From: Adrian Suarez
Subject: [Qemu-devel] Question about QEMU's threading model and stacking multiple block drivers
Date: Tue, 7 Feb 2017 18:38:00 -0800

We’ve implemented a block driver that exposes storage to QEMU VMs. Our
block driver (O) is interposing on writes to some other type of storage
(B). O performs low latency replication and then asynchronously issues the
write to the backing block driver, B, using bdrv_aio_writev(). Our problem
is that the write latencies seen by the workload in the guest should be
those imposed by O plus the guest I/O and QEMU stack (around 25us total
based on our measurements), but we’re actually seeing much higher latencies
(around 120us). We suspect that this is due to the backing block driver B’s
coroutines blocking our coroutines. The sequence of events is as follows
(see diagram:
https://docs.google.com/drawings/d/12h1QbecvxzlKxSFvGKYAzvAJ18kTW6AVTwDR6VA8hkw/pub?w=576&h=565
):

1. A write is issued to our block driver, O, through QEMU's asynchronous
block driver interface.
2. Write is replicated to a fast device asynchronously.
2.a. In a different thread, the fast device invokes a callback on
completion that causes a coroutine to be scheduled to run in the QEMU
iothread that acknowledges completion of the write to the guest OS.
2.b. The coroutine scheduled in (2.a) is executed.
3. Write is issued asynchronously to the backing block driver, B.
3.a. The backing block driver, B, invokes the completion function supplied
by us, which frees any memory associated with the write (e.g. copies of IO
vectors).
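
The buffer lifetime implied by the steps above can be sketched in plain C. This is only an illustrative model, not code from driver O or from QEMU: the WriteReq type and the o_write_start(), o_replication_done(), o_backing_done(), and o_write_free() names are hypothetical. It shows why the payload has to be copied: the guest is acknowledged in (2.b), before the backing write in (3) finishes, so the copy must stay alive until (3.a).

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical per-write state; these names are assumptions for illustration. */
typedef struct WriteReq {
    void *copy;          /* private copy of the payload (stands in for copied IO vectors) */
    size_t len;
    bool guest_acked;    /* set in step 2.b */
    bool backing_done;   /* set in step 3.a */
} WriteReq;

/* Step 1: the write enters O; copy the payload so it can outlive the guest ack. */
WriteReq *o_write_start(const void *buf, size_t len)
{
    WriteReq *req = calloc(1, sizeof(*req));
    if (!req) {
        return NULL;
    }
    req->copy = malloc(len ? len : 1);
    if (!req->copy) {
        free(req);
        return NULL;
    }
    if (len) {
        memcpy(req->copy, buf, len);
    }
    req->len = len;
    return req;
}

/* Step 2.b: replication finished, so the write is acknowledged to the guest. */
void o_replication_done(WriteReq *req)
{
    req->guest_acked = true;
}

/* Step 3.a: the backing driver B finished; only now is the copy released. */
void o_backing_done(WriteReq *req)
{
    free(req->copy);
    req->copy = NULL;
    req->backing_done = true;
}

/* Release the request itself once both completions have been seen. */
void o_write_free(WriteReq *req)
{
    free(req);
}
```

In this model a request is acknowledged while its copy is still allocated, and the copy is freed only when the backing completion arrives.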

Steps (1), (2), and (3) are performed in the same coroutine (our driver's
bdrv_aio_writev() implementation). (2.a) is executed in a thread that is
part of our transport library linked by O, and (2.b) and (3.a) are executed
as coroutines in the QEMU iothread.
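
Our suspicion can be expressed as a toy queueing model. The sketch below assumes a single event-loop thread that runs handlers one at a time in queue order, with each handler occupying the thread until it returns or yields. The function name ack_finish_time_us() and all costs are made up for illustration; it is not a measurement of QEMU or a description of how QEMU orders coroutines.

```c
#include <stddef.h>

/*
 * Toy model: one thread runs the queued handlers strictly in order.
 * cost_us[i] is how long handler i occupies the thread. Returns the time,
 * in microseconds from the moment the queue was formed, at which the
 * handler at index `target` finishes. `target` must be less than `n`.
 */
unsigned ack_finish_time_us(const unsigned *cost_us, size_t n, size_t target)
{
    unsigned now = 0;

    for (size_t i = 0; i < n && i <= target; i++) {
        now += cost_us[i];   /* handler i runs before anything behind it */
    }
    return now;
}
```

With hypothetical costs of 30us for each of three backing-driver handlers and 5us for our acknowledgement, the acknowledgement finishes at 95us when it is queued last and at 5us when it is queued first. That is the kind of gap we would expect if (2.b) has to wait behind B's work on the same thread.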

We've tried improving the performance by using separate iothreads for the
two devices, but this only lowered the latency to around 100us
and caused stability issues. What's the best way to create a separate
iothread for the backing driver to do all of its work in?

-Adrian

