From: Andrea Arcangeli
Subject: [Qemu-devel] fix bdrv_read/write_em and qemu_aio_flush
Date: Thu, 28 May 2009 18:33:10 +0200
Hello,
the debug code in my ide_dma_cancel patch (not yet included upstream)
made us notice that when qemu_aio_flush returns, there are still
pending aio commands waiting to complete. Auditing the code I found
strange things, such as the fact that qemu_aio_wait does nothing if an
unrelated (not aio-related) bh is executed. I think I also found the
reason why there was still pending aio after qemu_aio_flush:
qemu_aio_wait does a lot more than wait, it can start aio, and if the
previous ->io_flush call returned zero, the loop ends and ->io_flush
isn't repeated. The fact that an unrelated bh can make qemu_aio_wait a
noop seems troublesome for all callers that aren't calling
qemu_aio_wait in a loop the way qemu_aio_flush does, so I preferred to
change those callers to the safer qemu_aio_flush in case an executed
bh generates more pending I/O. What do you think about this patch
against qemu git?
------------
Subject: fix bdrv_read/write_em and qemu_aio_flush
From: Andrea Arcangeli <address@hidden>
qemu_aio_wait() was used to start aio through qemu_bh_poll(), as in the case of
qcow2 reads from holes. The bh is global. I can't see how it can possibly be
safe to make qemu_aio_wait a noop whenever any unrelated bh was pending and had
to be executed.
In addition, qemu_aio_wait, by invoking the bh, could execute new aio callbacks
that issue more aio commands from their completion handlers, breaking the
invariant that qemu_aio_flush returns only when all ->io_flush methods
return 0 (which led to a failure in my fix to ide_dma_cancel, which has debug
code to catch exactly that; that code isn't yet in qemu upstream).
  ->io_flush returns 0
  qemu_aio_wait()
    qemu_aio_bh() -> qcow_aio_read_bh -> ide_read_dma_cb ->
      ide_dma_submit_check
  loop breaks despite ->io_flush no longer returning 0
I also changed most qemu_aio_wait callers to call qemu_aio_flush instead; this
ensures that even a read from a hole submitted with bdrv_aio_readv will have
completed by the time qemu_aio_flush returns.
Signed-off-by: Andrea Arcangeli <address@hidden>
---
diff --git a/aio.c b/aio.c
index 11fbb6c..b304852 100644
--- a/aio.c
+++ b/aio.c
@@ -103,6 +103,12 @@ void qemu_aio_flush(void)
do {
ret = 0;
+ /*
+ * If there are pending emulated aio start them so
+ * flush will be able to return 1.
+ */
+ qemu_bh_poll();
+
LIST_FOREACH(node, &aio_handlers, node) {
ret |= node->io_flush(node->opaque);
}
@@ -115,9 +121,6 @@ void qemu_aio_wait(void)
{
int ret;
- if (qemu_bh_poll())
- return;
-
do {
AioHandler *node;
fd_set rdfds, wrfds;
diff --git a/block.c b/block.c
index c80d4b9..78088a9 100644
--- a/block.c
+++ b/block.c
@@ -1487,7 +1487,7 @@ static int bdrv_read_em(BlockDriverState *bs, int64_t sector_num,
return -1;
while (async_ret == NOT_DONE) {
- qemu_aio_wait();
+ qemu_aio_flush();
}
return async_ret;
@@ -1510,7 +1510,7 @@ static int bdrv_write_em(BlockDriverState *bs, int64_t sector_num,
if (acb == NULL)
return -1;
while (async_ret == NOT_DONE) {
- qemu_aio_wait();
+ qemu_aio_flush();
}
return async_ret;
}
diff --git a/qemu-io.c b/qemu-io.c
index f0a17b9..0d52074 100644
--- a/qemu-io.c
+++ b/qemu-io.c
@@ -153,7 +153,7 @@ static int do_aio_readv(QEMUIOVector *qiov, int64_t offset, int *total)
return -EIO;
while (async_ret == NOT_DONE)
- qemu_aio_wait();
+ qemu_aio_flush();
*total = qiov->size;
return async_ret < 0 ? async_ret : 1;
@@ -170,7 +170,7 @@ static int do_aio_writev(QEMUIOVector *qiov, int64_t offset, int *total)
return -EIO;
while (async_ret == NOT_DONE)
- qemu_aio_wait();
+ qemu_aio_flush();
*total = qiov->size;
return async_ret < 0 ? async_ret : 1;