qemu-stable
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-stable] [PATCH v2 1/1] qemu-img: wait for convert coroutines t


From: Anton Nefedov
Subject: Re: [Qemu-stable] [PATCH v2 1/1] qemu-img: wait for convert coroutines to complete
Date: Mon, 24 Apr 2017 23:13:05 +0300
User-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.12; rv:45.0) Gecko/20100101 Thunderbird/45.8.0


On 24/04/2017 21:16, Peter Lieven wrote:


Am 24.04.2017 um 18:27 schrieb Anton Nefedov <address@hidden>:

On 04/21/2017 03:37 PM, Peter Lieven wrote:
Am 21.04.2017 um 14:19 schrieb Anton Nefedov:
On 04/21/2017 01:44 PM, Peter Lieven wrote:
Am 21.04.2017 um 12:04 schrieb Anton Nefedov:
On error path (like i/o error in one of the coroutines), it's required to
 - wait for coroutines completion before cleaning the common structures
 - reenter dependent coroutines so they ever finish

Introduced in 2d9187bc65.

Signed-off-by: Anton Nefedov <address@hidden>
---
[..]



And what if we error out in the read path? Wouldn't be something like this 
easier?


diff --git a/qemu-img.c b/qemu-img.c
index 22f559a..4ff1085 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -1903,6 +1903,16 @@ static int convert_do_copy(ImgConvertState *s)
        main_loop_wait(false);
    }

+    /* on error path we need to enter all coroutines that are still
+     * running before cleaning up common structures */
+    if (s->ret) {
+        for (i = 0; i < s->num_coroutines; i++) {
+             if (s->co[i]) {
+                 qemu_coroutine_enter(s->co[i]);
+             }
+        }
+    }
+
    if (s->compressed && !s->ret) {
        /* signal EOF to align */
        ret = blk_pwrite_compressed(s->target, 0, NULL, 0);


Peter


seemed a bit too daring to me to re-enter every coroutine potentially including 
the ones that yielded waiting for I/O completion.
If that's ok - that is for sure easier :)

I think we should enter every coroutine that is still running and have it 
terminate correctly. It was my mistake that I have not
done this in the original patch. Can you check if the above fixes your test 
cases that triggered the bug?


hi, sorry I'm late with the answer

this segfaults in bdrv_close(). Looks like it tries to finish some i/o which 
coroutine we have already entered and terminated?

(gdb) run
Starting program: /vz/anefedov/qemu-build/us/./qemu-img convert -O qcow2 
./harddisk.hdd.c ./harddisk.hdd
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
[New Thread 0x7fffeac2d700 (LWP 436020)]
[New Thread 0x7fffe4ed6700 (LWP 436021)]
qemu-img: error while reading sector 20480: Input/output error
qemu-img: error while writing sector 19712: Operation now in progress

Program received signal SIGSEGV, Segmentation fault.
aio_co_wake (co=0x0) at /mnt/code/us-qemu/util/async.c:454
454         ctx = atomic_read(&co->ctx);
(gdb) bt
#0  aio_co_wake (co=0x0) at /mnt/code/us-qemu/util/async.c:454
/* [Anton]: thread_pool_co_cb () here */
#1  0x0000555555634629 in thread_pool_completion_bh (opaque=0x555555cfe020) at 
/mnt/code/us-qemu/util/thread-pool.c:189
#2  0x0000555555633b31 in aio_bh_call (bh=0x555555cfe0f0) at 
/mnt/code/us-qemu/util/async.c:90
#3  aio_bh_poll (address@hidden) at /mnt/code/us-qemu/util/async.c:118
#4  0x0000555555636f14 in aio_poll (address@hidden, blocking=<optimized out>) 
at /mnt/code/us-qemu/util/aio-posix.c:682
#5  0x00005555555c52d4 in bdrv_drain_recurse (address@hidden) at 
/mnt/code/us-qemu/block/io.c:164
#6  0x00005555555c5aed in bdrv_drained_begin (address@hidden) at 
/mnt/code/us-qemu/block/io.c:248
#7  0x0000555555581443 in bdrv_close (bs=0x555555d22560) at 
/mnt/code/us-qemu/block.c:2909
#8  bdrv_delete (bs=0x555555d22560) at /mnt/code/us-qemu/block.c:3100
#9  bdrv_unref (bs=0x555555d22560) at /mnt/code/us-qemu/block.c:4087
#10 0x00005555555baf44 in blk_remove_bs (address@hidden) at 
/mnt/code/us-qemu/block/block-backend.c:552
#11 0x00005555555bb173 in blk_delete (blk=0x555555d22380) at 
/mnt/code/us-qemu/block/block-backend.c:238
#12 blk_unref (address@hidden) at /mnt/code/us-qemu/block/block-backend.c:282
#13 0x000055555557a22c in img_convert (argc=<optimized out>, argv=<optimized 
out>) at /mnt/code/us-qemu/qemu-img.c:2359
#14 0x0000555555574189 in main (argc=5, argv=0x7fffffffe4a0) at 
/mnt/code/us-qemu/qemu-img.c:4464


Peter


/Anton


it seems that this is a bit tricky, can you share how your test case works?

thanks,
peter


how I tested today was basically

diff --git a/qemu-img.c b/qemu-img.c
index 4425aaa..3d2d506 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -1788,6 +1788,10 @@ static void coroutine_fn convert_co_do_copy(void *opaque)

         if (status == BLK_DATA) {
             ret = convert_co_read(s, sector_num, n, buf);
+            const uint64_t fsector = 10*1024*1024/512;
+            if (sector_num <= fsector && fsector < sector_num+n) {
+                ret = -EIO;
+            }
             if (ret < 0) {
                 error_report("error while reading sector %" PRId64
                              ": %s", sector_num, strerror(-ret));




reply via email to

[Prev in Thread] Current Thread [Next in Thread]