Re: [Qemu-devel] [Qemu-block] [PULL 00/42] Block patches
From: Kevin Wolf
Subject: Re: [Qemu-devel] [Qemu-block] [PULL 00/42] Block patches
Date: Mon, 1 Oct 2018 18:09:37 +0200
User-agent: Mutt/1.9.1 (2017-09-22)

On 01.10.2018 at 16:14, Kevin Wolf wrote:
> On 01.10.2018 at 15:03, Peter Maydell wrote:
> > On 28 September 2018 at 15:36, Peter Maydell <address@hidden> wrote:
> > > I'm finding that test-bdrv-drain hangs intermittently on my OSX host.
> >
> > Ping? Between this and test-replication I'm finding that my
> > parallel build tests for merges are failing about 50% of the
> > time :-(
>
> Sorry, there wasn't much more than a weekend between your report and
> now.
>
> For the replication one, I think we can just take the AioContext lock in
> the test case while we decide how the API should really be used. I'll
> prepare a fix for that (and hopefully I'll be able to reproduce the
> problem reliably enough to verify the fix).
>
> Max said he could reproduce some hang in test-bdrv-drain (though we
> don't know if this has anything to do with your OS X hang, which looked
> rather odd) and would look into it, but I don't think we know the
> problem yet. I'll try to reproduce that one after fixing the replication
> test.

So I sent two patches for the two test cases that should fix the bugs
that made the tests fail relatively frequently. I can still reproduce
another hang, which is a bit mysterious to me:

Thread 2 (Thread 3321.3818):
#0 0x00007f2ebbdcc4e9 in syscall () from /lib64/libc.so.6
#1 0x00005594d095690b in qemu_futex_wait (val=<optimized out>, f=<optimized out>) at /home/kwolf/source/qemu/include/qemu/futex.h:29
#2 qemu_event_wait (address@hidden <rcu_call_ready_event>) at util/qemu-thread-posix.c:442
#3 0x00005594d0965f58 in call_rcu_thread (opaque=<optimized out>) at util/rcu.c:261
#4 0x00007f2ebc09d36d in start_thread () from /lib64/libpthread.so.0
#5 0x00007f2ebbdd1b4f in clone () from /lib64/libc.so.6

Thread 1 (Thread 3321.3321):
#0 0x00007f2ebc09e89d in pthread_join () from /lib64/libpthread.so.0
#1 0x00005594d0956b6f in qemu_thread_join (address@hidden) at util/qemu-thread-posix.c:565
#2 0x00005594d091f4d9 in iothread_join (iothread=0x5594d16bd0b0) at tests/iothread.c:62
#3 0x00005594d08806cc in test_iothread_common (drain_type=BDRV_DRAIN_ALL, drain_thread=<optimized out>) at tests/test-bdrv-drain.c:763
#4 0x00007f2ebd58e178 in g_test_run_suite_internal () from /lib64/libglib-2.0.so.0
#5 0x00007f2ebd58e37b in g_test_run_suite_internal () from /lib64/libglib-2.0.so.0
#6 0x00007f2ebd58e37b in g_test_run_suite_internal () from /lib64/libglib-2.0.so.0
#7 0x00007f2ebd58e51b in g_test_run_suite () from /lib64/libglib-2.0.so.0
#8 0x00007f2ebd58e571 in g_test_run () from /lib64/libglib-2.0.so.0
#9 0x00005594d087a534 in main (argc=<optimized out>, argv=<optimized out>) at tests/test-bdrv-drain.c:1606

This pthread_join() is waiting for a thread that doesn't even exist any
more. I caught the bug in rr and can clearly see the iothread being
notified and terminating, but pthread_join() just doesn't return.
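
For context on the teardown in Thread 1 above: iothread_join() asks the
iothread to stop and then waits for it in qemu_thread_join(), which the
backtrace shows ending up in pthread_join(). The following is a minimal
sketch of that stop/notify/join sequence in plain pthreads, assuming a
condition variable as a stand-in for the event loop notification. Worker,
worker_start(), worker_run() and worker_join() are hypothetical names;
this is not the code in tests/iothread.c.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for an IOThread: a worker thread that loops until
 * asked to stop. A sketch only, not the code in tests/iothread.c. */
typedef struct {
    pthread_t thread;
    pthread_mutex_t lock;
    pthread_cond_t cond;   /* stands in for the event loop notification */
    bool stopping;
} Worker;

static void *worker_run(void *opaque)
{
    Worker *w = opaque;

    pthread_mutex_lock(&w->lock);
    while (!w->stopping) {
        /* Block until notified; stands in for one blocking iteration
         * of the iothread's event loop. */
        pthread_cond_wait(&w->cond, &w->lock);
    }
    pthread_mutex_unlock(&w->lock);
    return NULL;  /* returning here terminates the thread */
}

static int worker_start(Worker *w)
{
    pthread_mutex_init(&w->lock, NULL);
    pthread_cond_init(&w->cond, NULL);
    w->stopping = false;
    return pthread_create(&w->thread, NULL, worker_run, w);
}

/* Request a stop, wake the loop, then join. Once worker_run() has
 * returned, pthread_join() is expected to return as well. */
static int worker_join(Worker *w)
{
    int ret;

    pthread_mutex_lock(&w->lock);
    w->stopping = true;
    pthread_cond_signal(&w->cond);
    pthread_mutex_unlock(&w->lock);

    ret = pthread_join(w->thread, NULL);
    assert(ret == 0);

    pthread_cond_destroy(&w->cond);
    pthread_mutex_destroy(&w->lock);
    return ret;
}
```

Calling worker_start() and then worker_join() should return 0 promptly.
The hang above is the case where the equivalent of worker_run() has
visibly returned, yet pthread_join() never comes back.
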
Kevin