qemu-devel

Re: [Qemu-devel] Debugging io deadlock


From: Stefan Hajnoczi
Subject: Re: [Qemu-devel] Debugging io deadlock
Date: Tue, 5 Dec 2017 13:54:47 +0000
User-agent: Mutt/1.9.1 (2017-09-22)

On Mon, Dec 04, 2017 at 08:22:48PM +0100, BALATON Zoltan wrote:
> I'm seeing a possible deadlock that I don't know how to debug. Any hint on
> how to find the cause or what should be checked further to identify the
> reason why this is happening and how to fix it is greatly appreciated.
> 
> Here is the state of the threads:
> 
> (gdb) info thr
>   Id   Target Id         Frame
> * 4    Thread 0x7fffba76c700 (LWP 3445) "qemu-system-ppc" 0x0000555555cbec04 
> in worker_thread (opaque=0x7fffe40c9000) at util/thread-pool.c:92
>   3    Thread 0x7fffe8829700 (LWP 3443) "qemu-system-ppc" 0x00007ffff78d267f 
> in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
>   2    Thread 0x7ffff111b700 (LWP 3442) "qemu-system-ppc" 0x00007ffff42cad29 
> in syscall () from /lib64/libc.so.6
>   1    Thread 0x7ffff7fc7b00 (LWP 3441) "qemu-system-ppc" 0x00007ffff42c4e31 
> in ppoll () from /lib64/libc.so.6
> (gdb) bt
> #0  0x00007ffff78d4830 in sem_timedwait () from /lib64/libpthread.so.0
> #1  0x0000555555cc572e in qemu_sem_timedwait (sem=0x7fffe40c9078, ms=10000) 
> at util/qemu-thread-posix.c:289
> #2  0x0000555555cbec04 in worker_thread (opaque=0x7fffe40c9000) at 
> util/thread-pool.c:92
> #3  0x00007ffff78cd5bd in start_thread () from /lib64/libpthread.so.0
> #4  0x00007ffff42d062d in clone () from /lib64/libc.so.6
> (gdb) thr 3
> [Switching to thread 3 (Thread 0x7fffe8829700 (LWP 3443))]
> #0  0x00007ffff78d267f in pthread_cond_wait@@GLIBC_2.3.2 () from 
> /lib64/libpthread.so.0
> (gdb) bt
> #0  0x00007ffff78d267f in pthread_cond_wait@@GLIBC_2.3.2 () from 
> /lib64/libpthread.so.0
> #1  0x0000555555cc5458 in qemu_cond_wait (cond=0x555556b47b90, 
> mutex=0x5555565b5220 <qemu_global_mutex>) at util/qemu-thread-posix.c:161
> #2  0x00005555557e6690 in qemu_tcg_wait_io_event (cpu=0x7ffff7e20010) at 
> cpus.c:1084
> #3  0x00005555557e6f00 in qemu_tcg_rr_cpu_thread_fn (arg=0x7ffff7e20010) at 
> cpus.c:1396
> #4  0x00007ffff78cd5bd in start_thread () from /lib64/libpthread.so.0
> #5  0x00007ffff42d062d in clone () from /lib64/libc.so.6
> (gdb) thr 2
> [Switching to thread 2 (Thread 0x7ffff111b700 (LWP 3442))]
> #0  0x00007ffff42cad29 in syscall () from /lib64/libc.so.6
> (gdb) bt
> #0  0x00007ffff42cad29 in syscall () from /lib64/libc.so.6
> #1  0x0000555555cc58a7 in qemu_futex_wait (f=0x555556a01134 
> <rcu_call_ready_event>, val=4294967295) at include/qemu/futex.h:29
> #2  0x0000555555cc5a74 in qemu_event_wait (ev=0x555556a01134 
> <rcu_call_ready_event>) at util/qemu-thread-posix.c:442
> #3  0x0000555555cdd92c in call_rcu_thread (opaque=0x0) at util/rcu.c:249
> #4  0x00007ffff78cd5bd in start_thread () from /lib64/libpthread.so.0
> #5  0x00007ffff42d062d in clone () from /lib64/libc.so.6
> (gdb) thr 1
> [Switching to thread 1 (Thread 0x7ffff7fc7b00 (LWP 3441))]
> #0  0x00007ffff42c4e31 in ppoll () from /lib64/libc.so.6
> (gdb) bt
> #0  0x00007ffff42c4e31 in ppoll () from /lib64/libc.so.6
> #1  0x0000555555cbfe86 in qemu_poll_ns (fds=0x555557c17620, nfds=5, 
> timeout=29806320) at util/qemu-timer.c:334
> #2  0x0000555555cc0eab in os_host_main_loop_wait (timeout=29806320) at 
> util/main-loop.c:255
> #3  0x0000555555cc0f7d in main_loop_wait (nonblocking=0) at 
> util/main-loop.c:515
> #4  0x000055555599e2b3 in main_loop () at vl.c:1995
> #5  0x00005555559a6353 in main (argc=21, argv=0x7fffffffdef8, 
> envp=0x7fffffffdfa8) at vl.c:4911
> 
> If I wait a little, thread 4 exits because sem_timedwait returns -1 with
> errno=ETIMEDOUT, leaving the other threads waiting for something to happen.
> This appears to be a deadlock, as it stays stuck here (threads 1-3 remain
> as above). Any idea why this could happen and how to debug it further?

Are you using the latest qemu.git/master?

Commit ef6dada8b44e1e7c4bec5c1115903af9af415b50 ("util/async: use
atomic_mb_set in qemu_bh_cancel") fixes hangs that occur with the thread
pool (Thread 4 in your example).  I'm not sure if this applies to your
hang though...

It looks like Thread 3 isn't running guest code because the CPU wants to
sleep: it is blocked in qemu_tcg_wait_io_event() (is the CPU halted?).

Stefan
