qemu-devel

Re: intermittent hang, s390x host, bios-tables-test test, TPM


From: Peter Maydell
Subject: Re: intermittent hang, s390x host, bios-tables-test test, TPM
Date: Fri, 6 Jan 2023 15:39:31 +0000

On Fri, 6 Jan 2023 at 15:16, Stefan Berger <stefanb@linux.ibm.com> wrote:
>
> On 1/6/23 07:10, Peter Maydell wrote:
> > I'm seeing an intermittent hang on the s390 CI runner in the
> > bios-tables-test test. It looks like we've deadlocked because:
> >
> >   * the TPM device is waiting for data on its socket that never arrives,
> >     and it's holding the iothread lock
> >   * QEMU is therefore not making forward progress;
> >     in particular it is unable to handle qtest queries/responses
> >   * the test binary thread 1 is waiting to get a response to its
> >     qtest command, which is not going to arrive
> >   * test binary thread 3 (tpm_emu_ctrl_thread) has hit an
> >     assertion and is trying to kill QEMU via qtest_kill_qemu()
> >   * qtest_kill_qemu() is only a "SIGTERM and wait", so will wait
> >     forever, because QEMU won't respond to the SIGTERM while it's
> >     blocked waiting for the TPM device to release the iothread lock
> >   * because the ctrl-thread is waiting for QEMU to exit, it's never
> >     going to send the data that would unblock the TPM device emulation
> >
> [...]
>
> >
> > Thread 3 (Thread 0x3ff8dafe900 (LWP 2661316)):
> > #0  0x000003ff8e9c6002 in __GI___wait4 (pid=<optimized out>,
> > stat_loc=stat_loc@entry=0x2aa0b42c9bc, options=<optimized out>,
> > usage=usage@entry=0x0) at ../sysdeps/unix/sysv/linux/wait4.c:27
> > #1  0x000003ff8e9c5f72 in __GI___waitpid (pid=<optimized out>,
> > stat_loc=stat_loc@entry=0x2aa0b42c9bc, options=options@entry=0) at
> > waitpid.c:38
> > #2  0x000002aa0952a516 in qtest_wait_qemu (s=0x2aa0b42c9b0) at
> > ../tests/qtest/libqtest.c:206
> > #3  0x000002aa0952a58a in qtest_kill_qemu (s=0x2aa0b42c9b0) at
> > ../tests/qtest/libqtest.c:229
> > #4  0x000003ff8f0c288e in g_hook_list_invoke () from
> > /lib/s390x-linux-gnu/libglib-2.0.so.0
> > #5  <signal handler called>
> > #6  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
> > #7  0x000003ff8e9240a2 in __GI_abort () at abort.c:79
> > #8  0x000003ff8f0feda8 in g_assertion_message () from
> > /lib/s390x-linux-gnu/libglib-2.0.so.0
> > #9  0x000003ff8f0fedfe in g_assertion_message_expr () from
> > /lib/s390x-linux-gnu/libglib-2.0.so.0
> > #10 0x000002aa09522904 in tpm_emu_ctrl_thread (data=0x3fff5ffa160) at
> > ../tests/qtest/tpm-emu.c:189
>
> This here seems to be the root cause. An unknown control channel command
> was received from the TPM emulator backend by the control channel thread
> and we end up in g_assert_not_reached().

Yeah. It would be good if we didn't deadlock without printing
the assertion, though...

I guess we could improve qtest_kill_qemu() so it doesn't wait
indefinitely for QEMU to exit but instead sends a SIGKILL 20
seconds after the SIGTERM. (Annoyingly, there is no convenient
"waitpid but with a timeout" function...)
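
As a rough sketch of what I mean (not a patch against libqtest.c; the
helper names wait_child_timeout() and terminate_child() are made up for
illustration, and the 10ms polling interval is an arbitrary choice), the
missing "waitpid with a timeout" can be approximated by polling with
WNOHANG and falling back to SIGKILL once the deadline passes:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stdbool.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical helper: poll waitpid() with WNOHANG until the child
 * has been reaped or timeout_secs have elapsed.
 * Returns true (and fills *wstatus) only if the child was reaped.
 */
static bool wait_child_timeout(pid_t pid, int *wstatus, int timeout_secs)
{
    const struct timespec nap = { .tv_sec = 0, .tv_nsec = 10 * 1000 * 1000 };
    struct timespec start, now;

    clock_gettime(CLOCK_MONOTONIC, &start);
    for (;;) {
        pid_t r = waitpid(pid, wstatus, WNOHANG);
        if (r == pid) {
            return true;            /* child exited and was reaped */
        }
        if (r < 0 && errno != EINTR) {
            return false;           /* e.g. ECHILD: nothing to wait for */
        }
        clock_gettime(CLOCK_MONOTONIC, &now);
        long long elapsed_ms = (now.tv_sec - start.tv_sec) * 1000LL
                             + (now.tv_nsec - start.tv_nsec) / 1000000;
        if (elapsed_ms >= timeout_secs * 1000LL) {
            return false;           /* deadline passed, child still alive */
        }
        nanosleep(&nap, NULL);      /* back off briefly before polling again */
    }
}

/*
 * Hypothetical replacement for a plain "SIGTERM and wait": ask politely
 * first, then send SIGKILL if the child has not exited in time.
 * Returns the wait status of the child (0 if it could not be reaped).
 */
static int terminate_child(pid_t pid, int timeout_secs)
{
    int wstatus = 0;

    assert(pid > 0);
    kill(pid, SIGTERM);
    if (!wait_child_timeout(pid, &wstatus, timeout_secs)) {
        kill(pid, SIGKILL);
        /* SIGKILL cannot be blocked or ignored, so this wait is bounded */
        while (waitpid(pid, &wstatus, 0) < 0 && errno == EINTR) {
            /* retry if interrupted by a signal */
        }
    }
    return wstatus;
}
```

Calling terminate_child(pid, 20) would then give the behaviour described
above: a child that is stuck and never acts on the SIGTERM gets a SIGKILL
after 20 seconds, so the caller cannot wait forever. Polling is crude; it
is just the simplest portable way to show the idea.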

thanks
-- PMM


