qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: intermittent hang, s390x host, bios-tables-test test, TPM


From: Stefan Berger
Subject: Re: intermittent hang, s390x host, bios-tables-test test, TPM
Date: Fri, 6 Jan 2023 10:58:38 -0500
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.6.0



On 1/6/23 10:39, Peter Maydell wrote:
On Fri, 6 Jan 2023 at 15:16, Stefan Berger <stefanb@linux.ibm.com> wrote:



On 1/6/23 07:10, Peter Maydell wrote:
I'm seeing an intermittent hang on the s390 CI runner in the
bios-tables-test test. It looks like we've deadlocked because:



Thread 3 (Thread 0x3ff8dafe900 (LWP 2661316)):
#0  0x000003ff8e9c6002 in __GI___wait4 (pid=<optimized out>,
stat_loc=stat_loc@entry=0x2aa0b42c9bc, options=<optimized out>,
usage=usage@entry=0x0) at ../sysdeps/unix/sysv/linux/wait4.c:27
#1  0x000003ff8e9c5f72 in __GI___waitpid (pid=<optimized out>,
stat_loc=stat_loc@entry=0x2aa0b42c9bc, options=options@entry=0) at
waitpid.c:38
#2  0x000002aa0952a516 in qtest_wait_qemu (s=0x2aa0b42c9b0) at
../tests/qtest/libqtest.c:206
#3  0x000002aa0952a58a in qtest_kill_qemu (s=0x2aa0b42c9b0) at
../tests/qtest/libqtest.c:229
#4  0x000003ff8f0c288e in g_hook_list_invoke () from
/lib/s390x-linux-gnu/libglib-2.0.so.0
#5  <signal handler called>
#6  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#7  0x000003ff8e9240a2 in __GI_abort () at abort.c:79
#8  0x000003ff8f0feda8 in g_assertion_message () from
/lib/s390x-linux-gnu/libglib-2.0.so.0
#9  0x000003ff8f0fedfe in g_assertion_message_expr () from
/lib/s390x-linux-gnu/libglib-2.0.so.0
#10 0x000002aa09522904 in tpm_emu_ctrl_thread (data=0x3fff5ffa160) at
../tests/qtest/tpm-emu.c:189

This here seems to be the root cause. An unknown control channel command
was received from the TPM emulator backend by the control channel thread
and we end up in g_assert_not_reached().

Yeah. It would be good if we didn't deadlock without printing
the assertion, though...

I guess we could improve qtest_kill_qemu() so it doesn't wait
indefinitely for QEMU to exit but instead sends a SIGKILL 20
seconds after the SIGTERM. (Annoyingly, there is no convenient
"waitpid but with a timeout" function...)

Yes, wait5(&to,...) doesn't exist, yet. I guess one would have to use a loop 
sending signal 0 to the pid for 20 seconds?

Though I'd really like to know where that data race is coming from and why we 
get an unknown command. I am now running this on a ppc64 and x86_64 host over 
the weekend to see what happens. All good so far.

   Stefan

thanks
-- PMM



reply via email to

[Prev in Thread] Current Thread [Next in Thread]