qemu-devel

Re: intermittent hang, s390x host, bios-tables-test test, TPM


From: Stefan Berger
Subject: Re: intermittent hang, s390x host, bios-tables-test test, TPM
Date: Tue, 10 Jan 2023 17:02:58 -0500
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.6.0



On 1/10/23 14:47, Stefan Berger wrote:


On 1/10/23 14:27, Daniel P. Berrangé wrote:
On Tue, Jan 10, 2023 at 01:50:26PM -0500, Stefan Berger wrote:


On 1/6/23 10:16, Stefan Berger wrote:
This here seems to be the root cause. An unknown control channel
command was received from the TPM emulator backend by the control channel 
thread and we end up in g_assert_not_reached().

https://github.com/qemu/qemu/blob/master/tests/qtest/tpm-emu.c#L189



          ret = qio_channel_read(ioc, (char *)&cmd, sizeof(cmd), NULL);
          if (ret <= 0) {
              break;
          }

          cmd = be32_to_cpu(cmd);
          switch (cmd) {
   [...]
          default:
              g_debug("unimplemented %u", cmd);
              g_assert_not_reached();                <------------------
          }
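
To experiment with this failure path outside QEMU, the dispatch can be sketched in standalone C. This is only an illustration: the command codes and names (`DEMO_CMD_*`, `demo_dispatch`) are hypothetical and are not the real control channel values. Unlike the test code quoted above, the default branch here reports the unknown value instead of aborting.

```c
#include <stdint.h>

/* Hypothetical command codes, for illustration only. */
enum {
    DEMO_CMD_GET_CAPABILITY = 1,
    DEMO_CMD_STOP = 2
};

enum demo_result {
    DEMO_HANDLED,   /* command recognized and processed */
    DEMO_STOPPED,   /* command asks the loop to terminate */
    DEMO_UNKNOWN    /* command not listed in the switch */
};

/*
 * Classify an already byte-swapped command value. Any value without a
 * matching case falls through to the default branch, which is the
 * branch that ends in g_assert_not_reached() in the quoted test code.
 */
static enum demo_result demo_dispatch(uint32_t cmd)
{
    switch (cmd) {
    case DEMO_CMD_GET_CAPABILITY:
        return DEMO_HANDLED;
    case DEMO_CMD_STOP:
        return DEMO_STOPPED;
    default:
        return DEMO_UNKNOWN;
    }
}
```

Feeding it a value such as 100 exercises the default branch in the same way as an injected unsupported command.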

I will run this test case in an endless loop on an x86_64 host and see what we 
get there ...

I could not recreate the issue running the test on a ppc64 and an x86_64
host. There were >100k test runs on ppc64 and >40k on x86_64. Also,
simulating the reception of an unsupported command did not lead to a
hang like the one shown here.

Assuming your ppc64 host is running a little-endian OS, and
we're only seeing the test failure on s390x, then it points towards
the problem being an endianness issue in the TPM code. Something
missing a byteswap somewhere along the way?
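
As a rough illustration of what a missing byteswap looks like, the sketch below compares a portable big-endian decode with a raw host-order load of the same four wire bytes. The helper names (`load_be32`, `load_host32`, `host_is_little_endian`) are made up for this example and are not QEMU functions. The two loads agree on a big-endian host and differ on a little-endian one.

```c
#include <stdint.h>
#include <string.h>

/* Decode four big-endian wire bytes; the result is the same on any host. */
static uint32_t load_be32(const uint8_t buf[4])
{
    return ((uint32_t)buf[0] << 24) | ((uint32_t)buf[1] << 16) |
           ((uint32_t)buf[2] << 8) | (uint32_t)buf[3];
}

/* Load the same bytes in host byte order, i.e. with the byteswap missing. */
static uint32_t load_host32(const uint8_t buf[4])
{
    uint32_t v;

    memcpy(&v, buf, sizeof(v));
    return v;
}

/* Return 1 on a little-endian host and 0 on a big-endian host. */
static int host_is_little_endian(void)
{
    const uint16_t probe = 1;
    uint8_t first;

    memcpy(&first, &probe, 1);
    return first == 1;
}
```

For the wire bytes 00 00 00 0e, `load_be32()` yields 14 everywhere, while `load_host32()` yields 14 only on a big-endian host and 0x0e000000 on a little-endian one.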

Yes, my ppc64 machine is also little endian. If the issue were a permanent
failure rather than an intermittent one, I would look for something like that.
I would think it's more likely some sort of initialization issue, like a value
on the stack that is occasionally set to an undesirable value -- maybe even in
a dependency.

I found I still had access to an s390x machine. ~2700 loops on this test case
so far but nothing... it would be good to be able to recreate the issue and
apply the fix but we'll have to do it without testing then I guess.

Does this look about right? From my tests with injecting an error it at least
seems to do what it is intended to do.

diff --git a/tests/qtest/tpm-emu.c b/tests/qtest/tpm-emu.c
index 2994d1cf42..dbc308a572 100644
--- a/tests/qtest/tpm-emu.c
+++ b/tests/qtest/tpm-emu.c
@@ -36,11 +36,19 @@ void tpm_emu_test_wait_cond(TPMTestState *s)
     g_mutex_unlock(&s->data_mutex);
 }

+static void tpm_emu_close_data_ioc(void *ioc)
+{
+    g_debug("CLOSE DATA IOC");
+    qio_channel_close(ioc, NULL);
+}
+
 static void *tpm_emu_tpm_thread(void *data)
 {
     TPMTestState *s = data;
     QIOChannel *ioc = s->tpm_ioc;

+    qtest_add_abrt_handler(tpm_emu_close_data_ioc, ioc);
+
     s->tpm_msg = g_new(struct tpm_hdr, 1);
     while (true) {
         int minhlen = sizeof(s->tpm_msg->tag) + sizeof(s->tpm_msg->len);
@@ -77,12 +85,19 @@ static void *tpm_emu_tpm_thread(void *data)
                           &error_abort);
     }

+    qtest_remove_abrt_handler(ioc);
     g_free(s->tpm_msg);
     s->tpm_msg = NULL;
     object_unref(OBJECT(s->tpm_ioc));
     return NULL;
 }

+static void tpm_emu_close_ctrl_ioc(void *ioc)
+{
+    g_debug("CLOSE CTRL IOC");
+    qio_channel_close(ioc, NULL);
+}
+
 void *tpm_emu_ctrl_thread(void *data)
 {
     TPMTestState *s = data;
@@ -119,6 +134,8 @@ void *tpm_emu_ctrl_thread(void *data)
         s->emu_tpm_thread = g_thread_new(NULL, tpm_emu_tpm_thread, s);
     }

+    qtest_add_abrt_handler(tpm_emu_close_ctrl_ioc, ioc);
+
     while (true) {
         uint32_t cmd;
         ssize_t ret;
@@ -129,6 +146,9 @@ void *tpm_emu_ctrl_thread(void *data)
         }

         cmd = be32_to_cpu(cmd);
+        //g_debug("cmd=%u", cmd);
+        //if (cmd == 14)
+        //    cmd = 100;
         switch (cmd) {
         case CMD_GET_CAPABILITY: {
             ptm_cap cap = cpu_to_be64(0x3fff);
@@ -190,6 +210,8 @@ void *tpm_emu_ctrl_thread(void *data)
         }
     }

+    qtest_remove_abrt_handler(ioc);
+
     object_unref(OBJECT(ioc));
     object_unref(OBJECT(lioc));
     return NULL;
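
The idea behind the patch, registering a cleanup callback per channel so that an abort closes it, can be sketched independently of QEMU. The registry below is a simplified stand-in written for this example, not the libqtest implementation: all `demo_*` names are hypothetical, the table has a fixed size, and there is no locking, so it is not safe to call from several threads as is.

```c
#include <stddef.h>

#define DEMO_MAX_HOOKS 8

typedef void (*demo_hook_fn)(void *data);

struct demo_hook {
    demo_hook_fn fn;
    void *data;
};

static struct demo_hook demo_hooks[DEMO_MAX_HOOKS];
static size_t demo_hook_count;

/* Register fn(data); returns 0 on success, -1 if the table is full. */
static int demo_add_abrt_handler(demo_hook_fn fn, void *data)
{
    if (demo_hook_count == DEMO_MAX_HOOKS) {
        return -1;
    }
    demo_hooks[demo_hook_count].fn = fn;
    demo_hooks[demo_hook_count].data = data;
    demo_hook_count++;
    return 0;
}

/* Drop the first hook registered with this data pointer, if any. */
static void demo_remove_abrt_handler(void *data)
{
    for (size_t i = 0; i < demo_hook_count; i++) {
        if (demo_hooks[i].data == data) {
            demo_hooks[i] = demo_hooks[demo_hook_count - 1];
            demo_hook_count--;
            return;
        }
    }
}

/* Run every remaining hook, as an abort path would. */
static void demo_run_abrt_handlers(void)
{
    for (size_t i = 0; i < demo_hook_count; i++) {
        demo_hooks[i].fn(demo_hooks[i].data);
    }
}

/* Example hook: flag a hypothetical channel as closed. */
static void demo_close_channel(void *data)
{
    *(int *)data = 1;
}
```

A thread would add its hook before entering its loop and remove it on the normal exit path, so only channels that are still open get closed when the abort path runs.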


    Stefan



With regards,
Daniel



