Re: intermittent hang, s390x host, bios-tables-test test, TPM
From: Daniel P. Berrangé
Subject: Re: intermittent hang, s390x host, bios-tables-test test, TPM
Date: Wed, 11 Jan 2023 09:05:44 +0000
User-agent: Mutt/2.2.9 (2022-11-12)

On Tue, Jan 10, 2023 at 05:02:58PM -0500, Stefan Berger wrote:
>
>
> On 1/10/23 14:47, Stefan Berger wrote:
> >
> >
> > On 1/10/23 14:27, Daniel P. Berrangé wrote:
> > > On Tue, Jan 10, 2023 at 01:50:26PM -0500, Stefan Berger wrote:
> > > >
> > > >
> > > > On 1/6/23 10:16, Stefan Berger wrote:
> > > > > This here seems to be the root cause. An unknown control channel
> > > > > command was received from the TPM emulator backend by the control
> > > > > channel thread and we end up in g_assert_not_reached().
> > > > >
> > > > > https://github.com/qemu/qemu/blob/master/tests/qtest/tpm-emu.c#L189
> > > > >
> > > > >
> > > > >
> > > > > >     ret = qio_channel_read(ioc, (char *)&cmd, sizeof(cmd), NULL);
> > > > > >     if (ret <= 0) {
> > > > > >         break;
> > > > > >     }
> > > > > >
> > > > > >     cmd = be32_to_cpu(cmd);
> > > > > >     switch (cmd) {
> > > > > >     [...]
> > > > > >     default:
> > > > > >         g_debug("unimplemented %u", cmd);
> > > > > >         g_assert_not_reached();   <------------------
> > > > > >     }
> > > > >
> > > > > I will run this test case in an endless loop on an x86_64 host and
> > > > > see what we get there ...
> > > >
> > > > I could not recreate the issue running the test on a ppc64 and an x86_64
> > > > host. There were >100k test runs on ppc64 and >40k on x86_64. Also,
> > > > simulating the reception of an unsupported command did not lead to a
> > > > hang like the one shown here.
> > >
> > > Assuming your ppc64 host is running a little endian OS, and
> > > we're only seeing the test failure on s390x, then it points towards
> > > the problem being an endianness issue in the TPM code. Something
> > > missing a byteswap somewhere along the way?
> >
> > Yes, my ppc64 machine is also little endian. If the issue were a permanent
> > failure rather than an intermittent one, I would look for something like that.
> > I would think it's more some sort of initialization issue, like a value on the
> > stack that is occasionally set to an undesirable value -- maybe even in a
> > dependency.
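
For reference, the byteswap hypothesis discussed above can be illustrated with
a small standalone sketch. This is plain C, not QEMU code: the helper names
are hypothetical, and the value 14 is only an example command number. It
decodes the same four wire bytes once with an explicit big-endian conversion
and once in host order, as if the swap had been forgotten.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Decode a 32-bit big-endian wire value portably, byte by byte. */
static uint32_t decode_be32(const uint8_t buf[4])
{
    return ((uint32_t)buf[0] << 24) | ((uint32_t)buf[1] << 16) |
           ((uint32_t)buf[2] << 8) | (uint32_t)buf[3];
}

/* Read the same bytes in host order, i.e. with the byteswap "forgotten". */
static uint32_t decode_native(const uint8_t buf[4])
{
    uint32_t v;

    memcpy(&v, buf, sizeof(v));
    return v;
}

/* Hypothetical helper: probe the host byte order at run time. */
static bool host_is_little_endian(void)
{
    const uint16_t probe = 1;
    uint8_t first;

    memcpy(&first, &probe, 1);
    return first == 1;
}
```

On a big-endian host the two decodings agree; on a little-endian host they
differ. A bug of this kind would normally fail on every run, which fits the
point made above that an intermittent failure suggests a different cause.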
>
> I found I still had access to an s390x machine. ~2700 loops on this test case
> so far but nothing... it would be good to be able to recreate the issue and
> apply the fix but we'll have to do it without testing then I guess.
>
> Does this look about right? From my tests with injecting an error it at least
> seems to do what it is intended to do.
>
> diff --git a/tests/qtest/tpm-emu.c b/tests/qtest/tpm-emu.c
> index 2994d1cf42..dbc308a572 100644
> --- a/tests/qtest/tpm-emu.c
> +++ b/tests/qtest/tpm-emu.c
> @@ -36,11 +36,19 @@ void tpm_emu_test_wait_cond(TPMTestState *s)
>      g_mutex_unlock(&s->data_mutex);
>  }
>
> +static void tpm_emu_close_data_ioc(void *ioc)
> +{
> +    g_debug("CLOSE DATA IOC");
> +    qio_channel_close(ioc, NULL);
> +}
> +
>  static void *tpm_emu_tpm_thread(void *data)
>  {
>      TPMTestState *s = data;
>      QIOChannel *ioc = s->tpm_ioc;
>
> +    qtest_add_abrt_handler(tpm_emu_close_data_ioc, ioc);
> +
>      s->tpm_msg = g_new(struct tpm_hdr, 1);
>      while (true) {
>          int minhlen = sizeof(s->tpm_msg->tag) + sizeof(s->tpm_msg->len);
> @@ -77,12 +85,19 @@ static void *tpm_emu_tpm_thread(void *data)
>                            &error_abort);
>      }
>
> +    qtest_remove_abrt_handler(ioc);
>      g_free(s->tpm_msg);
>      s->tpm_msg = NULL;
>      object_unref(OBJECT(s->tpm_ioc));
>      return NULL;
>  }
>
> +static void tpm_emu_close_ctrl_ioc(void *ioc)
> +{
> +    g_debug("CLOSE CTRL IOC");
> +    qio_channel_close(ioc, NULL);
> +}
> +
>  void *tpm_emu_ctrl_thread(void *data)
>  {
>      TPMTestState *s = data;
> @@ -119,6 +134,8 @@ void *tpm_emu_ctrl_thread(void *data)
>          s->emu_tpm_thread = g_thread_new(NULL, tpm_emu_tpm_thread, s);
>      }
>
> +    qtest_add_abrt_handler(tpm_emu_close_ctrl_ioc, ioc);
I'd suggest this be done before starting tpm_emu_tpm_thread,
immediately after the "ioc" is created.
> +
>      while (true) {
>          uint32_t cmd;
>          ssize_t ret;
> @@ -129,6 +146,9 @@ void *tpm_emu_ctrl_thread(void *data)
>          }
>
>          cmd = be32_to_cpu(cmd);
> +        //g_debug("cmd=%u", cmd);
> +        //if (cmd == 14)
> +        //    cmd = 100;
>          switch (cmd) {
>          case CMD_GET_CAPABILITY: {
>              ptm_cap cap = cpu_to_be64(0x3fff);
> @@ -190,6 +210,8 @@ void *tpm_emu_ctrl_thread(void *data)
>          }
>      }
>
> +    qtest_remove_abrt_handler(ioc);
> +
>      object_unref(OBJECT(ioc));
>      object_unref(OBJECT(lioc));
>      return NULL;
>
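
To make the ordering point concrete, here is a minimal standalone sketch. It
is not the libqtest implementation: FakeChannel, add_abort_hook(),
remove_abort_hook() and run_abort_hooks() are hypothetical stand-ins for the
channel and for the qtest_add_abrt_handler() / qtest_remove_abrt_handler()
pair used in the patch. It only shows that a cleanup hook can act on an abort
if it was registered before the abort happens.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a channel; not QEMU's QIOChannel. */
typedef struct {
    bool closed;
} FakeChannel;

typedef void (*AbortHookFn)(void *data);

typedef struct {
    AbortHookFn fn;
    void *data;
} AbortHook;

#define MAX_HOOKS 8
static AbortHook hooks[MAX_HOOKS];
static size_t n_hooks;

/* Register a cleanup hook; returns false if the table is full. */
static bool add_abort_hook(AbortHookFn fn, void *data)
{
    if (n_hooks == MAX_HOOKS) {
        return false;
    }
    hooks[n_hooks].fn = fn;
    hooks[n_hooks].data = data;
    n_hooks++;
    return true;
}

/* Remove the first hook registered with this data pointer, if any. */
static void remove_abort_hook(void *data)
{
    for (size_t i = 0; i < n_hooks; i++) {
        if (hooks[i].data == data) {
            hooks[i] = hooks[n_hooks - 1];
            n_hooks--;
            return;
        }
    }
}

/* What an abort path would do: run every hook registered so far. */
static void run_abort_hooks(void)
{
    for (size_t i = 0; i < n_hooks; i++) {
        hooks[i].fn(hooks[i].data);
    }
}

/* Cleanup hook: mark the channel closed so a blocked peer can give up. */
static void close_channel(void *data)
{
    ((FakeChannel *)data)->closed = true;
}
```

In this sketch, a channel whose hook is added only after other work has
started stays open if the abort path runs in between. That is the window
which registering immediately after the "ioc" is created would avoid.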
> >
> > Stefan
> >
> > >
> > >
> > > With regards,
> > > Daniel
> >
>
With regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|