
Re: [PATCH v3 0/7] Switch iotests to using Async QMP


From: John Snow
Subject: Re: [PATCH v3 0/7] Switch iotests to using Async QMP
Date: Wed, 13 Oct 2021 10:00:49 -0400



On Wed, Oct 13, 2021 at 8:51 AM John Snow <jsnow@redhat.com> wrote:


On Wed, Oct 13, 2021 at 4:45 AM Hanna Reitz <hreitz@redhat.com> wrote:
On 13.10.21 00:34, John Snow wrote:
> Based-on: <20211012214152.802483-1-jsnow@redhat.com>
>            [PULL 00/10] Python patches
> GitLab: https://gitlab.com/jsnow/qemu/-/commits/python-aqmp-iotest-wrapper
> CI: https://gitlab.com/jsnow/qemu/-/pipelines/387210591
>
> Hiya,
>
> This series continues where the last two AQMP series left off and adds a
> synchronous 'legacy' wrapper around the new AQMP interface, then drops
> it straight into iotests to prove that AQMP is functional and totally
> cool and fine. The disruption and churn to iotests is pretty minimal.
>
> In the event that a regression happens and I am not physically proximate
> enough to inflict damage upon, you may set the QEMU_PYTHON_LEGACY_QMP
> variable to any non-empty string to engage the QMP machinery you are
> used to.
>
> I'd like to try and get this committed early in the 6.2 development
> cycle to give ample time to smooth over any possible regressions. I've
> tested it locally and via gitlab CI, across Python versions 3.6 through
> 3.10, and "worksforme". If something bad happens, we can revert the
> actual switch-flip very trivially.
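(For context, the opt-out described above boils down to a single environment-variable check at import time. The following is only a minimal sketch of how such a switch might be wired; the module paths are illustrative assumptions, not necessarily the series' actual layout.)

    import os

    # Illustrative sketch only; module paths are assumptions.
    if os.environ.get('QEMU_PYTHON_LEGACY_QMP'):
        # Any non-empty value selects the old, blocking QMP client.
        from qemu.qmp import QEMUMonitorProtocol
    else:
        # Default: the synchronous 'legacy' wrapper around the new async QMP.
        from qemu.aqmp.legacy import QEMUMonitorProtocol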

> So running iotests locally, I got one failure:
>
> $ TEST_DIR=/tmp/vdi-tests ./check -c writethrough -vdi 300
> [...]
> 300    fail    [10:28:06] [10:28:11]    5.1s    output mismatch (see 300.out.bad)
> --- /home/maxx/projects/qemu/tests/qemu-iotests/300.out
> +++ 300.out.bad
> @@ -1,4 +1,5 @@
> -.......................................
> +..............ERROR:qemu.aqmp.qmp_client.qemu-b-222963:Task.Reader: ConnectionResetError: [Errno 104] Connection reset by peer
> +.........................
>   ----------------------------------------------------------------------
>   Ran 39 tests
> [...]


Oh, unfortunate.
 

> I’m afraid I can’t really give a reproducer or anything.

Thank you for the report!

> It feels like just some random spurious timing-related error.  Although
> then again, 300 does have an `except machine.AbnormalShutdown` clause at
> one point...  So perhaps that’s the culprit, and we need to disable
> logging there.


I'll investigate!

Unfortunately, even running it in a loop some 150 times, I couldn't reproduce this one. As you point out, it appears to be just a failure caused by logging; the test logic itself completes as expected.
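(For reference, a brute-force reproduction attempt like the one described can be scripted; this is only a hypothetical sketch reusing the failing invocation quoted above, not the loop actually used.)

    import os
    import subprocess

    # Hypothetical repro loop: re-run iotest 300 until the output mismatch
    # appears again.  Flags and TEST_DIR mirror the failing run quoted above.
    for i in range(150):
        result = subprocess.run(
            ['./check', '-c', 'writethrough', '-vdi', '300'],
            env={**os.environ, 'TEST_DIR': '/tmp/vdi-tests'},
        )
        if result.returncode != 0:
            print(f'reproduced on iteration {i}')
            break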

Still, I would expect a "clean" shutdown of the destination host (where the destination process fails to load the migration stream and voluntarily exits with an error code) to end with a FIN/ACK for TCP, or ... uh, whatever the equivalent is for a UNIX socket. Where's the Connection Reset coming from? Did the destination VM process *crash*?
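(One way a voluntary exit can still surface as "Connection reset by peer": if a process closes a TCP socket while unread data is still sitting in its receive queue, the kernel sends an RST rather than a plain FIN, and the peer then sees ECONNRESET. The snippet below is a generic, QEMU-agnostic illustration of that effect on Linux; whether that is what happened here is only a guess.)

    import socket
    import time

    # Generic illustration: a peer that closes its socket with unread data
    # pending causes the other side to see ECONNRESET, even though the close
    # itself was voluntary.
    srv = socket.socket()
    srv.bind(('127.0.0.1', 0))
    srv.listen(1)

    cli = socket.socket()
    cli.connect(srv.getsockname())
    conn, _ = srv.accept()

    cli.sendall(b'data the server never reads')
    time.sleep(0.1)   # let the data land in the server's receive buffer
    conn.close()      # closing with unread data pending triggers an RST
    time.sleep(0.1)   # let the RST reach the client

    try:
        cli.recv(16)
    except ConnectionResetError as exc:
        print('client saw:', exc)   # [Errno 104] Connection reset by peer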

I'm not so sure that I *should* silence this error, but I also can't reproduce it at all to answer these questions, so, uh, I guess I will just hammer it in a loop a few hundred times more and see if I get lucky.
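(If silencing does end up being the right call, one low-impact option is to raise the offending logger's level just around the expected abnormal shutdown instead of touching the library. The sketch below uses only the standard logging module; the helper name and the usage are hypothetical, with the logger name taken from the failure output above.)

    import logging
    from contextlib import contextmanager

    @contextmanager
    def suppress_qmp_errors(name='qemu.aqmp.qmp_client'):
        """Temporarily raise the log level so an anticipated abnormal
        shutdown doesn't spray ERROR lines into the test output."""
        logger = logging.getLogger(name)
        old_level = logger.level
        logger.setLevel(logging.CRITICAL)
        try:
            yield
        finally:
            logger.setLevel(old_level)

    # Hypothetical usage in a test expecting the destination VM to die:
    #
    #     with suppress_qmp_errors():
    #         try:
    #             dest_vm.wait()
    #         except machine.AbnormalShutdown:
    #             pass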

