qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH] cirrus: don't run full qtest on macOS


From: Markus Armbruster
Subject: Re: [PATCH] cirrus: don't run full qtest on macOS
Date: Fri, 15 Jan 2021 14:10:20 +0100
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/27.1 (gnu/linux)

Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk> writes:

> On 06/01/2021 13:36, Daniel P. Berrangé wrote:
>
>>> The Cirrus CI macOS build hosts have exhibited a serious performance
>>> degradation in recent months. For example the "qom-test" qtest takes
>>> over an hour for only the qemu-system-aarch64 binary. This is as much
>>> 20-40 times slower than other environments. The other qtests all show
>>> similar performance degradation, as do many of the unit tests.
>>>
>>> This does not appear related to QEMU code changes, since older git
>>> commits which were known to fully complete in less than 1 hour on
>>> Cirrus CI now also show similar bad performance. Either Cirrus CI
>>> performance has degraded, or an change in its environment has exposed
>>> a latent bug widely affecting all of QEMU. Debugging the qom-test
>>> showed no easily identified large bottleneck - every step of the
>>> test execution was simply slower.
>> It appears I might be mistaken here. On IRC it was reported that
>> going back furrther to v5.1.0 shows good performance in Cirrus
>> still.
>> I had only gone back as far as
>> 2a5a79d1b57280edd72193f6031de3feb682154e
>> which I thought was fast originally.
>> So somewhere between v5.1.0 and 2a5a79 we apparently regressed.
>
> I tested a few macos cirrus-ci builds after the meson conversion and
> found that they were working fine, so whatever is affecting the macos
> build must be related to a QEMU change.
>
> A full bisect proved to be too tricky due to the instability of the
> tree at that point in time, however reading through "git log" and
> attempting some builds at merges I thought might be related I
> discovered that the slowness was introduced by this PR:
>
>
> commit b7092cda1b36ce687e65ab1831346f9529b781b8
> Merge: 497d415d76 eb94b81a94
> Author: Peter Maydell <peter.maydell@linaro.org>
> Date:   Fri Oct 9 13:20:46 2020 +0100
>
>     Merge remote-tracking branch
>     'remotes/armbru/tags/pull-monitor-2020-10-09' into staging
>
>     Monitor patches for 2020-10-09
[...]
>
> Fortunately that PR could be bisected and that led me this commit:
>
>
> 9ce44e2ce267caf5559904a201aa1986b0a8326b is the first bad commit
> commit 9ce44e2ce267caf5559904a201aa1986b0a8326b
> Author: Kevin Wolf <kwolf@redhat.com>
> Date:   Mon Oct 5 17:58:50 2020 +0200
>
>     qmp: Move dispatcher to a coroutine
>
>     This moves the QMP dispatcher to a coroutine and runs all QMP command
>     handlers that declare 'coroutine': true in coroutine context so they
>     can avoid blocking the main loop while doing I/O or waiting for other
>     events.
>
>     For commands that are not declared safe to run in a coroutine, the
>     dispatcher drops out of coroutine context by calling the QMP command
>     handler from a bottom half.
>
>     Signed-off-by: Kevin Wolf <kwolf@redhat.com>
>     Reviewed-by: Markus Armbruster <armbru@redhat.com>
>     Message-Id: <20201005155855.256490-10-kwolf@redhat.com>
>     Reviewed-by: Markus Armbruster <armbru@redhat.com>
>     Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
>     Signed-off-by: Markus Armbruster <armbru@redhat.com>
>
>
> Given that Peter can run the tests manually, I'm not exactly sure why
> the cirrus-ci environment is different - all I can think of is that it
> could be related to running in a headless terminal.
>
> For reference running cirrus-ci on the last good commit 04f22362f1
> "qapi: Add a 'coroutine' flag for commands" gave me a total runtime of
> 35 mins.

Let me explain what the patch does, in the hope of helping somebody else
find out why it behaves badly on MacOS.  Unfortunately, a proper
explanation requires quite some context.  Bear with me.

In the beginning, there was just one monitor, and it ran entirely in the
main loop (callback when input is available).  To keep the main loop
going, monitor commands better complete quickly.

Then we got multiple monitors.  Same story, just multiple input streams,
each with a callback.

We also got additional threads.  When I say "main loop", I mean the main
thread's main loop.

"Must complete quickly" means no blocking I/O and such.  Writing code
that way is somewhere between hard and impractical.  Much code called by
monitor commands wasn't written that way.

When a monitor command blocks, the main loop blocks, and that means no
more monitor commands can run, not even on other monitors.

"Doctor, doctor, running code in the main loop hurts".  Sadly, the
doctor's recommended remedy "don't do that then" is really hard to
apply: a lot of code has been written assuming "running in the main
loop, with the big QEMU lock held".

The first small step towards it was taken to enable the "out-of-band"
feature.  We moved the QMP monitor proper out of the main loop into a
monitor I/O thread.  The monitor commands get funneled to the main loop.
Instead of the main loop calling the monitor when a file descriptor has
input, it now calls the command dispatcher when a funnel queue has a
command.

Why bother?  Because now we can thread execute special "out-of-band"
commands right away, in the I/O thread, regardless of how badly the main
loop is.  Peter Xu wanted this badly enough for postcopy recovery to
code it up.  It was hard.  It's not generally useful, as the restriction
on what OOB commands can do are severe.

The next step was the coroutine feature.  Quite a few of the problematic
monitor commands are actually running coroutine-capable code: when
running in coroutine context, the code yields instead of blocking.
Running such commands in monitor context improves things from "blocks
the main loop" to "blocks all monitor commands".

Sadly, code exists that falls apart in coroutine context.  So we had to
make running in coroutine context opt-in.  Right now only two commands
opt in: block_resize and screendump.  Hopefully, we can get to the point
where most or all do.

Until all do, the dispatcher needs to run some commands coroutine
context, and others outside coroutine context.  How?

We've finally reached commit 9ce44e2ce2 "qmp: Move dispatcher to a
coroutine".

The QMP command dispatcher runs in a coroutine in the main loop (HMP
works differently, but let's ignore it here).

If the command can be executed in coroutine context, the dispatcher
calls its handler as before.  Right now, we take this path just for
block_resize and screendump.

Else, we create a bottom half that calls the handler, and schedule it to
run in the main loop.  Right now, we take this path for all the other
commands.

Hypothesis: on MacOS, taking this path is s-l-o-w.  Perhaps creating
bottom halves takes ages.  Perhaps the delay until a scheduled bottom
half actually runs is huge.

Possibly useful experiment: find out which QMP commands the slow test
case runs a lot, and mark them 'coroutine': true in the QAPI schema.

Another hypothesis: entering and leaving the QMP dispatcher coroutine is
slow.  If yes, all coroutines are probably slow, which hurts a lot more
than just QMP.

Hope this helps.




reply via email to

[Prev in Thread] Current Thread [Next in Thread]