qemu-devel

Re: [PATCH] tests: improve performance of device-introspect-test


From: Markus Armbruster
Subject: Re: [PATCH] tests: improve performance of device-introspect-test
Date: Tue, 14 Jul 2020 09:57:28 +0200
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/26.3 (gnu/linux)

Daniel P. Berrangé <berrange@redhat.com> writes:

> On Fri, Jul 10, 2020 at 10:03:56PM +0200, Markus Armbruster wrote:
>> Daniel P. Berrangé <berrange@redhat.com> writes:
>> 
>> > Total execution time with "-m slow" and x86_64 QEMU, drops from 3
>> > minutes 15 seconds, down to 54 seconds.
>> >
>> > Individual tests drop from 17-20 seconds, down to 3-4 seconds.
>> 
>> Nice!
>> 
>> A few observations on this test (impatient readers may skip to
>> "Conclusions"):
>
> snip
>
>> * The number of known device types varies between targets from 33
>>   (tricore) to several hundreds (x86_64+i386: 421, ppc 593, arm 667,
>>   aarch64 680, ppc64 689).  Median is 215, sum is 7485.
>
> snip
>
>> * The test matrix is *expensive*.  Testing even a simple QMP query is
>>   expensive when you do it a quarter million times.  ARM is the
>>   greediest pig by far (170k introspections, almost two thirds of the
>>   total!), followed by ppc (36k), x86 (12k) and mips (11k).  Ideas on
>>   trimming excess are welcome.  I'm not even sure anymore this should
>>   be a qtest.
>
> We have 70 arm machines, 667 devices. IIUC we are roughly testing every
> device against every machine. 46,690 tests.
>
> Most of the time devices are going to behave identically regardless of
> which machine type is used. The trouble is some machines are different
> enough that they can genuinely trigger different behaviour. It isn't
> possible to slim the (machine, device) expansion down programmatically
> while still exercising the interesting combinations unless we get a lot
> more advanced.
>
> e.g. if we have a PCI device, we only need test it in one PCI based machine,
> and only need test it on one non-PCI based machine.

The trouble is .instance_init() can do anything, and can therefore
interact badly with anything.

Example: m2sxxx_soc_initfn() of device type "msf2-soc" messes with
nd_table[0].  That's wrong.  The test doesn't catch it with machine type
"none", where nd_table[0] is blank.  It does catch it with machine type
"ast2600-evb", because aspeed_machine_init() puts something incompatible
into nd_table[0], which makes m2sxxx_soc_initfn() crash.

"msf2-soc" is not a PCI device, but if it were, then the two machines
(with and without PCI) picked for testing PCI devices may well both
leave nd_table[0] blank, and therefore not catch the bug.
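
The argument above can be illustrated with a toy model in Python.  This
is not QEMU code: the Machine class, the has_pci flags, the machine
"hypothetical-pci-board" and all helper names are made up for the
sketch.  The only behaviour taken from the text is that "none" leaves
nd_table[0] blank while "ast2600-evb" fills it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Machine:
    name: str
    has_pci: bool          # placeholder flag, used only for sampling
    fills_nd_table0: bool  # does machine init put something in nd_table[0]?


def instance_init_crashes(machine: Machine) -> bool:
    """Toy stand-in for a buggy .instance_init() that touches nd_table[0]."""
    return machine.fills_nd_table0


# has_pci values are assumptions for the sketch, not facts about QEMU.
MACHINES = [
    Machine("none", has_pci=False, fills_nd_table0=False),
    Machine("hypothetical-pci-board", has_pci=True, fills_nd_table0=False),
    Machine("ast2600-evb", has_pci=False, fills_nd_table0=True),
]


def sample_one_per_bus_kind(machines):
    """Keep the first PCI machine and the first non-PCI machine."""
    picked = {}
    for m in machines:
        picked.setdefault(m.has_pci, m)
    return list(picked.values())


def machines_that_catch_bug(machines):
    """Names of the machines on which the buggy device would crash."""
    return [m.name for m in machines if instance_init_crashes(m)]


full = machines_that_catch_bug(MACHINES)
sampled = machines_that_catch_bug(sample_one_per_bus_kind(MACHINES))
print("full matrix catches bug on:", full)
print("sampled matrix catches bug on:", sampled)
```

In this model the full matrix flags the bug on "ast2600-evb", while the
sampled pair of machines flags nothing, because the sampling criterion
(PCI or not) is unrelated to what the buggy code actually touches.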

Some instances of "device code does stuff it should not" could be
prevented by making interfaces inaccessible there.  We'd have to
identify device code first: the hw/BASE-ARCH/ directories contain both
boards and devices, possibly even in the same .c file.

> It would be interesting to actually get some CPU profiling data for
> this test to see if it points out anything interesting about where the
> time is being spent. Even if we don't reduce the complexity, reducing
> a time factor will potentially greatly help. 

Hunch: when we want to test device instantiation and finalization for a
million (give or take) combinations of board x device, testing them one
by one with QMP might be a bad idea.
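
For a sense of scale, here is a rough Python sketch of what "one by one"
means: one QMP request per (machine, device) pair.  It only builds the
JSON and counts requests, it does not talk to QEMU.  It assumes the
per-device query is "device-list-properties"; the machine and device
names are placeholders, and 70 and 667 are the arm figures quoted above.

```python
import json


def introspect_command(typename: str) -> str:
    """Build one QMP request asking for the properties of a device type."""
    return json.dumps({
        "execute": "device-list-properties",
        "arguments": {"typename": typename},
    })


# Placeholder names; only the counts come from the thread.
machines = [f"machine-{i}" for i in range(70)]
devices = [f"device-{i}" for i in range(667)]

# One request per device, repeated for every machine.
commands_per_machine = [introspect_command(d) for d in devices]
total_requests = len(machines) * len(commands_per_machine)

print(commands_per_machine[0])
print("requests for arm alone:", total_requests)
```

Each of those requests is a separate round trip, so any fixed
per-request cost is multiplied by the size of the matrix.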



