qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [Qemu-ppc] [RFC PATCH] tests: Add a device_add/del HMP


From: Markus Armbruster
Subject: Re: [Qemu-devel] [Qemu-ppc] [RFC PATCH] tests: Add a device_add/del HMP test
Date: Wed, 06 Sep 2017 08:59:32 +0200
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/25.2 (gnu/linux)

Thomas Huth <address@hidden> writes:

> On 05.09.2017 18:48, Dr. David Alan Gilbert wrote:
>> * Markus Armbruster (address@hidden) wrote:
>>> Thomas Huth <address@hidden> writes:
>>>
>>>> People tend to forget to mark internal devices with "user_creatable = false
>>>> or hotpluggable = false, and these devices can crash QEMU if added via the
>>>> HMP monitor. So let's add a test to run through all devices and that tries
>>>> to add them blindly (without arguments) to see whether this could crash the
>>>> QEMU instance.
>>>>
>>>> Signed-off-by: Thomas Huth <address@hidden>
>>>> ---
>>>>  I've marked the patch as RFC since not all of the required device bug
>>>>  fixes have been merged yet (so this patch can not be included yet without
>>>>  breaking "make check"). It's also sad that "macio-oldworld" currently
>>>>  has to be blacklisted - I tried to find a fix for that device,  but I was
>>>>  not able to spot the exact problem so far. So help for fixing that device
>>>>  is very very welcome! The crash can be reproduced like this:
>>>>
>>>>  $ qemu-system-ppc64 -nographic -S -monitor stdio -serial none
>>>>  QEMU 2.10.50 monitor - type 'help' for more information
>>>>  (qemu) device_add macio-oldworld,id=x
>>>>  (qemu) device_del x
>>>>  (qemu) **
>>>>  ERROR:qom/object.c:1611:object_get_canonical_path_component:
>>>>   assertion failed: (obj->parent != NULL)
>>>>  Aborted (core dumped)
>>>>
>>>>  tests/test-hmp.c | 60 
>>>> +++++++++++++++++++++++++++++++++++++++++++++++++++++++-
>>>>  1 file changed, 59 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/tests/test-hmp.c b/tests/test-hmp.c
>>>> index 7a38cdc..e8a25c4 100644
>>>> --- a/tests/test-hmp.c
>>>> +++ b/tests/test-hmp.c
>>>> @@ -28,7 +28,6 @@ static const char *hmp_cmds[] = {
>>>>      "commit all",
>>>>      "cpu-add 1",
>>>>      "cpu 0",
>>>> -    "device_add ?",
>>>>      "device_add usb-mouse,id=mouse1",
>>>>      "mouse_button 7",
>>>>      "mouse_move 10 10",
>>>> @@ -116,6 +115,64 @@ static void test_info_commands(void)
>>>>      g_free(info_buf);
>>>>  }
>>>>  
>>>> +/*
>>>> + * People tend to forget to mark internal devices with "user_creatable = 
>>>> false
>>>> + * and these devices can crash QEMU if added via the HMP monitor. So 
>>>> let's run
>>>> + * through all devices and try to add them blindly (without arguments) to 
>>>> see
>>>> + * whether this could crash the QEMU instance.
>>>> + */
>>>> +static void test_device_add_commands(void)
>>>> +{
>>>> +    char *resp, *devices, *devices_buf, *endp;
>>>> +
>>>> +    devices = devices_buf = hmp("device_add help");
>>>> +
>>>> +    while (*devices) {
>>>> +        /* Ignore header lines etc. */
>>>> +        if (strncmp(devices, "name \"", 6)) {
>>>> +            devices = strchr(devices, '\n');
>>>> +            if (!devices) {
>>>> +                break;
>>>> +            }
>>>> +            devices += 1;
>>>> +            continue;
>>>> +        }
>>>> +        /* Extract the device name, ignore parameters and description */
>>>> +        devices += 6;
>>>> +        endp = strchr(devices, '"');
>>>> +        g_assert(endp != NULL);
>>>> +        *endp = '\0';
>>>> +        /* Check whether it is blacklisted... */
>>>> +        if (g_str_equal("macio-oldworld", devices)) {
>>>> +            devices = strchr(endp + 1, '\n');
>>>> +            if (!devices) {
>>>> +                break;
>>>> +            }
>>>> +            devices += 1;
>>>> +            continue;
>>>> +        }
>>>> +        /* Now run the device_add + device_del commands */
>>>> +        if (verbose) {
>>>> +            fprintf(stderr, "\tdevice_add %s,id=%s\n", devices, devices);
>>>> +        }
>>>> +        resp = hmp("device_add %s,id=%s", devices, devices);
>>>> +        g_free(resp);
>>>> +        if (verbose) {
>>>> +            fprintf(stderr, "\tdevice_del %s\n", devices);
>>>> +        }
>>>> +        resp = hmp("device_del %s", devices);
>>>> +        g_free(resp);
>>>> +        /* And move forward to the next line */
>>>> +        devices = strchr(endp + 1, '\n');
>>>> +        if (!devices) {
>>>> +            break;
>>>> +        }
>>>> +        devices += 1;
>>>> +    }
>>>> +
>>>> +    g_free(devices_buf);
>>>> +}
>>>> +
>>>>  static void test_machine(gconstpointer data)
>>>>  {
>>>>      const char *machine = data;
>>>> @@ -125,6 +182,7 @@ static void test_machine(gconstpointer data)
>>>>      qtest_start(args);
>>>>  
>>>>      test_info_commands();
>>>> +    test_device_add_commands();
>>>>      test_commands();
>>>>  
>>>>      qtest_end();
>>>
>>> This finds devices by parsing output of HMP help.  I think you should
>>> use introspection instead, like device-introspect-test does.  You may
>>> want to extract common utility code from there.
>
> Well, I wrote a HMP test, to simulate what users can do at the HMP
> prompt. But ok, it's likely to crash QEMU also when using the QMP
> interface instead ... but then the code should also go into a different
> .c file, I think.

HMP device_add is a trivial wrapper around QMP, as is proper:

    void hmp_device_add(Monitor *mon, const QDict *qdict)
    {
        Error *err = NULL;

        qmp_device_add((QDict *)qdict, NULL, &err);
        hmp_handle_error(mon, &err);
    }

If anything ever breaks in this wrapper, it won't be specific to HMP
device_add.

>>> The actual device_add and device_del also use HMP.  Failures are
>>> ignored.  A few device_add failures I'd expect:
>>>
>>> * There is no suitable bus.
>>>
>>> * There are suitable buses, but the default one is full.
>
> You can partly work around the above two problems by looping a couple of
> times through the "device_add"s first, before doing the "device_del"s.
> Then the first iteration adds additional buses which get populated in
> the second iteration. I can get additional QEMU crashes if I modify my
> test that way... but currently I lack time for debugging all these
> crashes, so I don't want to include that in this patch here yet.

Kind of a random walk.

Eduardo has been working on "socket introspection", i.e. means to find
available "sockets" for devices to plug into.  Could be used to for a
more directed walk.  Eduardo, any thoughts?

>>> * Mandatory parameters are missing, such as device backend.
>
> That's quite hard to handle even with QMP, isn't it? How should a
> generic test know which parameter have to be populated with which value?

It could perhaps populate sufficiently generic parameters such as common
backends.

Sadly, device-list-properties is weak, and unless we improve or replace
it, this would involve somewhat ugly hardcoding of descriptions.  For
instance, we'd have to recognize from

    {"name": "netdev", "description": "ID of a netdev to use as a backend", 
"type": "str"}

that property netdev is a network backend, and very likely mandatory.

>>> * The bus doesn't support hot plug (some other bus might).
>
> That should not be a problem - at least QEMU should not crash in this
> situation.

Yes, and testing that has some value.  It doesn't test the device,
though, only the qdev core.

I don't mean to suggest your test is useless.  I'm merely pointing at
some gaping coverage holes :)

>>> * The device supports only cold plug with -device, not hot plug with
>>>   device_add.
>
> We've got Eduardo's scripts/device-crash-test script for that already,
> so no need to cover that here.

Point taken.  So this test is really about hot plug / unplug.  Suggest
to clarify the commit message: s/add them blindly/hotplug and unplug
them blindly/.

>>> I'm afraid the test only tests one of these common failure modes for
>>> many devices.
>>>
>>> device_del failures I'd expect:
>>>
>>> * The device doesn't exist, because it hasn't completed hot plug, yet.
>>>   In some cases such as ACPI PCI hot plug, hot plug may require guest
>>>   cooperation, which may take unbounded time.  device_add merely kicks
>>>   off the hot plug then.  I can't remember how to poll for completion.
>>>   I also can't remember why we don't send a QMP event.
>>>
>>>   The hot plug usually completes quickly, but it may take its own sweet
>>>   time, or not complete at all, say because the guest doesn't support
>>>   ACPI, or just doesn't feel like plugging right now.
>>>
>>>   The test needs to set up a guest that cooperates.  I guess that
>>>   involves libqos; yet another thing I've forgotten.
>
> That was certainly not my scope when I wrote this test. I just want to
> get rid of these devices that can easily crash QEMU if you just try to
> add or remove them at the monitor prompt. A more detailed hotplug test
> should IMHO be done by the individual device tests instead, like it is
> done in many tests/virtio*.c and tests/usb*.c files already.

I'm not trying to talk you into widening the scope of your test.  I am
pointing out that your testing of unplug is racy, and may therefore miss
failures randomly: if hotplug is slow, device_del won't actually do
anything interesting.

>>> * Same for device_del.  You should wait for the DEVICE_DELETED event
>>>   with a suitable timeout.
>> 
>> Yes, I think that's an interesting problem - although the test
>> might still be worth it just to see how many issues it finds;   I'm
>> curious how many devices actually work with no parameters though,
>> most seem to fail.
>
> My test already helped to find lots of problems:
>
> https://git.qemu.org/?p=qemu.git;a=commitdiff;h=f58f25599b72c7479e6a1
> https://git.qemu.org/?p=qemu.git;a=commitdiff;h=469f3da42ef4af347fa78
> https://git.qemu.org/?p=qemu.git;a=commitdiff;h=574ee06de9c4fe63c90be
> https://git.qemu.org/?p=qemu.git;a=commitdiff;h=84ebd3e8c7d4fe955b359
> https://git.qemu.org/?p=qemu.git;a=commitdiff;h=0d4fa4996fc5ee56ea7d0
> https://git.qemu.org/?p=qemu.git;a=commitdiff;h=1f98e55385d11da1dc0de
> https://git.qemu.org/?p=qemu.git;a=commitdiff;h=8ccccff9dd7ba24c7a788
> https://git.qemu.org/?p=qemu.git;a=commitdiff;h=0479097859372a760843a
> https://git.qemu.org/?p=qemu.git;a=commitdiff;h=a808c0865b720e22ca292
> https://lists.gnu.org/archive/html/qemu-devel/2017-08/msg04618.html
> https://lists.gnu.org/archive/html/qemu-devel/2017-08/msg04164.html
> https://lists.gnu.org/archive/html/qemu-arm/2017-08/msg00306.html
> https://lists.gnu.org/archive/html/qemu-devel/2017-08/msg04622.html
> https://lists.gnu.org/archive/html/qemu-devel/2017-08/msg04635.html
> https://lists.gnu.org/archive/html/qemu-devel/2017-08/msg05390.html
>
> ... so does it now sound at least a little bit usable?

I don't doubt it is, but its limitations need to be understood and
either relaxed or documented.

>> If I'm reading the code right it's creating the device with the same
>> name as the device;  I wonder if that always works?
>
> Why not? The id is just an arbitrary string, isn't it?

Since you're using HMP, you get to quote ',', which occurs in some
device names[*].  Enjoy!  ;-P

Picking IDs that aren't anti-social may be easier.

>> But still,  it should mean that if the previous hotplug hasn't really
>> happened then you can move onto the next add.
>> 
>>> We could improve device_add to cold plug before the machine starts,
>>> i.e. between start with -S and the first cont.  We could similarly
>>> improve device_del to cold plug.  Together, that would let you sidestep
>>> the hot plug complications.
>>>
>>> I guess this test is a case of "if it was easy, we would've done it
>>> already"...
>> 
>> Still maybe it's worth it as a start.
>
> Agreed that we should finally move to a smarter, QMP-based test. But
> until someone wrote that, maybe we could include this as a temporary
> guard to avoid that problems like the ones above creep in again?

I think its limitations need to be understood to a useful degree, and
either relaxed or documented.


[*] Blame IEEE 1275 device trees.



reply via email to

[Prev in Thread] Current Thread [Next in Thread]