Re: [Qemu-devel] [PULL 0/8] Monitor patches for 2018-10-30


From: Markus Armbruster
Subject: Re: [Qemu-devel] [PULL 0/8] Monitor patches for 2018-10-30
Date: Tue, 20 Nov 2018 18:44:27 +0100
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/26.1 (gnu/linux)

Peter Xu <address@hidden> writes:

> On Mon, Nov 19, 2018 at 07:08:13PM +0100, Markus Armbruster wrote:
>> Peter Xu <address@hidden> writes:
>> 
>> > On Mon, Nov 19, 2018 at 02:17:27PM +0800, Peter Xu wrote:
>> >> I reproduced the error with a FreeBSD guest and this change (which
>> >> possibly can be squashed into "tests: qmp-test: add queue full test")
>> >> worked for me:
>> >> 
>> >> diff --git a/tests/qmp-test.c b/tests/qmp-test.c
>> >> index 6989acbca4..83f353db4f 100644
>> >> --- a/tests/qmp-test.c
>> >> +++ b/tests/qmp-test.c
>> >> @@ -281,8 +281,15 @@ static void test_qmp_oob(void)
>> >>       * will only be able to be handled after the queue is shrinked, so
>> >>       * it'll be processed only after one existing in-band command
>> >>       * finishes.
>> >> +     *
>> >> +     * NOTE: we need to feed the queue one extra request to make sure
>> >> +     * it gets stuck: by the time we have sent the Nth request, the
>> >> +     * 1st request may already have been popped and be in processing,
>> >> +     * so the Nth request (which could then be only the [N-1]th
>> >> +     * element in the queue) might not trigger the monitor-full
>> >> +     * condition deterministically.
>> >>       */
>> >> -    for (i = 1; i <= QMP_REQ_QUEUE_LEN_MAX; i++) {
>> >> +    for (i = 1; i <= QMP_REQ_QUEUE_LEN_MAX + 1; i++) {
>> >>          id = g_strdup_printf("queue-blocks-%d", i);
>> >>          send_cmd_that_blocks(qts, id);
>> >>          g_free(id);
>> >> @@ -291,7 +298,7 @@ static void test_qmp_oob(void)
>> >>      unblock_blocked_cmd();
>> >>      recv_cmd_id(qts, "queue-blocks-1");
>> >>      recv_cmd_id(qts, "oob-1");
>> >> -    for (i = 2; i <= QMP_REQ_QUEUE_LEN_MAX; i++) {
>> >> +    for (i = 2; i <= QMP_REQ_QUEUE_LEN_MAX + 1; i++) {
>> >>          unblock_blocked_cmd();
>> >>          id = g_strdup_printf("queue-blocks-%d", i);
>> >>          recv_cmd_id(qts, id);
>> >> 
>> >> So the problem here is that the queue-blocks-N command might not
>> >> actually suspend the monitor every time: if the 1st request has
>> >> already been popped, the Nth request becomes only the (N-1)th element
>> >> in the queue, so the parser keeps going, eats the oob command, and
>> >> lets it "preempt" the previous commands.
>> >> 
>> >> Maybe FreeBSD schedules the threads in a pattern that makes this
>> >> happen only there, and very consistently, but in any case this should
>> >> be a general fix to the test program.
>> 
>> I feel particularly dense right now, and need a more detailed analysis
>> to understand.  Let me try.
>> 
>> Here's what the test does before your fix:
>> 
>>     Send QMP_REQ_QUEUE_LEN_MAX "blocking" in-band commands
>>     queue-blocks-1, queue-blocks-2, ...  "Blocking" means the command
>>     blocks until we explicitly unblock it.
>> 
>>     Send an out-of-band command oob-1.
>> 
>>     Unblock queue-blocks-1.
>> 
>>     Receive response to queue-blocks-1.
>> 
>>     Receive response to oob-1.
>> 
>>     Unblock queue-blocks-2, receive response
>>     ...
>> 
>> Here's what the test expects QEMU to do:
>> 
>>     In I/O thread:
>> 
>>         Receive and queue queue-blocks-1, ...
>> 
>>         Queue is full, stop receiving commands.
>> 
>>     In main thread:
>> 
>>         Dequeue queue-blocks-1.
>> 
>>         Execute queue-blocks-1, block for a while, finish and send
>>         response.
>> 
>>     In I/O thread:
>> 
>>         Queue is no longer full, resume receiving commands.
>> 
>>         Receive and execute oob-1, send response
>> 
>>     In main thread:
>> 
>>         Dequeue queue-blocks-2, execute, block for a while, finish and
>>         send response.
>>         ...
>> 
>> No good because the two threads race.  On my Linux box the test passes.
>> Let's have a closer look at how it does that.  The appended patch adds
>> another tracepoint.  With tracepoints monitor_* and handle_qmp_command
>> enabled, I get:
>> 
>>     address@hidden:handle_qmp_command mon 0x563e19be4e80 req: {"execute": 
>> "blockdev-add", "arguments": {"image": {"driver": "null-co"}, "node-name": 
>> "queue-blocks-1", "driver": "blkdebug", "config": 
>> "/tmp/qmp-test-EZroW3/fifo"}}
>> 
>> I/O thread receives and queues queue-blocks-1.
>> 
>>     address@hidden:monitor_qmp_cmd_in_band queue-blocks-1
>> 
>> Main thread dequeues queue-blocks-1 and starts executing it.
>> 
>> We already lost: the queue won't actually fill up.  The test passes
>> anyway, but it doesn't actually test "queue full".  But why does it
>> pass?  Let's see.
>> 
>>     address@hidden:monitor_qmp_respond queue-blocks-1
>> 
>> Main thread sends response to queue-blocks-1.  This means queue-blocks-1
>> has been unblocked already.  This is possible, because qmp-test sends
>> all the commands and then immediately unblocks queue-blocks-1, and the
>> main thread can win the race with the I/O thread.
>> 
>>     address@hidden:handle_qmp_command mon 0x563e19be4e80 req: {"execute": 
>> "blockdev-add", "arguments": {"image": {"driver": "null-co"}, "node-name": 
>> "queue-blocks-2", "driver": "blkdebug", "config": 
>> "/tmp/qmp-test-EZroW3/fifo"}}
>>     address@hidden:monitor_qmp_cmd_in_band queue-blocks-2
>>     address@hidden:handle_qmp_command mon 0x563e19be4e80 req: {"execute": 
>> "blockdev-add", "arguments": {"image": {"driver": "null-co"}, "node-name": 
>> "queue-blocks-3", "driver": "blkdebug", "config": 
>> "/tmp/qmp-test-EZroW3/fifo"}}
>>     address@hidden:handle_qmp_command mon 0x563e19be4e80 req: {"execute": 
>> "blockdev-add", "arguments": {"image": {"driver": "null-co"}, "node-name": 
>> "queue-blocks-4", "driver": "blkdebug", "config": 
>> "/tmp/qmp-test-EZroW3/fifo"}}
>>     address@hidden:handle_qmp_command mon 0x563e19be4e80 req: {"execute": 
>> "blockdev-add", "arguments": {"image": {"driver": "null-co"}, "node-name": 
>> "queue-blocks-5", "driver": "blkdebug", "config": 
>> "/tmp/qmp-test-EZroW3/fifo"}}
>>     address@hidden:handle_qmp_command mon 0x563e19be4e80 req: {"execute": 
>> "blockdev-add", "arguments": {"image": {"driver": "null-co"}, "node-name": 
>> "queue-blocks-6", "driver": "blkdebug", "config": 
>> "/tmp/qmp-test-EZroW3/fifo"}}
>>     address@hidden:handle_qmp_command mon 0x563e19be4e80 req: {"execute": 
>> "blockdev-add", "arguments": {"image": {"driver": "null-co"}, "node-name": 
>> "queue-blocks-7", "driver": "blkdebug", "config": 
>> "/tmp/qmp-test-EZroW3/fifo"}}
>>     address@hidden:handle_qmp_command mon 0x563e19be4e80 req: {"execute": 
>> "blockdev-add", "arguments": {"image": {"driver": "null-co"}, "node-name": 
>> "queue-blocks-8", "driver": "blkdebug", "config": 
>> "/tmp/qmp-test-EZroW3/fifo"}}
>> 
>> I/O thread receives and queues queue-blocks-2, ... queue-blocks-8.
>> 
>>     address@hidden:handle_qmp_command mon 0x563e19be4e80 req: {"exec-oob": 
>> "migrate-pause"}
>>     address@hidden:monitor_qmp_cmd_out_of_band oob-1
>>     address@hidden:monitor_qmp_respond oob-1
>> 
>> I/O thread receives, executes and responds to oob-1.
>> 
>>     address@hidden:monitor_qmp_respond queue-blocks-2
>>     address@hidden:monitor_qmp_cmd_in_band queue-blocks-3
>>     address@hidden:monitor_qmp_respond queue-blocks-3
>>     address@hidden:monitor_qmp_cmd_in_band queue-blocks-4
>>     address@hidden:monitor_qmp_respond queue-blocks-4
>>     address@hidden:monitor_qmp_cmd_in_band queue-blocks-5
>>     address@hidden:monitor_qmp_respond queue-blocks-5
>>     address@hidden:monitor_qmp_cmd_in_band queue-blocks-6
>>     address@hidden:monitor_qmp_respond queue-blocks-6
>>     address@hidden:monitor_qmp_cmd_in_band queue-blocks-7
>>     address@hidden:monitor_qmp_respond queue-blocks-7
>>     address@hidden:monitor_qmp_cmd_in_band queue-blocks-8
>>     address@hidden:monitor_qmp_respond queue-blocks-8
>> 
>> Main thread dequeues, executes and responds to queue-blocks-2,
>> ... queue-blocks-8.
>> 
>> If I make the main thread lose the race by sticking a sleep(1) before
>> the unblock_blocked_cmd() that unblocks queue-blocks-1, I get:
>> 
>>     address@hidden:handle_qmp_command mon 0x55f5b6a5de80 req: {"execute": 
>> "blockdev-add", "arguments": {"image": {"driver": "null-co"}, "node-name": 
>> "queue-blocks-1", "driver": "blkdebug", "config": 
>> "/tmp/qmp-test-PqO2uz/fifo"}}
>> 
>> I/O thread receives and queues queue-blocks-1.
>> 
>>     address@hidden:monitor_qmp_cmd_in_band queue-blocks-1
>> 
>> Main thread dequeues queue-blocks-1 and starts executing it.  We already
>> lost.
>> 
>>     address@hidden:handle_qmp_command mon 0x55f5b6a5de80 req: {"execute": 
>> "blockdev-add", "arguments": {"image": {"driver": "null-co"}, "node-name": 
>> "queue-blocks-2", "driver": "blkdebug", "config": 
>> "/tmp/qmp-test-PqO2uz/fifo"}}
>>     address@hidden:handle_qmp_command mon 0x55f5b6a5de80 req: {"execute": 
>> "blockdev-add", "arguments": {"image": {"driver": "null-co"}, "node-name": 
>> "queue-blocks-3", "driver": "blkdebug", "config": 
>> "/tmp/qmp-test-PqO2uz/fifo"}}
>>     address@hidden:handle_qmp_command mon 0x55f5b6a5de80 req: {"execute": 
>> "blockdev-add", "arguments": {"image": {"driver": "null-co"}, "node-name": 
>> "queue-blocks-4", "driver": "blkdebug", "config": 
>> "/tmp/qmp-test-PqO2uz/fifo"}}
>>     address@hidden:handle_qmp_command mon 0x55f5b6a5de80 req: {"execute": 
>> "blockdev-add", "arguments": {"image": {"driver": "null-co"}, "node-name": 
>> "queue-blocks-5", "driver": "blkdebug", "config": 
>> "/tmp/qmp-test-PqO2uz/fifo"}}
>>     address@hidden:handle_qmp_command mon 0x55f5b6a5de80 req: {"execute": 
>> "blockdev-add", "arguments": {"image": {"driver": "null-co"}, "node-name": 
>> "queue-blocks-6", "driver": "blkdebug", "config": 
>> "/tmp/qmp-test-PqO2uz/fifo"}}
>>     address@hidden:handle_qmp_command mon 0x55f5b6a5de80 req: {"execute": 
>> "blockdev-add", "arguments": {"image": {"driver": "null-co"}, "node-name": 
>> "queue-blocks-7", "driver": "blkdebug", "config": 
>> "/tmp/qmp-test-PqO2uz/fifo"}}
>>     address@hidden:handle_qmp_command mon 0x55f5b6a5de80 req: {"execute": 
>> "blockdev-add", "arguments": {"image": {"driver": "null-co"}, "node-name": 
>> "queue-blocks-8", "driver": "blkdebug", "config": 
>> "/tmp/qmp-test-PqO2uz/fifo"}}
>> 
>> I/O thread receives and queues queue-blocks-2, ... queue-blocks-8.
>> 
>>     address@hidden:handle_qmp_command mon 0x55f5b6a5de80 req: {"exec-oob": 
>> "migrate-pause"}
>>     address@hidden:monitor_qmp_cmd_out_of_band oob-1
>>     address@hidden:monitor_qmp_respond oob-1
>> 
>> I/O thread receives, executes and responds to oob-1.
>> 
>> Test is going to fail: response to oob-1 sent before response to
>> queue-blocks-1.
>> 
>> Now let me try your fix.  Here's what I see:
>> 
>>     address@hidden:handle_qmp_command mon 0x5597b8e0de80 req: {"execute": 
>> "blockdev-add", "arguments": {"image": {"driver": "null-co"}, "node-name": 
>> "queue-blocks-1", "driver": "blkdebug", "config": 
>> "/tmp/qmp-test-8YRbOJ/fifo"}}
>>     address@hidden:monitor_qmp_cmd_in_band queue-blocks-1
>>     address@hidden:monitor_qmp_respond queue-blocks-1
>>     address@hidden:handle_qmp_command mon 0x5597b8e0de80 req: {"execute": 
>> "blockdev-add", "arguments": {"image": {"driver": "null-co"}, "node-name": 
>> "queue-blocks-2", "driver": "blkdebug", "config": 
>> "/tmp/qmp-test-8YRbOJ/fifo"}}
>>     address@hidden:monitor_qmp_cmd_in_band queue-blocks-2
>>     address@hidden:handle_qmp_command mon 0x5597b8e0de80 req: {"execute": 
>> "blockdev-add", "arguments": {"image": {"driver": "null-co"}, "node-name": 
>> "queue-blocks-3", "driver": "blkdebug", "config": 
>> "/tmp/qmp-test-8YRbOJ/fifo"}}
>>     address@hidden:handle_qmp_command mon 0x5597b8e0de80 req: {"execute": 
>> "blockdev-add", "arguments": {"image": {"driver": "null-co"}, "node-name": 
>> "queue-blocks-4", "driver": "blkdebug", "config": 
>> "/tmp/qmp-test-8YRbOJ/fifo"}}
>>     address@hidden:handle_qmp_command mon 0x5597b8e0de80 req: {"execute": 
>> "blockdev-add", "arguments": {"image": {"driver": "null-co"}, "node-name": 
>> "queue-blocks-5", "driver": "blkdebug", "config": 
>> "/tmp/qmp-test-8YRbOJ/fifo"}}
>>     address@hidden:handle_qmp_command mon 0x5597b8e0de80 req: {"execute": 
>> "blockdev-add", "arguments": {"image": {"driver": "null-co"}, "node-name": 
>> "queue-blocks-6", "driver": "blkdebug", "config": 
>> "/tmp/qmp-test-8YRbOJ/fifo"}}
>>     address@hidden:handle_qmp_command mon 0x5597b8e0de80 req: {"execute": 
>> "blockdev-add", "arguments": {"image": {"driver": "null-co"}, "node-name": 
>> "queue-blocks-7", "driver": "blkdebug", "config": 
>> "/tmp/qmp-test-8YRbOJ/fifo"}}
>>     address@hidden:handle_qmp_command mon 0x5597b8e0de80 req: {"execute": 
>> "blockdev-add", "arguments": {"image": {"driver": "null-co"}, "node-name": 
>> "queue-blocks-8", "driver": "blkdebug", "config": 
>> "/tmp/qmp-test-8YRbOJ/fifo"}}
>>     address@hidden:handle_qmp_command mon 0x5597b8e0de80 req: {"execute": 
>> "blockdev-add", "arguments": {"image": {"driver": "null-co"}, "node-name": 
>> "queue-blocks-9", "driver": "blkdebug", "config": 
>> "/tmp/qmp-test-8YRbOJ/fifo"}}
>>     address@hidden:handle_qmp_command mon 0x5597b8e0de80 req: {"exec-oob": 
>> "migrate-pause"}
>>     address@hidden:monitor_qmp_cmd_out_of_band oob-1
>>     address@hidden:monitor_qmp_respond oob-1
>>     address@hidden:monitor_qmp_respond queue-blocks-2
>>     address@hidden:monitor_qmp_cmd_in_band queue-blocks-3
>>     address@hidden:monitor_qmp_respond queue-blocks-3
>>     address@hidden:monitor_qmp_cmd_in_band queue-blocks-4
>>     address@hidden:monitor_qmp_respond queue-blocks-4
>>     address@hidden:monitor_qmp_cmd_in_band queue-blocks-5
>>     address@hidden:monitor_qmp_respond queue-blocks-5
>>     address@hidden:monitor_qmp_cmd_in_band queue-blocks-6
>>     address@hidden:monitor_qmp_respond queue-blocks-6
>>     address@hidden:monitor_qmp_cmd_in_band queue-blocks-7
>>     address@hidden:monitor_qmp_respond queue-blocks-7
>>     address@hidden:monitor_qmp_cmd_in_band queue-blocks-8
>>     address@hidden:monitor_qmp_respond queue-blocks-8
>>     address@hidden:monitor_qmp_cmd_in_band queue-blocks-9
>>     address@hidden:monitor_qmp_respond queue-blocks-9
>> 
>> Tracepoint monitor_suspend still not hit.  This means the queue does not
>> fill up.
>
> You are right.  I think I figured out one thing but missed another
> point here: "sent the command" does not mean "received by the QMP
> server".  So the client can never tell when the queue is full, yet
> that is exactly what we need to know to write a unit test for it.
>
> For example: in this test what we want to achieve is:
>
>   for i in 1 .. N:
>       send blocking commands Bi
>   ... wait until queue is full ...  [1]
>   send oob command O1
>   unblock blocking command B1
>   receive response B1
>   receive response O1
>   for i in 2 .. N:
>       unblock blocking command Bi
>       receive response Bi
>
> All looks sane except the assumption at [1].  The code does nothing
> for [1], assuming it happens naturally.  However, it doesn't: when the
> first for loop finishes, we have only queued the commands into the
> send buffer; we can't guarantee they have been received and queued in
> the QMP request list on the server side.  When the server's QMP
> request queue does get full, it merely suspends the monitor, and that
> event is completely invisible to the client, so IMHO we don't have a
> good way to guarantee [1].  E.g., we could do a very long sleep(100)
> at [1] to make the queue-full condition quite likely, but that could
> still break on a super slow machine.
>
> In other words, I don't see a good way to automate testing of queue
> full for now (though I'm pretty sure queue full itself works, since if
> you insert a sleep you do see the monitor_suspend tracepoint).  We
> might need something like an event to signal that the queue is full,
> so that the client can detect it and only then send the out-of-band
> command, but that feels like going backward, and I don't see much
> point in introducing a new event only for testing purposes...

We want the test to drive QEMU into the "queue full" state while an
out-of-band command is still unreceived, then watch the queue drain and
the out-of-band command overtake part of it.

Driving QEMU into this state is easy enough: we can use a blocking
in-band command to plug the queue's drain.
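
The "blocking" trick is the one visible in the traces quoted above:
blockdev-add of a blkdebug node whose config file is really a fifo, so
the command blocks until the other end of the fifo gets opened.  A rough
sketch of such a helper (the fifo setup and the exact libqtest call are
my assumptions, not necessarily the test's actual code):

    static const char *fifo_path;   /* created with mkfifo() during setup */

    static void send_cmd_that_blocks(QTestState *qts, const char *id)
    {
        /* blkdebug blocks while reading its config; with a fifo as the
         * config it stays blocked until a writer opens the fifo, which
         * keeps this in-band command pending and the queue plugged. */
        qtest_qmp_send(qts,
                       "{ 'execute': 'blockdev-add', 'id': %s,"
                       "  'arguments': {"
                       "    'driver': 'blkdebug',"
                       "    'node-name': %s,"
                       "    'config': %s,"
                       "    'image': { 'driver': 'null-co' } } }",
                       id, id, fifo_path);
    }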

Once we have QEMU in this state, we can unblock that command and watch
the show.  But we really have to wait until QEMU reaches this state
before we unblock.

Our problem is the test can't easily observe when QEMU reaches this
state.

You mentioned sending a "queue full" event, and that adding such an
event just for testing is hard to justify.  I agree.  Could such an
event have a non-testing use?  All I can think of is letting the
management application detect "we mismanaged the queue and compromised
monitor availability for out-of-band commands".  Doesn't sound
particularly convincing to me, but we can talk with the libvirt guys to see
whether they want it.
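
For what it's worth, on the test side such an event would make the wait
trivial.  A sketch with an invented event name (no such event exists
today):

    /* Hypothetical: park until QEMU says its request queue is full,
     * then proceed exactly as the test does now.  COMMAND_QUEUE_FULL is
     * made up for illustration. */
    qtest_qmp_eventwait(qts, "COMMAND_QUEUE_FULL");
    unblock_blocked_cmd();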

Can the test observe that the monitor is suspended just from the behavior of
its monitor socket?  Not reliably: sending lots of input eventually
fails with EAGAIN when the monitor is suspended, but it can also fail
when the monitor reads the input too slowly.
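
To spell it out with a sketch (not proposed test code): a write probe
only tells us about the kernel's socket buffers, not about the QMP
request queue.

    #include <errno.h>
    #include <stdbool.h>
    #include <sys/socket.h>

    /* A non-blocking write failing with EAGAIN means the socket buffers
     * are full.  That happens once the monitor stops reading (suspended),
     * but also when it merely reads more slowly than we write, so the
     * probe cannot tell the two cases apart. */
    static bool monitor_looks_stuck(int sock_fd)
    {
        char byte = '\n';       /* whitespace is harmless to the parser */

        if (send(sock_fd, &byte, 1, MSG_DONTWAIT) < 0) {
            return errno == EAGAIN || errno == EWOULDBLOCK;
        }
        return false;           /* write accepted: proves nothing */
    }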

Can the test observe the monitor_suspend trace point?  Awkward, because
the trace backends are configurable.  We could have the test monitor the
"simple" backend's output file, skipping the test when "simple" isn't
available.  Hmm.

Here comes the heavy hammer: extend the qtest protocol.  When the
monitor suspends / resumes, send a suitable asynchronous message to the
qtest socket.  Have libqtest use that to track monitor suspend state.
The test can then busy-wait for the state to flip to suspended.
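
Sketched with invented names, just to show the shape (none of this
exists in libqtest today):

    /* Hypothetical libqtest helper: assumes the qtest protocol gained
     * asynchronous "monitor suspended" / "monitor resumed" messages and
     * that libqtest tracks them behind qtest_monitor_suspended(). */
    static void wait_for_monitor_suspended(QTestState *qts)
    {
        while (!qtest_monitor_suspended(qts)) {
            g_usleep(1000);     /* busy-wait, as described above */
        }
    }

The test would call this right after sending its QMP_REQ_QUEUE_LEN_MAX
blocking commands and before sending oob-1, i.e. exactly at your [1].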

Other ideas?

>> > Markus, do you want me to repost a new version with this change?  Is
>> > it still possible to have the oob-default series for 3.1?
>> 
>> We're trying to, but we need to get the test to work.
>> 
>> Two problems:
>> 
>> 1. The test doesn't seem to succeed at testing "queue full".
>> 
>> 2. The test might still be racy.
>> 
>> I suspect the code we test is solid, and it's "only" the test we screwed
>> up.
>
> I agree.  Do you have a better idea for testing queue full?  I'd be
> more than glad to try any suggestions; otherwise I'm thinking we could
> simply drop the queue-full test (which is the last patch of the oob
> series) for now and merge the rest.  We can introduce a new queue-full
> test after 3.1 if we want, though again I really doubt we can do it
> without introducing something new into the procedure.
>
> Thoughts?

I'm afraid dropping the test (for now?) is what we have to do.


