Re: [Qemu-devel] [PATCH v3 2/6] monitor: flush qmp responses when CLOSED
From: Peter Xu
Subject: Re: [Qemu-devel] [PATCH v3 2/6] monitor: flush qmp responses when CLOSED
Date: Tue, 19 Jun 2018 12:35:51 +0800
User-agent: Mutt/1.10.0 (2018-05-17)
On Fri, Jun 15, 2018 at 10:11:34AM +0200, Markus Armbruster wrote:
> Peter Xu <address@hidden> writes:
>
> > Previously we clean up the queues when we got CLOSED event. It was used
> > to make sure we won't send leftover replies/events of a old client to a
> > new client. Now this patch does an extra operation to flush the
> > response queue before cleaning up.
> >
> > In most cases, a QMP session will be based on a bidirectional channel (a
> > TCP port, for example, we read/write to the same socket handle), so in
> > port and out port of the backend chardev are fundamentally the same
> > port. In these cases, it does not really matter much on whether we'll
> > flush the response queue since flushing should possibly fail
>
> s/should possibly fail anyways/will fail anyway/
>
> > anyways (just imagine to write to a socket that has already closed).
> > However there can be cases where in & out ports of the QMP monitor's
> > backend chardev are separated. One true example:
>
> Suggest to drop "true", or say "Here's an example".
>
> > cat $QMP_COMMANDS | qemu -qmp stdio ... | filter_commands
> >
> > In this case, the backend is fd-typed, and it is connected to stdio
> > where in port is stdin and out port is stdout. Now if we drop all the
> > events on the response queue then filter_command process might miss some
> > events that it might expect. The thing is that, when stdin closes,
> > stdout might still be there alive!
> >
> > In practice, I encountered a very strange SHUTDOWN event missing when
>
> The event isn't strange, its missing is (sort of). Suggest to drop
> "very strange".
>
> > running test with iotest 087 with Out-Of-Band enabled. One condition
> > could be this (after "quit" command is executed and QEMU quits the main
> > loop):
>
> "could be" suggests this may or may not demonstrate the loss of the
> event. Misleading. "Here's one of the ways this can happen" would be
> clearer.
>
> >
> > 1. [main thread] QEMU queues one SHUTDOWN event into response queue
>
> s/one/a/
>
> >
> > 2. "cat" terminates (to distinguish it from the animal, I quote it).
> > Logically it can terminate even earlier, but let's just assume it's
> > here.
>
> Suggest to drop the second sentence. You're describing one way for the
> bug to bite here, not all possible ways.
>
> >
> > 3. [monitor iothread] QEMU reads EOF from stdin, which connects to the
> > "cat" process
>
> Suggest "QEMU's monitor iothread reads".
>
> Suggest to drop ", which connects...", because that should be obvious.
>
> > 4. [monitor iothread] QEMU calls the CLOSED event hook for the monitor,
>
> "QEMU's monitor iothread" again.
>
> > which will clean up the response queue of the monitor, then the
> > SHUTDOWN event is dropped
>
> Suggest to say "destroy" instead of "clean up", to make perfectly clear
> that we're throwing away the queue.
> >
> > 5. [main thread] clean up the monitors in monitor_cleanup(), when
>
> "QEMU's main thread cleans up the monitors in monitor_cleanup(). When"
>
> > trying to flush pending responses, it sees nothing. SHUTDOWN is
> > lost forever
>
> Missing a period.
>
> > Note that before the monitor iothread was introduced, step [4]/[5] could
> > never happen since the main loop was the only place to detect the EOF
> > event of stdin and run the CLOSED event hooks. Now things can happen in
> > parallel in the iothread.
> >
> > Without this patch, iotest 087 will have ~10% chance to miss the
> > SHUTDOWN event and fail when with Out-Of-Band enabled:
> >
> > 087 8s ... - output mismatch (see 087.out.bad)
> > --- /home/peterx/git/qemu/tests/qemu-iotests/087.out 2018-06-01
> > 18:44:22.378982462 +0800
> > +++ /home/peterx/git/qemu/bin/tests/qemu-iotests/087.out.bad 2018-06-01
> > 18:53:44.267840928 +0800
> > @@ -8,7 +8,6 @@
> > {"return": {}}
> > {"error": {"class": "GenericError", "desc": "'node-name' must be specified
> > for the root node"}}
> > {"return": {}}
> > -{"timestamp": {"seconds": TIMESTAMP, "microseconds": TIMESTAMP},
> > "event": "SHUTDOWN", "data": {"guest": false}}
> >
> > === Duplicate ID ===
> > @@ -53,7 +52,6 @@
> > {"return": {}}
> > {"return": {}}
> > {"return": {}}
> > -{"timestamp": {"seconds": TIMESTAMP, "microseconds": TIMESTAMP},
> > "event": "SHUTDOWN", "data": {"guest": false}}
>
> Suggest to indent the quoted diff a bit, so make it more obviously not
> part of the patch. In fact, git-am chokes on it for me.
>
> > This patch fixes the problem.
> >
> > Fixes: 6d2d563f8c ("qmp: cleanup qmp queues properly", 2018-03-27)
> > Suggested-by: Markus Armbruster <address@hidden>
> > Signed-off-by: Peter Xu <address@hidden>
> >
> > Signed-off-by: Peter Xu <address@hidden>
>
> Don't let my nitpicking deceive you; this is a really nice commit
> message.
Thanks. I'll apply all the suggestions up to here in my next post.
>
> One more suggestion. Your first paragraph describes the solution. I
> like to start with the problem instead. More so when the problem
> description is long, as it is here, because by the time I'm done reading
> the problem, I'm prone to have forgotten the solution, and go "okay, and
> how's the patch addressing the problem?" But please use your judgement.
Yes, talking about the problem first makes sense. But I'll try to
make it even easier for readers: I'll summarize the problem and
solution first, then keep the rest of the commit message as a
detailed description of the problem. So I'll rewrite my first
paragraph as:
Previously we cleaned up the queues when we got the CLOSED event.
That was used to make sure we won't send leftover replies/events of
an old client to a new client, which makes perfect sense. However,
this also drops the replies/events even if the output port of the
previous chardev backend is still open, which can lead to the loss
of the last replies/events. Now this patch does an extra operation
to flush the response queue before cleaning up.
Hope this works.
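The flush-before-destroy idea can be sketched with a toy model (plain C,
not QEMU's actual code: the queue type, the emitter, and all names here
are made up for illustration; QEMU uses a GQueue protected by a mutex,
which this sketch skips):

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Toy response queue standing in for the monitor's GQueue. */
typedef struct Node { char *msg; struct Node *next; } Node;
typedef struct { Node *head, *tail; } Queue;

static void queue_push(Queue *q, const char *msg)
{
    Node *n = malloc(sizeof(*n));
    n->msg = malloc(strlen(msg) + 1);
    strcpy(n->msg, msg);
    n->next = NULL;
    if (q->tail) {
        q->tail->next = n;
    } else {
        q->head = n;
    }
    q->tail = n;
}

/* Returns NULL only when the queue is empty -- we never push NULL. */
static char *queue_pop(Queue *q)
{
    Node *n = q->head;
    char *msg;
    if (!n) {
        return NULL;
    }
    q->head = n->next;
    if (!q->head) {
        q->tail = NULL;
    }
    msg = n->msg;
    free(n);
    return msg;
}

/* On CLOSED, emit every pending response to the (possibly still open)
 * output end first, and only then is the queue empty to destroy.  This
 * mirrors calling monitor_qmp_response_flush() before
 * monitor_qmp_cleanup_queues().  Returns how many responses were emitted. */
static int flush_on_closed(Queue *q, FILE *out)
{
    char *msg;
    int emitted = 0;
    while ((msg = queue_pop(q))) {
        fprintf(out, "%s\n", msg);  /* may fail if out is closed; harmless */
        free(msg);
        emitted++;
    }
    return emitted;
}
```

With this ordering, a SHUTDOWN event queued just before stdin sees EOF
still reaches stdout, which is exactly the iotest 087 case above.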
>
> > ---
> > monitor.c | 30 +++++++++++++++++++++++++-----
> > 1 file changed, 25 insertions(+), 5 deletions(-)
> >
> > diff --git a/monitor.c b/monitor.c
> > index 6d0cec552e..e59d4f09ac 100644
> > --- a/monitor.c
> > +++ b/monitor.c
> > @@ -512,20 +512,39 @@ struct QMPResponse {
> > };
> > typedef struct QMPResponse QMPResponse;
> >
> > +static QObject *monitor_qmp_response_pop_one(Monitor *mon)
> > +{
> > + QObject *data;
> > +
> > + qemu_mutex_lock(&mon->qmp.qmp_queue_lock);
> > + data = g_queue_pop_head(mon->qmp.qmp_responses);
> > + qemu_mutex_unlock(&mon->qmp.qmp_queue_lock);
> > +
> > + return data;
> > +}
>
> Note for later: returns null only when the queue is empty (we never push
> null onto this queue).
>
> > +
> > +static void monitor_qmp_response_flush(Monitor *mon)
> > +{
> > + QObject *data;
> > +
> > + while ((data = monitor_qmp_response_pop_one(mon))) {
> > + monitor_json_emitter_raw(mon, data);
> > + qobject_unref(data);
> > + }
> > +}
>
> This flushes the queue to @mon. Good.
>
> > +
> > /*
> > * Return one QMPResponse. The response is only valid if
> > * response.data is not NULL.
> > */
> > -static QMPResponse monitor_qmp_response_pop_one(void)
> > +static QMPResponse monitor_qmp_response_pop(void)
> > {
> > Monitor *mon;
> > QObject *data = NULL;
> >
> > qemu_mutex_lock(&monitor_lock);
> > QTAILQ_FOREACH(mon, &mon_list, entry) {
> > - qemu_mutex_lock(&mon->qmp.qmp_queue_lock);
> > - data = g_queue_pop_head(mon->qmp.qmp_responses);
> > - qemu_mutex_unlock(&mon->qmp.qmp_queue_lock);
> > + data = monitor_qmp_response_pop_one(mon);
> > if (data) {
> > break;
> > }
> }
> qemu_mutex_unlock(&monitor_lock);
> return (QMPResponse) { .mon = mon, .data = data };
> }
>
> Not this patch's fault, but this function gave me pause until I got the
> loop's purpose: the function returns a response from *any* monitor.
>
> Moreover, your naming creates an inconsistency with
> monitor_qmp_requests_pop_one().
>
> I append my attempt to make it easier to understand, and restore
> consistency. I guess I'd split it into a preparatory patch and a fixup
> for this one.
Will do.
>
> > @@ -539,7 +558,7 @@ static void monitor_qmp_bh_responder(void *opaque)
> > QMPResponse response;
> >
> > while (true) {
> > - response = monitor_qmp_response_pop_one();
> > + response = monitor_qmp_response_pop();
> > if (!response.data) {
> > break;
> > }
> > @@ -4366,6 +4385,7 @@ static void monitor_qmp_event(void *opaque, int event)
> > mon_refcount++;
> > break;
> > case CHR_EVENT_CLOSED:
> > + monitor_qmp_response_flush(mon);
>
> This wastes work in the common case when @mon's output fd has been
> closed already. I think that's just fine.
Yeah. Since I'll repost, I'll consider adding a comment for that.
>
> > monitor_qmp_cleanup_queues(mon);
> > json_message_parser_destroy(&mon->qmp.parser);
> > json_message_parser_init(&mon->qmp.parser, handle_qmp_command);
>
>
> diff --git a/monitor.c b/monitor.c
> index e59d4f09ac..af28977549 100644
> --- a/monitor.c
> +++ b/monitor.c
> @@ -512,7 +512,7 @@ struct QMPResponse {
> };
> typedef struct QMPResponse QMPResponse;
>
> -static QObject *monitor_qmp_response_pop_one(Monitor *mon)
> +static QObject *monitor_qmp_requests_pop(Monitor *mon)
I guess that's a typo; I'll keep the word "response".
> {
> QObject *data;
>
> @@ -527,41 +527,39 @@ static void monitor_qmp_response_flush(Monitor *mon)
> {
> QObject *data;
>
> - while ((data = monitor_qmp_response_pop_one(mon))) {
> + while ((data = monitor_qmp_requests_pop(mon))) {
Same here.
> monitor_json_emitter_raw(mon, data);
> qobject_unref(data);
> }
> }
>
> /*
> - * Return one QMPResponse. The response is only valid if
> - * response.data is not NULL.
> + * Pop a QMPResponse from any monitor's response queue into @response.
> + * Return NULL when all queues are empty, else @response.
If that's okay, I'll change the return value to a boolean directly,
since we now pass the QMPResponse in anyway.
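The boolean shape might look like the sketch below (a hedged,
self-contained model, not the real monitor.c: the one-slot "queue",
the stand-in types, and the list walk are simplifications, and the
monitor_lock/qmp_queue_lock locking is elided):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Opaque stand-in for QEMU's QObject. */
typedef struct QObject QObject;

/* Simplified monitor: a one-slot pending response plus an intrusive
 * list link, standing in for mon_list + each monitor's GQueue. */
typedef struct Monitor {
    QObject *pending;
    struct Monitor *next;
} Monitor;

typedef struct {
    Monitor *mon;
    QObject *data;
} QMPResponse;

/* Pop one response from a single monitor; NULL when its queue is empty. */
static QObject *monitor_qmp_response_pop_one(Monitor *mon)
{
    QObject *data = mon->pending;
    mon->pending = NULL;
    return data;
}

/* Pop a response from *any* monitor into @response.  Returns true on
 * success, false when every queue is empty -- the boolean variant
 * proposed above, instead of returning @response or NULL. */
static bool monitor_qmp_response_pop_any(Monitor *mon_list,
                                         QMPResponse *response)
{
    Monitor *mon;
    for (mon = mon_list; mon; mon = mon->next) {
        QObject *data = monitor_qmp_response_pop_one(mon);
        if (data) {
            response->mon = mon;
            response->data = data;
            return true;
        }
    }
    return false;
}
```

The responder loop then reads naturally as
`while (monitor_qmp_response_pop_any(&response)) { ... }`.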
I'll put the rest of the content into a standalone patch.
Thanks,
> */
> -static QMPResponse monitor_qmp_response_pop(void)
> +static QMPResponse *monitor_qmp_response_pop_any(QMPResponse *response)
> {
> Monitor *mon;
> QObject *data = NULL;
>
> qemu_mutex_lock(&monitor_lock);
> QTAILQ_FOREACH(mon, &mon_list, entry) {
> - data = monitor_qmp_response_pop_one(mon);
> + data = monitor_qmp_requests_pop(mon);
> if (data) {
> + response->mon = mon;
> + response->data = data;
> break;
> }
> }
> qemu_mutex_unlock(&monitor_lock);
> - return (QMPResponse) { .mon = mon, .data = data };
> + return data ? response : NULL;
> }
>
> static void monitor_qmp_bh_responder(void *opaque)
> {
> QMPResponse response;
>
> - while (true) {
> - response = monitor_qmp_response_pop();
> - if (!response.data) {
> - break;
> - }
> + while (monitor_qmp_response_pop_any(&response)) {
> monitor_json_emitter_raw(response.mon, response.data);
> qobject_unref(response.data);
> }
> @@ -4126,7 +4124,7 @@ static void monitor_qmp_dispatch_one(QMPRequest
> *req_obj)
> * when we process one request on a specific monitor, we put that
> * monitor to the end of mon_list queue.
> */
> -static QMPRequest *monitor_qmp_requests_pop_one(void)
> +static QMPRequest *monitor_qmp_requests_pop_any(void)
> {
> QMPRequest *req_obj = NULL;
> Monitor *mon;
> @@ -4158,7 +4156,7 @@ static QMPRequest *monitor_qmp_requests_pop_one(void)
>
> static void monitor_qmp_bh_dispatcher(void *data)
> {
> - QMPRequest *req_obj = monitor_qmp_requests_pop_one();
> + QMPRequest *req_obj = monitor_qmp_requests_pop_any();
>
> if (req_obj) {
> trace_monitor_qmp_cmd_in_band(qobject_get_try_str(req_obj->id) ?:
> "");
--
Peter Xu