qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH] monitor/qmp: fix race with clients disconnecting early


From: Stefan Reiter
Subject: Re: [PATCH] monitor/qmp: fix race with clients disconnecting early
Date: Wed, 13 Oct 2021 17:44:31 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.9.0

On 10/12/21 11:27 AM, Markus Armbruster wrote:
Stefan, any thoughts on this?


Sorry, I didn't get to work on implementing this. Idea 3 does seem very
reasonable, though I suppose the question is what all should go into the
per-session state, and also how it is passed down.

We did roll out my initial patch to our users in the meantime and got
some positive feedback (that specific error disappeared), however another
one (reported at the same time) still exists, so I was trying to diagnose
- it might also turn out to be monitor related and resolved by the more
thorough fix proposed here, but unsure.

Markus Armbruster <armbru@redhat.com> writes:

I've thought a bit more.

A monitor can serve a series of clients.

Back when all of the monitor ran in the main thread, we completely
finished serving the current client before we started serving the next
one (I think).  In other words, sessions did not overlap.

Since we moved parts of the monitor to the monitor I/O thread (merge
commit 4bdc24fa018), sessions can overlap, and this causes issues, as
you demonstrated.

Possible fixes:

1. Go back to "don't overlap" with suitable synchronization.

    I'm afraid this won't cut it, because exec-oob would have wait in
    line behind reconnect.

    It currently waits for other reasons, but that feels fixable.  Going
    back to "don't overlap" would make it unfixable.

2. Make the lingering session not affect / be affected by the new
    session's state

    This is what your patch tries.  Every access of session state needs
    to be guarded like

         if (conn_nr_before == qatomic_read(&mon->connection_nr)) {
             access session state
         } else {
             ???
         }

    Issues:

    * We have to find and guard all existing accesses.  That's work.

    * We have to guard all future accesses.  More work, and easy to
      forget, which makes this fix rather brittle.

    * The fix is spread over many places.

    * We may run into cases where the ??? part gets complicated.
      Consider file descriptors.  The command in flight will have its
      monitor_get_fd() fail after disconnect.  Probably okay, because it
      can also fail for other reasons.  But we might run into cases where
      the ??? part creates new failure modes for us to handle.

3. Per-session state

    Move per-session state from monitor state into a separate object.

    Use reference counts to keep this object alive until both threads are
    done with the session.

    Monitor I/O thread executes monitor core and the out-of-band
    commands; its stops using the old session on disconnect, and starts
    using the new session on connect.

    Main thread executes in-band commands, and these use the session that
    submitted them.

    Do I make sense, or should I explain my idea in more detail?






reply via email to

[Prev in Thread] Current Thread [Next in Thread]