[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [PATCH] dump-guest-memory: Use BQL to protect dump finalize process
From: |
Peter Xu |
Subject: |
Re: [PATCH] dump-guest-memory: Use BQL to protect dump finalize process |
Date: |
Tue, 16 Nov 2021 20:59:34 +0800 |
On Tue, Nov 16, 2021 at 12:20:37PM +0100, Laszlo Ersek wrote:
> On 11/16/21 04:22, Peter Xu wrote:
> > When finalizing the dump-guest-memory with detached mode, we'll first set
> > dump
> > status to either FAIL or COMPLETE before doing the cleanup, however right
> > after
> > the dump status change it's possible that another dump-guest-memory qmp
> > command
> > is sent so both the main thread and dump thread (which is during cleanup)
> > could
> > be accessing dump state in paralell. That's racy.
> >
> > Fix it by protecting the finalizing phase of dump-guest-memory using BQL as
> > well. To do that, we expand the BQL from dump_cleanup() into
> > dump_process(),
> > so we will take the BQL longer than before. The critical section must cover
> > the status switch from ACTIVE->{FAIL|COMPLETE} so as to make sure there's no
> > race any more.
> >
> > We can also just introduce a specific mutex to serialize the dump process,
> > but
> > BQL should be enough for now so far, not to mention vm_start() in
> > dump_cleanup
> > will need BQL anyway, so maybe easier to just use the same mutex.
> >
> > Reported-by: Laszlo Ersek <lersek@redhat.com>
> > Signed-off-by: Peter Xu <peterx@redhat.com>
> > ---
> > dump/dump.c | 24 ++++++++++++++++++------
> > 1 file changed, 18 insertions(+), 6 deletions(-)
> >
> > diff --git a/dump/dump.c b/dump/dump.c
> > index 662d0a62cd..196b7b8ab9 100644
> > --- a/dump/dump.c
> > +++ b/dump/dump.c
> > @@ -96,13 +96,7 @@ static int dump_cleanup(DumpState *s)
> > g_free(s->guest_note);
> > s->guest_note = NULL;
> > if (s->resume) {
> > - if (s->detached) {
> > - qemu_mutex_lock_iothread();
> > - }
> > vm_start();
> > - if (s->detached) {
> > - qemu_mutex_unlock_iothread();
> > - }
> > }
> > migrate_del_blocker(dump_migration_blocker);
> >
> > @@ -1873,6 +1867,11 @@ static void dump_process(DumpState *s, Error **errp)
> > Error *local_err = NULL;
> > DumpQueryResult *result = NULL;
> >
> > + /*
> > + * When running with detached mode, these operations are not run with
> > BQL.
> > + * It's still safe, because it's protected by setting s->state to
> > ACTIVE,
>
> I think this is a typo: it should be s->status.
>
> > + * so dump_in_progress() check will block yet another
> > dump-guest-memory.
> > + */
> > if (s->has_format && s->format == DUMP_GUEST_MEMORY_FORMAT_WIN_DMP) {
> > #ifdef TARGET_X86_64
> > create_win_dump(s, &local_err);
> > @@ -1883,6 +1882,15 @@ static void dump_process(DumpState *s, Error **errp)
> > create_vmcore(s, &local_err);
> > }
> >
> > + /*
> > + * Serialize the finalizing of dump process using BQL to make sure no
> > + * concurrent access to DumpState is allowed. BQL is also required for
> > + * dump_cleanup as vm_start() needs it.
> > + */
> > + if (s->detached) {
> > + qemu_mutex_lock_iothread();
> > + }
> > +
> > /* make sure status is written after written_size updates */
> > smp_wmb();
> > qatomic_set(&s->status,
> > @@ -1898,6 +1906,10 @@ static void dump_process(DumpState *s, Error **errp)
> >
> > error_propagate(errp, local_err);
> > dump_cleanup(s);
> > +
> > + if (s->detached) {
> > + qemu_mutex_unlock_iothread();
> > + }
> > }
> >
> > static void *dump_thread(void *data)
> >
>
> The detached dumping thread now runs dump_cleanup() with the BQL held, so:
>
> dump_thread()
> dump_process()
> [ dump status is now FAILED or COMPLETED ]
> dump_cleanup()
> vm_start()
> [ runstate is now "running" I guess? ]
> migrate_del_blocker()
> g_slist_remove(migration_blockers) <------ read-modify-write #1
>
> is now called under the BQL's protection.
>
> Assuming a new dump request is issued in parallel over QMP, we have (on
> another thread -- the main thread I guess?):
>
> qmp_dump_guest_memory()
> [ dumping *not* in progress ]
> migrate_add_blocker_internal()
> [ runstate is *not* RUN_STATE_SAVE_VM ]
> g_slist_prepend(migration_blockers) <------- RMW #2
>
> and this is not protected by any *explicit* acquiral of the BQL.
>
> I know very little of the BQL, unfortunately. *IF* your argument is that
> qmp_dump_guest_memory() is entered with the BQL *already held*, then the
> patch is fine, IMO. Because in that case, during the short while that
> the detached dumping thread is cleaning up, the main thread (?) cannot
> acquire the BQL, and so it cannot enter qmp_dump_guest_memory() at all.
> If that's your point,
Yes that's the point. qmp_dump_guest_memory() must be with BQL held. For
example, dump_init() will call vm_start() if VM is running:
if (runstate_is_running()) {
vm_stop(RUN_STATE_SAVE_VM);
s->resume = true;
} else {
s->resume = false;
}
And vm_stop() needs BQL. The same to vm_start() if detached mode is not
enabled. If we call qmp_dump_guest_memory() without BQL that'll be a very
severe issue, imho.
> then:
>
> Reviewed-by: Laszlo Ersek <lersek@redhat.com>
>
> *Otherwise*, I don't understand how the patch helps protecting the
> "migration_blockers" object. (Because, although RMW#1 is now protected,
> RMW#2 is not.)
I took it for granted that BQL needs to be taken for qmp_dump_guest_memory()
without explicitly explaining. I'll refine the commit message and mention that
in v2.
Thanks,
--
Peter Xu