qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH] dump-guest-memory: Use BQL to protect dump finalize process


From: Peter Xu
Subject: Re: [PATCH] dump-guest-memory: Use BQL to protect dump finalize process
Date: Tue, 16 Nov 2021 20:59:34 +0800

On Tue, Nov 16, 2021 at 12:20:37PM +0100, Laszlo Ersek wrote:
> On 11/16/21 04:22, Peter Xu wrote:
> > When finalizing the dump-guest-memory with detached mode, we'll first set 
> > dump
> > status to either FAIL or COMPLETE before doing the cleanup, however right 
> > after
> > the dump status change it's possible that another dump-guest-memory qmp 
> > command
> > is sent so both the main thread and dump thread (which is during cleanup) 
> > could
> > be accessing dump state in paralell.  That's racy.
> > 
> > Fix it by protecting the finalizing phase of dump-guest-memory using BQL as
> > well.  To do that, we expand the BQL from dump_cleanup() into 
> > dump_process(),
> > so we will take the BQL longer than before.  The critical section must cover
> > the status switch from ACTIVE->{FAIL|COMPLETE} so as to make sure there's no
> > race any more.
> > 
> > We can also just introduce a specific mutex to serialize the dump process, 
> > but
> > BQL should be enough for now so far, not to mention vm_start() in 
> > dump_cleanup
> > will need BQL anyway, so maybe easier to just use the same mutex.
> > 
> > Reported-by: Laszlo Ersek <lersek@redhat.com>
> > Signed-off-by: Peter Xu <peterx@redhat.com>
> > ---
> >  dump/dump.c | 24 ++++++++++++++++++------
> >  1 file changed, 18 insertions(+), 6 deletions(-)
> > 
> > diff --git a/dump/dump.c b/dump/dump.c
> > index 662d0a62cd..196b7b8ab9 100644
> > --- a/dump/dump.c
> > +++ b/dump/dump.c
> > @@ -96,13 +96,7 @@ static int dump_cleanup(DumpState *s)
> >      g_free(s->guest_note);
> >      s->guest_note = NULL;
> >      if (s->resume) {
> > -        if (s->detached) {
> > -            qemu_mutex_lock_iothread();
> > -        }
> >          vm_start();
> > -        if (s->detached) {
> > -            qemu_mutex_unlock_iothread();
> > -        }
> >      }
> >      migrate_del_blocker(dump_migration_blocker);
> >  
> > @@ -1873,6 +1867,11 @@ static void dump_process(DumpState *s, Error **errp)
> >      Error *local_err = NULL;
> >      DumpQueryResult *result = NULL;
> >  
> > +    /*
> > +     * When running with detached mode, these operations are not run with 
> > BQL.
> > +     * It's still safe, because it's protected by setting s->state to 
> > ACTIVE,
> 
> I think this is a typo: it should be s->status.
> 
> > +     * so dump_in_progress() check will block yet another 
> > dump-guest-memory.
> > +     */
> >      if (s->has_format && s->format == DUMP_GUEST_MEMORY_FORMAT_WIN_DMP) {
> >  #ifdef TARGET_X86_64
> >          create_win_dump(s, &local_err);
> > @@ -1883,6 +1882,15 @@ static void dump_process(DumpState *s, Error **errp)
> >          create_vmcore(s, &local_err);
> >      }
> >  
> > +    /*
> > +     * Serialize the finalizing of dump process using BQL to make sure no
> > +     * concurrent access to DumpState is allowed.  BQL is also required for
> > +     * dump_cleanup as vm_start() needs it.
> > +     */
> > +    if (s->detached) {
> > +        qemu_mutex_lock_iothread();
> > +    }
> > +
> >      /* make sure status is written after written_size updates */
> >      smp_wmb();
> >      qatomic_set(&s->status,
> > @@ -1898,6 +1906,10 @@ static void dump_process(DumpState *s, Error **errp)
> >  
> >      error_propagate(errp, local_err);
> >      dump_cleanup(s);
> > +
> > +    if (s->detached) {
> > +        qemu_mutex_unlock_iothread();
> > +    }
> >  }
> >  
> >  static void *dump_thread(void *data)
> > 
> 
> The detached dumping thread now runs dump_cleanup() with the BQL held, so:
> 
> dump_thread()
>   dump_process()
>     [ dump status is now FAILED or COMPLETED ]
>     dump_cleanup()
>       vm_start()
>       [ runstate is now "running" I guess? ]
>       migrate_del_blocker()
>         g_slist_remove(migration_blockers) <------ read-modify-write #1
> 
> is now called under the BQL's protection.
> 
> Assuming a new dump request is issued in parallel over QMP, we have (on
> another thread -- the main thread I guess?):
> 
> qmp_dump_guest_memory()
>   [ dumping *not* in progress ]
>   migrate_add_blocker_internal()
>     [ runstate is *not* RUN_STATE_SAVE_VM ]
>     g_slist_prepend(migration_blockers) <------- RMW #2
> 
> and this is not protected by any *explicit* acquiral of the BQL.
> 
> I know very little of the BQL, unfortunately. *IF* your argument is that
> qmp_dump_guest_memory() is entered with the BQL *already held*, then the
> patch is fine, IMO. Because in that case, during the short while that
> the detached dumping thread is cleaning up, the main thread (?) cannot
> acquire the BQL, and so it cannot enter qmp_dump_guest_memory() at all.
> If that's your point,

Yes that's the point.  qmp_dump_guest_memory() must be with BQL held.  For
example, dump_init() will call vm_start() if VM is running:

    if (runstate_is_running()) {
        vm_stop(RUN_STATE_SAVE_VM);
        s->resume = true;
    } else {
        s->resume = false;
    }

And vm_stop() needs BQL.  The same to vm_start() if detached mode is not
enabled.  If we call qmp_dump_guest_memory() without BQL that'll be a very
severe issue, imho.

> then:
> 
> Reviewed-by: Laszlo Ersek <lersek@redhat.com>
> 
> *Otherwise*, I don't understand how the patch helps protecting the
> "migration_blockers" object. (Because, although RMW#1 is now protected,
> RMW#2 is not.)

I took it for granted that BQL needs to be taken for qmp_dump_guest_memory()
without explicitly explaining.  I'll refine the commit message and mention that
in v2.

Thanks,

-- 
Peter Xu




reply via email to

[Prev in Thread] Current Thread [Next in Thread]