[Top][All Lists]

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH] migration: Move bitmap_mutex out of migration_bitmap_clear_d

From: Peter Xu
Subject: Re: [PATCH] migration: Move bitmap_mutex out of migration_bitmap_clear_dirty()
Date: Fri, 9 Jul 2021 10:48:00 -0400

On Fri, Jul 09, 2021 at 08:58:08AM +0000, Wang, Wei W wrote:
> On Friday, July 9, 2021 2:31 AM, Peter Xu wrote:
> > > > Yes I think this is the place I didn't make myself clear.  It's not
> > > > about sleeping, it's about the cmpxchg being expensive already when the 
> > > > vm
> > is huge.
> > >
> > > OK.
> > > How did you root cause that it's caused by cmpxchg, instead of lock
> > > contention (i.e. syscall and sleep) or some other code inside
> > pthread_mutex_lock(). Do you have cycles about cmpxchg v.s. cycles of
> > pthread_mutex_lock()?
> > 
> > We've got "perf top -g" showing a huge amount of stacks lying in
> > pthread_mutex_lock().
> This only explains pthread_mutex_lock is the cause, not root caused to 
> cmpxchg.

I think that's enough already to prove we can move the lock elsewhere.

It's not really a heavy race between threads; it's the pure overhead we called
it too many times.  So it's not really a problem yet about "what type of lock
we should use" (mutex or spin lock) or "how this lock is implemented" (say,
whether using cmpxchg only or optimize using test + test-and-set, as that
sounds like a good optimization of pure test-and-set spinlocks) because the
lock is not busy at all.

If we have two busy threads contentioning on the lock for real, that'll be
another problem, imho.  It's not yet the case.

> > 
> > >
> > > I check the implementation of pthread_mutex_lock(). The code path for lock
> > acquire is long. QemuSpin looks more efficient.
> > > (probably we also don’t want migration thread to sleep in any case)
> > 
> > I didn't check it, but I really hoped it should be very close to a spinlock 
> > version
> > when it's not contended.  We should be careful on using spin locks in 
> > userspace,
> > e.g., with that moving clear log into critical section will be too much and 
> > actuall
> > close to "wrong", imho, because the kvm memslot lock inside is sleepable.
> OK, that might be a good reason to keep clear_map out of the lock.
> We also don’t want the lock holder to have more chances to go to sleep though 
> it is mutex.

Yes that looks fine to me, but that's really another topic.  It's not a
requirement either before spinlock is justified to be better than mutex here.

> > 
> > I think it's very fine to have migration thread sleep.  IMHO we need 
> > explicit
> > justification for each mutex to be converted to a spinlock in userspace.  
> > So far I
> > don't see it yet for this bitmap lock.
> What if the guest gets stopped and then the migration thread goes to sleep?

Isn't the balloon code run in a standalone iothread?  Why guest stopped can
stop migration thread?

> > 
> > Frankly I also don't know how spinlock would work reliably without being 
> > able to
> > disable scheduling, then could the thread got scheduled out duing taking the
> > spinlock?  Would another thread trying to take this lock spinning on this 
> > sleeping
> > task?
> Yes, and it needs good justification:
> If it's per-page spinlock, the granularity is very small, so it has very 
> little chance that a lock holder gets scheduled out as time slice uses up. 
> Even that happens once, it seems no big issues, just the waiter wastes some 
> CPU cycles, this is better than having the migration thread scheduled out by 
> mutex, I think.

Yes I believe there's good reason when Emilio introduced the lock, I didn't
measure it myself so I can't tell.  It's just that we need more justification
that this mutex is needed to be replaced with a spinlock.

> > 
> > >
> > > I think it's also better to see the comparison of migration throughput 
> > > data (i.e.
> > pages per second) in the following cases, before we make a decision:
> > > - per-page mutex
> > > - per-page spinlock
> > > - 50-ms mutex
> > 
> > I don't have the machines, so I can't do this.  I also see this as separate 
> > issues to
> > solve to use spinlock, as I said before I would prefer some justification 
> > first to use
> > it rather than blindly running tests and pick a patch with higher number.
> > 
> > I also hope we can reach a consensus that we fix one issue at a time.  This 
> > patch
> > already proves itself with some real workloads, I hope we can merge it 
> > first,
> > either with 50ms period or less.
> > 
> > Per-page locking is already against at least my instinction of how this 
> > should be
> > done; I just regreted I didn't see that an issue when reviewing the balloon 
> > patch
> > as I offered the r-b myself.  However I want to make it work as before then 
> > we
> > figure out a better approach on how the lock is taken, and which lock we 
> > should
> > use.  I don't see it a must to do all things in one patch.
> Sorry, it wasn't my intention to be a barrier to merging your patch.

No problem, and I appreciate for all the discussions.  It's just that as you
see I still prefer to keep it like this first.

free-page-hint is a great feature, though if we grep it in libvirt we found
that it's not yet enabled anywhere.  It means it's not used as a majority, yet.
Meanwhile the code path this patch changes happen for each migration that qemu
started.  What I'm afraid is we worry too much on what most people are not
using, during which we didn't really make the majority works better.

>From what I learned in the past few years, funnily "speed of migration" is
normally not what people care the most.  Issues are mostly with convergence and
being transparent to users using the VMs so they aren't even aware.

I admit this patch may not be the perfect one, but that's the reason I wanted
to do it in two steps as spinlock definitely needs more justification here,
which I can't provide myself, while this patch is verified to help.

> Just try to come up with better solutions if possible.
> - Option1 QemuSpin lock would reduce the lock acquiring overhead, but need to 
> test if that migration could converge;
> - Option2 conditional lock. We haven't see the test results if turning on 
> free page optimization with that per-page lock could still make your 2TB 
> guest migration converge.

Yes after I checked libvirt I am more confident it's not with free-page-hint,
and as I said I wonder why you think free-page-hint could help a heavy memory
usage guest at all, as I mentioned most memories are used.

If we want to further optimize it, I don't think it'll make sense to chooise
either option 1 or 2 but only to take them all, but it'll be great when it
comes who proposed that new change also compare that with current patch 50ms
even if sleepable, so I'm wondering whether that'll really make a measurable
difference, especially if we can also shrink that 50ms period too, which is
what I asked initially.

> Seems we lack resources for those tests right now. If you are urgent for a 
> decision to have it work first, I'm also OK with you to merge it.

No I can't merge it myself as I'm not the maintainer. :) I haven't received any
ack yet, so at least I'll need to see how Dave and Juan think.  It's just that
I don't think qemuspin could help much in this case, and I don't want to mess
up the issue too.

Thanks for being considerate.

Peter Xu

reply via email to

[Prev in Thread] Current Thread [Next in Thread]