[Top][All Lists]

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

RE: [PATCH] migration: Move bitmap_mutex out of migration_bitmap_clear_d

From: Wang, Wei W
Subject: RE: [PATCH] migration: Move bitmap_mutex out of migration_bitmap_clear_dirty()
Date: Fri, 9 Jul 2021 08:58:08 +0000

On Friday, July 9, 2021 2:31 AM, Peter Xu wrote:
> > > Yes I think this is the place I didn't make myself clear.  It's not
> > > about sleeping, it's about the cmpxchg being expensive already when the vm
> is huge.
> >
> > OK.
> > How did you root cause that it's caused by cmpxchg, instead of lock
> > contention (i.e. syscall and sleep) or some other code inside
> pthread_mutex_lock(). Do you have cycles about cmpxchg v.s. cycles of
> pthread_mutex_lock()?
> We've got "perf top -g" showing a huge amount of stacks lying in
> pthread_mutex_lock().

This only explains pthread_mutex_lock is the cause, not root caused to cmpxchg.

> >
> > I check the implementation of pthread_mutex_lock(). The code path for lock
> acquire is long. QemuSpin looks more efficient.
> > (probably we also don’t want migration thread to sleep in any case)
> I didn't check it, but I really hoped it should be very close to a spinlock 
> version
> when it's not contended.  We should be careful on using spin locks in 
> userspace,
> e.g., with that moving clear log into critical section will be too much and 
> actuall
> close to "wrong", imho, because the kvm memslot lock inside is sleepable.

OK, that might be a good reason to keep clear_map out of the lock.
We also don’t want the lock holder to have more chances to go to sleep though 
it is mutex.

> I think it's very fine to have migration thread sleep.  IMHO we need explicit
> justification for each mutex to be converted to a spinlock in userspace.  So 
> far I
> don't see it yet for this bitmap lock.

What if the guest gets stopped and then the migration thread goes to sleep?

> Frankly I also don't know how spinlock would work reliably without being able 
> to
> disable scheduling, then could the thread got scheduled out duing taking the
> spinlock?  Would another thread trying to take this lock spinning on this 
> sleeping
> task?

Yes, and it needs good justification:
If it's per-page spinlock, the granularity is very small, so it has very little 
chance that a lock holder gets scheduled out as time slice uses up. Even that 
happens once, it seems no big issues, just the waiter wastes some CPU cycles, 
this is better than having the migration thread scheduled out by mutex, I think.

> >
> > I think it's also better to see the comparison of migration throughput data 
> > (i.e.
> pages per second) in the following cases, before we make a decision:
> > - per-page mutex
> > - per-page spinlock
> > - 50-ms mutex
> I don't have the machines, so I can't do this.  I also see this as separate 
> issues to
> solve to use spinlock, as I said before I would prefer some justification 
> first to use
> it rather than blindly running tests and pick a patch with higher number.
> I also hope we can reach a consensus that we fix one issue at a time.  This 
> patch
> already proves itself with some real workloads, I hope we can merge it first,
> either with 50ms period or less.
> Per-page locking is already against at least my instinction of how this 
> should be
> done; I just regreted I didn't see that an issue when reviewing the balloon 
> patch
> as I offered the r-b myself.  However I want to make it work as before then we
> figure out a better approach on how the lock is taken, and which lock we 
> should
> use.  I don't see it a must to do all things in one patch.

Sorry, it wasn't my intention to be a barrier to merging your patch.
Just try to come up with better solutions if possible.
- Option1 QemuSpin lock would reduce the lock acquiring overhead, but need to 
test if that migration could converge;
- Option2 conditional lock. We haven't see the test results if turning on free 
page optimization with that per-page lock could still make your 2TB guest 
migration converge.

Seems we lack resources for those tests right now. If you are urgent for a 
decision to have it work first, I'm also OK with you to merge it.


reply via email to

[Prev in Thread] Current Thread [Next in Thread]