Re: [Qemu-devel] safety of migration_bitmap_extend
From: Dr. David Alan Gilbert
Subject: Re: [Qemu-devel] safety of migration_bitmap_extend
Date: Tue, 3 Nov 2015 13:47:17 +0000
User-agent: Mutt/1.5.24 (2015-08-30)
* Juan Quintela (address@hidden) wrote:
> "Dr. David Alan Gilbert" <address@hidden> wrote:
> > Hi,
> > I'm trying to understand why migration_bitmap_extend is correct/safe;
> > If I understand correctly, you're arguing that:
> >
> > 1) the migration_bitmap_mutex around the extend stops any syncs
> > happening, and so no new bits will be set during the extend.
> >
> > 2) If migration sends a page and clears a bitmap entry, it doesn't
> > matter if we lose the 'clear' because we're copying it as
> > we extend it, because losing the clear just means the page
> > gets resent, and so the data is OK.
> >
> > However, doesn't (2) mean that migration_dirty_pages might be wrong?
> > If a page was sent, the bit cleared, and migration_dirty_pages decremented,
> > then if we copy over that bitmap and 'set' that bit again,
> > migration_dirty_pages is too small; that means that either migration
> > would finish too early, or more likely, migration_dirty_pages would
> > wrap around to a -ve value and never finish.
> >
> > Is there a reason it's really safe?
>
> No. It is reasonably safe. Various values of reasonably.
>
> migration_dirty_pages should never arrive at values near zero. Because
> we move to the completion stage way before it gets a value near zero.
> (We could have very, very bad luck, as in it is not safe).
That's only true if we hit the qemu_file_rate_limit() check in
ram_save_iterate. If we don't hit the rate limit (e.g. because we're CPU
or network limited to slower than the set limit), then I think
ram_save_iterate will go all the way to sending every page. If that
happens, it'll go once more around the main migration loop and call the
pending routine, which now gets a -ve (i.e. very large +ve) number of
pending pages, so we continuously run ram_save_iterate again.

We've had that type of bug before when we messed up the dirty-pages
calculation during hotplug.
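
To make the failure mode concrete, here's a toy model of the sequence I'm
worried about. It is not QEMU code: struct toy_state, TOY_PAGES and all the
toy_* functions are made up for illustration, and it assumes the counter is
an unsigned 64-bit value. A page is sent while the extend is copying the old
bitmap, the 'clear' is lost, and the counter ends up one too small; sending
every remaining page then wraps it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define TOY_PAGES 8   /* hypothetical toy RAM size, in pages */

/* Toy stand-ins for the migration bitmap and migration_dirty_pages. */
struct toy_state {
    bool bitmap[TOY_PAGES];
    uint64_t dirty_pages;   /* assumed unsigned, so it wraps on underflow */
};

/* Mark every page dirty, as at the start of a migration. */
static void toy_init(struct toy_state *s)
{
    for (int i = 0; i < TOY_PAGES; i++) {
        s->bitmap[i] = true;
    }
    s->dirty_pages = TOY_PAGES;
}

/* Send one page: clear its bit and decrement the counter. */
static void toy_send_page(struct toy_state *s, int page)
{
    assert(page >= 0 && page < TOY_PAGES);
    if (s->bitmap[page]) {
        s->bitmap[page] = false;
        s->dirty_pages--;
    }
}

/* Model the lost 'clear': install a stale copy of the bitmap,
 * leaving the counter untouched. */
static void toy_install_stale_copy(struct toy_state *s, const bool *snapshot)
{
    memcpy(s->bitmap, snapshot, sizeof(s->bitmap));
}

/* Send everything still marked dirty; return the counter afterwards. */
static uint64_t toy_send_all(struct toy_state *s)
{
    for (int i = 0; i < TOY_PAGES; i++) {
        toy_send_page(s, i);
    }
    return s->dirty_pages;
}

/* The whole scenario: snapshot, racing send, stale install, full pass. */
static uint64_t toy_lost_clear_scenario(void)
{
    struct toy_state s;
    bool snapshot[TOY_PAGES];

    toy_init(&s);
    memcpy(snapshot, s.bitmap, sizeof(snapshot)); /* extend starts copying */
    toy_send_page(&s, 3);                  /* bit 3 cleared, counter is 7 */
    toy_install_stale_copy(&s, snapshot);  /* bit 3 set again, counter still 7 */
    return toy_send_all(&s);               /* 8 decrements from 7: wraps */
}
```

In this model toy_lost_clear_scenario() returns UINT64_MAX rather than 0,
which is the "very +ve" pending count described above; without the stale
copy the same full pass leaves the counter at 0.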
> Now, do we really care if migration_dirty_pages is exact? Not really,
> we just use it to calculate whether we should start the throttle or not.
> That is only tested once per second, so if we have written a couple of
> pages that we are not accounting for, things should be reasonably safe.
>
> That said, I don't know why we didn't catch that problem during
> review (yes, I am guilty here). Not sure how to really fix it,
> though. I think that the problem is more theoretical than real, but
Dave
> ....
>
> Thanks, Juan.
>
> >
> > Dave
> >
> > --
> > Dr. David Alan Gilbert / address@hidden / Manchester, UK
--
Dr. David Alan Gilbert / address@hidden / Manchester, UK