Re: [PATCH v1 2/2] virtio-balloon: disallow postcopy with VIRTIO_BALLOON

From: David Hildenbrand
Subject: Re: [PATCH v1 2/2] virtio-balloon: disallow postcopy with VIRTIO_BALLOON_F_FREE_PAGE_HINT
Date: Thu, 8 Jul 2021 09:14:45 +0200
On 08.07.21 00:40, Peter Xu wrote:
On Wed, Jul 07, 2021 at 02:22:32PM -0700, Alexander Duyck wrote:
On Wed, Jul 7, 2021 at 1:08 PM Peter Xu <peterx@redhat.com> wrote:

On Wed, Jul 07, 2021 at 08:57:29PM +0200, David Hildenbrand wrote:
On 07.07.21 20:02, Peter Xu wrote:
On Wed, Jul 07, 2021 at 04:06:55PM +0200, David Hildenbrand wrote:
As it never worked properly, let's disable it via the postcopy notifier on
the destination. Trying to set "migrate_set_capability postcopy-ram on"
on the destination now results in "virtio-balloon: 'free-page-hint' does
not support postcopy Error: Postcopy is not supported".

Would it be possible to do this the other way around?  Say, dynamically disable
free-page-hinting if the postcopy capability is set when migration starts? Perhaps
it can also be re-enabled automatically when migration completes?

I remember that this might be quite racy. We would have to make sure that no
hinting happens before we enable the capability.

As soon as we have messed with the dirty bitmap (during precopy), postcopy is no
longer safe. As noted in the patch, the only runtime alternative is to
disable postcopy as soon as we actually do clear a bit. Alternatively, we
could ignore any hints if the postcopy capability was enabled.
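
To make the two runtime alternatives above concrete, here is a minimal,
self-contained sketch. It is a toy model, not QEMU code: every name
(model_migration, model_free_page_hint(), model_enable_postcopy()) is
hypothetical, and the dirty bitmap is reduced to a single 64-bit word.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model, not QEMU code: one dirty bit per guest page. */
#define MODEL_PAGES 64

struct model_migration {
    uint64_t dirty;          /* bit i set => page i must still be sent */
    bool postcopy_cap;       /* models the "postcopy-ram" capability */
    bool hint_cleared_bits;  /* true once any hint has cleared a dirty bit */
};

/* Start precopy with every page marked dirty. */
static void model_start(struct model_migration *m, bool postcopy_cap)
{
    m->dirty = UINT64_MAX;
    m->postcopy_cap = postcopy_cap;
    m->hint_cleared_bits = false;
}

/*
 * Alternative 1: ignore hints while the postcopy capability is enabled.
 * Returns true if the hint was accepted, false if it was ignored.
 */
static bool model_free_page_hint(struct model_migration *m, unsigned page)
{
    assert(page < MODEL_PAGES);
    if (m->postcopy_cap) {
        return false;  /* the bitmap stays untouched */
    }
    uint64_t mask = UINT64_C(1) << page;
    if (m->dirty & mask) {
        m->dirty &= ~mask;
        m->hint_cleared_bits = true;
    }
    return true;
}

/*
 * Alternative 2: refuse postcopy once a hint has actually cleared a bit.
 * Returns true if the capability is enabled after the call.
 */
static bool model_enable_postcopy(struct model_migration *m)
{
    if (m->hint_cleared_bits) {
        return false;
    }
    m->postcopy_cap = true;
    return true;
}
```

In this model, a hint accepted during precopy makes a later
model_enable_postcopy() fail, while starting with the capability set leaves
the bitmap untouched no matter how many hints arrive.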

Logically, migration capabilities are applied when the VM starts, and they
should stay constant during migration (I didn't check if there's a hard
requirement; it would be easy to add one if we want to enforce it), and in most
cases for the whole lifecycle of the VM.

Would it make sense to maybe just look at adding a postcopy value to
the PrecopyNotifyData that you could populate with
migration_in_postcopy() in precopy_notify()?

Should we check migrate_postcopy_ram() rather than migration_in_postcopy()?
Right, we care about the source only -- if postcopy could be started.

Then all you would need to do is check for that value and, if it is set,
shut down the page hinting or don't start it. I suspect that flagging
unused pages doesn't add much value in a postcopy environment anyway.
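
A rough sketch of that idea, with the suggested correction folded in (the
value reflects whether postcopy could be started on the source, not whether
we are already in postcopy). Everything here is a hypothetical stand-in: the
model_* types only mimic the shape of a precopy notifier, and the
postcopy_possible field is an assumed addition, not an existing QEMU field.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins; the real QEMU types and fields may differ. */
enum model_precopy_reason {
    MODEL_PRECOPY_SETUP,
    MODEL_PRECOPY_BEFORE_BITMAP_SYNC,
    MODEL_PRECOPY_AFTER_BITMAP_SYNC,
    MODEL_PRECOPY_COMPLETE,
};

struct model_precopy_notify_data {
    enum model_precopy_reason reason;
    bool postcopy_possible;  /* assumed new field: capability set on source */
};

struct model_balloon {
    bool hinting_active;
};

/* Build the notifier payload; the caller passes the capability state. */
static struct model_precopy_notify_data
model_precopy_notify(enum model_precopy_reason reason, bool postcopy_ram_cap)
{
    struct model_precopy_notify_data pnd = {
        .reason = reason,
        .postcopy_possible = postcopy_ram_cap,
    };
    return pnd;
}

/* Balloon-side handler: never start hinting if postcopy could follow. */
static void model_hint_notify(struct model_balloon *b,
                              const struct model_precopy_notify_data *pnd)
{
    assert(b && pnd);
    if (pnd->postcopy_possible) {
        b->hinting_active = false;
        return;
    }
    switch (pnd->reason) {
    case MODEL_PRECOPY_AFTER_BITMAP_SYNC:
        b->hinting_active = true;   /* hint between bitmap syncs */
        break;
    case MODEL_PRECOPY_BEFORE_BITMAP_SYNC:
    case MODEL_PRECOPY_COMPLETE:
        b->hinting_active = false;
        break;
    case MODEL_PRECOPY_SETUP:
        break;                      /* nothing to start or stop yet */
    }
}
```

The check only helps if the flag is already set before the first hint is
processed, which is why the model tests it before looking at the reason.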

We'd have to make sure it is never kicked off in the first place, as I explained previously. As soon as you have messed with the bitmaps, it's problematic.

Whatever we do, we have to make sure that a user cannot trick the system
into an inconsistent state. Like enabling hinting, starting migration, then
enabling the postcopy capability and kicking off postcopy. I did not check if
we allow for that, though.

We could turn free page hinting off when migration starts with postcopy-ram=on,
then re-enable it after migration finishes.  That looks very safe to me.  And
I'm not even worried about users trying to mess it up, as that only puts their
own VM at risk; that's mostly fine with me.

We wouldn't necessarily even need to really turn it off, just don't
start it. I wonder if we couldn't just get away with adding a check to
the existing virtio_balloon_free_page_hint_notify to see if we are in
the postcopy state there and just shut things down or not start them.

This makes me wonder whether qemu_guest_free_page_hint() should be called at
all on the destination host while an incoming postcopy migration is in progress.

It really shouldn't. And if it currently happens, it is due to issue 1 described in the patch description, which will be fixed independently so that hinting is completely done by the time we are running on the destination.

Right now the check migration_is_setup_or_active() should return true on the
destination host; however, I am not sure that's necessary, as we don't track
dirty pages at all there.

migration_is_setup_or_active(s->state) uses migrate_get_current(), which gives us the outgoing state (source), not the incoming state (destination).
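
To illustrate why that matters, here is a toy model with separate outgoing
and incoming state. It is a sketch under assumptions: all model_* names are
hypothetical, the status list is heavily simplified compared to the real
one, and I assume the outgoing state stays idle on a pure destination.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified status values; the real set is larger. */
enum model_status { MODEL_NONE, MODEL_SETUP, MODEL_ACTIVE, MODEL_COMPLETED };

struct model_process {
    enum model_status outgoing;  /* what the source side tracks */
    enum model_status incoming;  /* what the destination side tracks */
};

static bool model_is_setup_or_active(enum model_status s)
{
    return s == MODEL_SETUP || s == MODEL_ACTIVE;
}

/* Mirrors a check that only ever consults the outgoing state. */
static bool model_check_outgoing_only(const struct model_process *p)
{
    assert(p);
    return model_is_setup_or_active(p->outgoing);
}

/* A destination-aware variant would also have to consult the incoming state. */
static bool model_check_either(const struct model_process *p)
{
    assert(p);
    return model_is_setup_or_active(p->outgoing) ||
           model_is_setup_or_active(p->incoming);
}
```

With outgoing == MODEL_NONE and incoming == MODEL_ACTIVE, the first check
returns false and only the second one notices the incoming migration.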


