On 21 July 2017 at 10:13, Dr. David Alan Gilbert <address@hidden> wrote:
> I don't fully understand the way memory_region_do_invalidate_mmio_ptr
> works; I see it dropping the memory region; if that's also dropping
> the RAMBlock then it will upset migration. Even if the CPU is stopped,
> I don't think that stops the migration thread walking through the list
> of RAMBlocks.

It does drop the RAMBlock: memory_region_do_invalidate_mmio_ptr() calls
memory_region_unref(), which will eventually result in
memory_region_finalize() being called. That invokes the MR destructor,
in this case memory_region_destructor_ram(), which calls qemu_ram_free()
on the RAMBlock; qemu_ram_free() then removes the RAMBlock from the
list (after taking the ramlist lock).
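
For anyone following along, the shape of that teardown is the usual
refcount-plus-finalizer pattern. Here's a self-contained sketch (toy
types, not the real QEMU structs; the real qemu_ram_free() additionally
defers the actual free via RCU) of why the RAMBlock vanishes from the
list as a side effect of the final unref:

/* Toy model of the unref -> finalize -> destructor -> unlink chain.
 * Names mirror the QEMU call chain but the types are simplified
 * stand-ins, not the actual structures. */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

typedef struct RamBlock {
    struct RamBlock *next;
} RamBlock;

static RamBlock *ram_list;                 /* stand-in for ram_list.blocks */
static pthread_mutex_t ramlist_lock = PTHREAD_MUTEX_INITIALIZER;

typedef struct Region {
    int refcount;
    RamBlock *block;
} Region;

static void ram_free(RamBlock *block)     /* cf. qemu_ram_free() */
{
    RamBlock **p;

    pthread_mutex_lock(&ramlist_lock);
    for (p = &ram_list; *p; p = &(*p)->next) {
        if (*p == block) {
            *p = block->next;   /* unlinked: later walks won't see it */
            break;
        }
    }
    pthread_mutex_unlock(&ramlist_lock);
    free(block);                /* QEMU defers this step with call_rcu() */
}

static void region_finalize(Region *r)    /* cf. memory_region_finalize() */
{
    ram_free(r->block);                   /* via the RAM destructor */
    free(r);
}

static void region_unref(Region *r)       /* cf. memory_region_unref() */
{
    if (--r->refcount == 0) {   /* the real code decrements atomically */
        region_finalize(r);
    }
}

int main(void)
{
    Region *r = malloc(sizeof(*r));

    r->refcount = 1;
    r->block = calloc(1, sizeof(RamBlock));
    r->block->next = ram_list;
    ram_list = r->block;

    region_unref(r);            /* last unref frees region AND unlinks block */
    printf("list empty: %s\n", ram_list == NULL ? "yes" : "no");
    return 0;
}
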
> Even then, the problem is that migration keeps a 'dirty_pages' count
> which is calculated at the start of migration and updated as we dirty
> and send pages; if we add or remove a RAMBlock then that dirty_pages
> count is wrong, and we either never finish migration (since dirty_pages
> never reaches zero) or finish early with some unsent data.
>
> And then there's the 'received' bitmap currently being added for
> postcopy, which tracks each page that's been received (that's not in
> yet, though).
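
To make the accounting failure concrete, here's a toy version of it
(the numbers are made up, and 'dirty_pages' only mirrors the name of
migration's counter; this isn't the real code):

/* Illustrative only: how a RAMBlock change skews the dirty-page count. */
#include <inttypes.h>
#include <stdio.h>

int main(void)
{
    uint64_t dirty_pages = 0;

    dirty_pages += 1000;    /* block A's pages, counted at migration start */
    dirty_pages += 200;     /* block B's pages, counted at migration start */

    dirty_pages -= 1000;    /* we send all of block A */

    /* Now block B is dropped (e.g. its mmio pointer is invalidated).
     * Its 200 pages will never be sent, so the counter can never reach
     * zero and migration never converges: */
    printf("remaining: %" PRIu64 "\n", dirty_pages);    /* 200, forever */

    /* The opposite failure: a block added mid-migration and sent without
     * ever being counted lets dirty_pages hit zero while data is still
     * outstanding, so migration completes early with unsent pages. */
    return 0;
}
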
It sounds like we really need to make migration robust against
RAMBlock changes -- in the hotplug case it's certainly possible
for RAMBlocks to be newly created or destroyed while migration
is in progress.
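
One possible direction (just a sketch, nothing that exists in migration
today): hook into the RAMBlock notifier list that I believe already
exists for HAX, and fix up the accounting whenever a block comes or
goes. The notifier signatures below are from memory, and the fixups
described in the comments are entirely hypothetical:

/* Sketch: let migration observe RAMBlock add/remove instead of assuming
 * the set of blocks is fixed for the whole migration. */
#include "qemu/osdep.h"
#include "exec/ramlist.h"

static void migration_ram_block_added(RAMBlockNotifier *n,
                                      void *host, size_t size)
{
    /* Hypothetical: grow the dirty bitmap and add this block's pages
     * to dirty_pages, under the bitmap lock, so new blocks get sent. */
}

static void migration_ram_block_removed(RAMBlockNotifier *n,
                                        void *host, size_t size)
{
    /* Hypothetical: subtract the block's outstanding dirty pages and
     * trim the postcopy 'received' bitmap, so completion (dirty_pages
     * reaching zero) stays reachable. */
}

static RAMBlockNotifier migration_ram_notifier = {
    .ram_block_added   = migration_ram_block_added,
    .ram_block_removed = migration_ram_block_removed,
};

void migration_watch_ram_blocks(void)
{
    ram_block_notifier_add(&migration_ram_notifier);
}

Even with something along those lines, the removal side would
presumably still be hairy: pages from the dead block may already be in
flight, and the destination has to cope with a block it was told about
at the start of migration disappearing.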