Re: [Qemu-devel] Migration ToDo list (a.k.a. Rant)

From: Li, Liang Z
Subject: Re: [Qemu-devel] Migration ToDo list (a.k.a. Rant)
Date: Thu, 5 May 2016 08:01:43 +0000

> From: Qemu-devel [mailto:qemu-devel-
> address@hidden On Behalf Of Juan Quintela
> Sent: Wednesday, May 04, 2016 7:20 PM
> To: QEMU Developer
> Subject: [Qemu-devel] Migration ToDo list (a.k.a. Rant)
> Hi
> I am often asked what the ToDo list for migration is.  Until now it lived in
> my head and in random notes on my desk, so I am trying to organize it (yes,
> I will put this in the wiki).

Would it be appropriate to add 'speed up live migration by skipping free pages'?
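
To make the idea concrete, here is a minimal sketch of one way it could work, assuming the guest can somehow report a bitmap of its free pages. The function name and the bitmap layout (one bit per page, same size as the dirty bitmap) are hypothetical and not existing QEMU code.

```c
#include <stddef.h>

/*
 * Hypothetical helper, not an existing QEMU function.
 * Clears every bit in the dirty bitmap whose page is marked free in
 * free_map, so that those pages are never picked up for sending.
 * Both bitmaps hold one bit per page and are nlongs words long.
 */
void skip_free_pages(unsigned long *dirty, const unsigned long *free_map,
                     size_t nlongs)
{
    for (size_t i = 0; i < nlongs; i++) {
        dirty[i] &= ~free_map[i];
    }
}
```

A real implementation would also have to cope with pages that the guest starts using again after reporting them as free; the sketch ignores that.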


> - migration thread on reception
>   would make it trivial to do other things while receiving, and would also
>   make postcopy easier (I was going to say much easier, but postcopy is
>   never easy).
> - migration capabilities and parameters
>   this is a mess.  No, it is worse than that.  I don't know who is to
>   blame here, but something needs to be done:
>      void qmp_migrate_set_parameters(bool has_compress_level,
>                                 int64_t compress_level,
>                                 bool has_compress_threads,
>                                 int64_t compress_threads,
>                                 bool has_decompress_threads,
>                                 int64_t decompress_threads,
>                                 bool has_x_cpu_throttle_initial,
>                                 int64_t x_cpu_throttle_initial,
>                                 bool has_x_cpu_throttle_increment,
>                                 int64_t x_cpu_throttle_increment,
>                                 bool has_multifd_threads,
>                                 int64_t multifd_threads,
>                                 Error **errp)
>     Can we move this to an array of structs, please, pretty please?
>     I think that for this one, the blame is on qmp
>    but we can continue:
>    migrate
>    migrate_cancel
>    migrate_incoming
>    migrate_start_postcopy
>       Not a lot to do until here
>    migrate_set_capability
>       Minor nitpick: if it only allows booleans, "migrate_set_capability x-multifd"
>       should be equivalent to "migrate_set_capability x-multifd on"
>    migrate_set_cache_size
>    migrate_set_downtime
>    migrate_set_speed
>       These three should be declared obsolete, deprecated, whatever, and
>       implemented on top of the next one
>    migrate_set_parameter
>    Now to read the migration information:
>      migrate_capabilities
>        good
>      migrate_parameters
>        good
>      migrate_cache_size
>        good, but we are missing migrate_speed and migrate_downtime; see
>        why I want them to be inside migrate_set_parameters
>      migrate
>        now, this is ..... weird?  We put lots of information here, and
>        this is basically the only way to get information out.  To make
>        things more interesting, the values change meaning during
>        migration, and the fields it shows also change over time.
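
As an illustration of the "array of structs" request above, here is a minimal sketch of a table-driven parameter store. The parameter names come from the quoted signature; the MigParam type, the helper functions, and all defaults and ranges are made up for the example and are not the QMP or QEMU API.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical record: one entry per tunable instead of one
 * has_xxx/xxx argument pair per tunable. */
typedef struct MigParam {
    const char *name;
    int64_t value;   /* current value (defaults are invented) */
    int64_t min;     /* invented lower bound */
    int64_t max;     /* invented upper bound */
} MigParam;

static MigParam mig_params[] = {
    { "compress_level",           1,  0,   9 },
    { "compress_threads",         8,  1, 255 },
    { "decompress_threads",       2,  1, 255 },
    { "x_cpu_throttle_initial",  20,  1,  99 },
    { "x_cpu_throttle_increment", 10, 1,  99 },
    { "multifd_threads",          2,  1, 255 },
};

static MigParam *mig_param_find(const char *name)
{
    for (size_t i = 0; i < sizeof(mig_params) / sizeof(mig_params[0]); i++) {
        if (strcmp(mig_params[i].name, name) == 0) {
            return &mig_params[i];
        }
    }
    return NULL;
}

/* Returns true on success, false for an unknown name or out-of-range value. */
bool mig_param_set(const char *name, int64_t value)
{
    MigParam *p = mig_param_find(name);

    if (!p || value < p->min || value > p->max) {
        return false;
    }
    p->value = value;
    return true;
}

/* Returns true and stores the current value, or false for an unknown name. */
bool mig_param_get(const char *name, int64_t *value)
{
    MigParam *p = mig_param_find(name);

    if (!p) {
        return false;
    }
    *value = p->value;
    return true;
}
```

With a table like this, adding a parameter means adding one row, and speed, downtime and cache size could be read back through the same getter.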
> - info migrate
>   This deserves its own item.  Let's look at typical output
> (qemu)info migrate
> capabilities: xbzrle: off rdma-pin-all: off auto-converge: off zero-blocks: off
> compress: off events: off postcopy-ram: off x-multifd: on
>    Aha, we have the capabilities, but not the parameters.  This is
>    historical, I know, but they don't belong here.
> Migration status: completed
>    ok
> total time: 1621 milliseconds
>    ok
> downtime: 208 milliseconds
>    ok
> setup: 9 milliseconds
>    ok
> transferred ram: 609708 kbytes
>    kilobytes, not pages
> throughput: 27.64 mbps
>    but we measure bandwidth in megabytes per second;
>    the previous one was in kilobytes
> remaining ram: 0 kbytes
> total ram: 2106180 kbytes
>    this amount doesn't change.  I can understand why it was here.
> duplicate: 452528 pages
>    name is historical.  It really means pages filled with the same
>    character, although in practice it means zero pages
> skipped: 0 pages
>    Even I don't remember what this means.
> normal: 151064 pages
>    These are the normal pages that we have sent, i.e. pages that are
>    neither zero pages nor skipped pages.  Notice that here we report pages,
>    not bytes, not kilobytes, but pages.
> normal bytes: 604256 kbytes
>    Don't worry, we also give you the same number in kilobytes.
> dirty sync count: 11
>    Number of iterations over the full ram.  Yes, I know, we are very,
>    very bad at naming.
> And we still have more optional information that appears if we are doing
> block migration, xbzrle, compression, rdma, etc.
> We also need to decide on units internally.  Some things are in bytes, some
> are in kilobytes, some are in pages.  Some are in host pages, or guest pages,
> or who knows :-(
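
For reference, the "normal" and "normal bytes" lines in the output above agree with each other if the page size is 4096 bytes (that page size is my assumption; the output does not state it): 151064 pages * 4096 / 1024 = 604256 kbytes. A minimal sketch of the conversion, with a hypothetical helper name:

```c
#include <stdint.h>

/*
 * Hypothetical helper, not an existing QEMU function.
 * Converts a page count to kilobytes (1 kbyte = 1024 bytes here),
 * taking the page size in bytes as an explicit argument so that
 * host and guest page sizes cannot be mixed up silently.
 */
uint64_t pages_to_kbytes(uint64_t pages, uint64_t page_size)
{
    return pages * page_size / 1024;
}
```

Keeping everything in bytes internally and converting only when printing would avoid most of the confusion described above.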
> - Block migration (the migration/block.c one).  This is the bastard
>   child of migration.  Much less tested, we should make a decision
>   about letting it live or deprecating it.  Things needed, from memory:
>      - functions should return the same values as in ram.c;
>        some functions don't return "exact" values, and return 1 when
>        more than one block is dirty, etc.
>      - if we continue maintaining it, should we allow it to have _some_
>        shared devices and some non-shared ones, instead of everything?
> - RDMA: Another stepchild
>   This is really, really weird.  We don't use the normal infrastructure
>   for RDMA, we use the ram_control_* stuff.  We should really move to
>   use the normal stuff here.
> - autoconverge code:  This could be used outside of migration (i.e. just
>   to slow down a guest).  We should really do some measurements here to
>   see how useful it is for migration.  If the guest is dirtying lots of
>   memory, we end up having to throttle the guest by 90% or so :-(
> - xbzrle.  We only have one cache, we should decide how to work with
>   this for multithread/compression.
> - When we do migration, we have spaghetti code to decide if:
>   * it is a zero page
>   * it is a duplicated page
>   * it is a xbzrle page
>   * it is a compressed page
>   And as the code is written, it is not trivial to add new "options".  I
>   think that we should "re-think" which combinations are allowed and which
>   ones make no sense.
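
One possible shape for the page-type decision above is an ordered table of matchers, sketched below. All names are hypothetical, and only the zero and duplicate cases are shown; xbzrle and compression would need extra state (a cache, worker threads) and are left out.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef enum PageKind { PAGE_ZERO, PAGE_DUPLICATE, PAGE_NORMAL } PageKind;

/* True when the page is non-empty and every byte equals the first one. */
static bool match_uniform(const uint8_t *page, size_t len)
{
    if (len == 0) {
        return false;
    }
    for (size_t i = 1; i < len; i++) {
        if (page[i] != page[0]) {
            return false;
        }
    }
    return true;
}

/* True when the page is non-empty and consists only of zero bytes. */
static bool match_zero(const uint8_t *page, size_t len)
{
    return match_uniform(page, len) && page[0] == 0;
}

/* Hypothetical handler table: matchers are tried in order, so the more
 * specific check (zero) comes before the more general one (uniform). */
typedef struct PageHandler {
    PageKind kind;
    bool (*match)(const uint8_t *page, size_t len);
} PageHandler;

static const PageHandler page_handlers[] = {
    { PAGE_ZERO,      match_zero },
    { PAGE_DUPLICATE, match_uniform },
};

/* Returns the kind of the first matching handler, or PAGE_NORMAL. */
PageKind classify_page(const uint8_t *page, size_t len)
{
    for (size_t i = 0; i < sizeof(page_handlers) / sizeof(page_handlers[0]); i++) {
        if (page_handlers[i].match(page, len)) {
            return page_handlers[i].kind;
        }
    }
    return PAGE_NORMAL;
}
```

Adding a new "option" then means adding one row to the table, and the allowed combinations become visible as the table order.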
> - savevm and migration: they use two different paths for no really good
>   reason.  We should really abstract this to a single code path.
>   We always forget the savevm one when we do changes.
> - error handling.  Every function should return an error.  Every
>   function should return an error.
> - qemu_get_buffer() doesn't return an error if there is nothing to read,
>   sniff.
> - Multipage support: Welcome to the XXI century.  Now almost all
>   architectures have HugePages.  And others have different-sized pages
>   (on PPC it is not unusual for host and guest page sizes to differ).  We
>   have work to do here.  For starters, sending huge pages as one chunk
>   will make TransparentHugePages happier.
> - Bitmaps.  Related to the previous one.  We should really be better about
>   walking them and about synchronising them between qemu/kernel.
> - COLO: We need to integrate it.
> I will continue the rant at some other point O:-)  Right now I need to leave
> for the bar.
> Thanks for your attention, Juan.
> PS.  While writing this I just looked at the channel code from Daniel, a step
> in the right direction.

