From: Dr. David Alan Gilbert
Subject: Re: [Qemu-devel] [PATCH v2] migration: skip sending ram pages released by virtio-balloon driver.
Date: Thu, 31 Mar 2016 17:39:43 +0100
User-agent: Mutt/1.5.24 (2015-08-30)

* Jitendra Kolhe (address@hidden) wrote:
> While measuring live migration performance for a qemu/kvm guest, it
> was observed that qemu doesn't maintain any intelligence about guest
> ram pages which are released by the guest balloon driver, and treats
> such pages like any other normal guest ram pages. This has a direct
> impact on overall migration time for a guest which has released
> (ballooned out) memory to the host.

Hi Jitendra,
  I've read over the patch and I've got a mix of comments; I've not read
it in full detail:

   a) It does need splitting up; it's a bit big to review in one go;
      I suggest you split it into the main code, and separately the bitmap
      save/load.  It might be worth splitting it up even more.

   b) in balloon_bitmap_load check the next and len fields; since they're
      read over the wire we've got to treat them as hostile; so check they
      don't run over the length of the bitmap (sketch below).

   c) The bitmap_load/save needs to be tied to the machine type or something
      similar, so that a destination reading a stream from an older qemu
      doesn't get upset trying to read extra data that was never sent.
      I'd prefer it tied to either the config setting or the new machine
      type (that way backwards migration works as well).  (Sketch below.)

   d) I agree with the other comments that the stuff looking up the ram
      blocks' addressing looks wrong; you use last_ram_offset() to size
      the bitmap, so it makes me think it's the whole of the ram_addr_t
      space; but I think you're saying you're not interested in all of it.
      However, remember that the order of ram_addr_t is not stable between
      two qemu's - even something like hotplugging an ethercard in one
      qemu vs having it on the command line on the other can change that
      order; so anything going over the wire has to be
      block+offset-into-block (sketch below).  Also remember that
      last_ram_offset() includes annoying things like firmware RAM, video
      memory and all those things.

   e) It should be possible to get it to work for postcopy if you just use
      it as a way to speed up the zero detection but still send the zero page
      messages.
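
For (b), a minimal sketch of the check I mean - untested, and assuming
you keep the current shape of the load loop:

    while (tmpoffset < tmplimit) {
        next = qemu_get_be64(f);
        len  = qemu_get_be64(f);
        if (len == 0) {
            break;
        }
        /* next/len came over the wire: treat them as hostile */
        if (next >= balloon_bitmap_pages ||
            len > balloon_bitmap_pages - next) {
            rcu_read_unlock();
            qemu_mutex_unlock_balloon_bitmap();
            return -EINVAL;
        }
        bitmap_set(bitmap, next, len);
        tmpoffset = next + len;
    }

(the second test is written as a subtraction so the addition can't
overflow).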
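
For (c), the usual pattern is a device property that new machine types
default to on and older ones pin off; the property name and field here
are made up:

    /* hw/virtio/virtio-balloon.c: default for new machine types */
    DEFINE_PROP_BOOL("balloon-bitmap", VirtIOBalloon, bitmap_enabled, true),

    /* include/hw/compat.h: force it off for older machine types */
    {
        .driver   = "virtio-balloon-device",
        .property = "balloon-bitmap",
        .value    = "off",
    },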
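
For (d), what block+offset-into-block looks like on the wire - a sketch
only; start_in_block/run_len are invented names, and I'm assuming you
chunk the bitmap up per RAMBlock:

    /* the name identifies the block on both sides; ram_addr_t doesn't */
    qemu_put_counted_string(f, block->idstr);
    qemu_put_be64(f, block->used_length >> balloon_bitmap_pfn_shift);
    /* then (start, len) runs relative to the start of *this* block */
    qemu_put_be64(f, start_in_block);
    qemu_put_be64(f, run_len);

That way it doesn't matter if the two qemu's lay out their RAMBlocks in
a different order.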

Dave

> 
> On large systems, where we can configure large guests with 1TB of
> memory and a considerable amount of that memory is released by the
> balloon driver to the host, the migration time gets worse.
> 
> The solution proposed below is local to qemu only (and does not
> require any modification to the Linux kernel or any guest driver). We
> have verified the fix for large (1TB) guests on HPE Superdome X (which
> can support up to 240 cores and 12TB of memory); in a case where 90%
> of memory is released by the balloon driver, the migration time for an
> idle guest reduces to ~600 secs from ~1200 secs.
> 
> Detail: During live migration, as part of the 1st iteration,
> ram_save_iterate() -> ram_find_and_save_block() will try to migrate
> even those ram pages which were released by the virtio-balloon driver
> as part of dynamic memory delete. Even though the pages returned to
> the host by the virtio-balloon driver are zero pages, the migration
> algorithm still ends up scanning each such page in full:
> ram_find_and_save_block() -> ram_save_page/ram_save_compressed_page
> -> save_zero_page() -> is_zero_range(). We also end up sending some
> control information over the network for these pages during migration.
> This adds to the total migration time.
> 
> The proposed fix uses the existing bitmap infrastructure to create a
> virtio-balloon bitmap. Each bit in the bitmap represents a guest ram
> page of size 1UL << VIRTIO_BALLOON_PFN_SHIFT. The bitmap covers the
> entire guest ram, up to the maximum configured memory. Guest ram pages
> claimed by the virtio-balloon driver are represented by 1 in the
> bitmap. During live migration, each guest ram page (host VA offset) is
> checked against the virtio-balloon bitmap; if the bit is set, the
> corresponding ram page is excluded from scanning and from sending
> control information during migration. The bitmap is also migrated to
> the target as part of every ram_save_iterate loop, and after the guest
> is stopped the remaining balloon bitmap is migrated as part of the
> balloon driver save/load interface.
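
(For scale: with the usual 4kB balloon page, VIRTIO_BALLOON_PFN_SHIFT
is 12, so a 1TB guest means 2^40 / 2^12 = 2^28 bits - a 32MB bitmap
that gets copied on hotplug and sent on the first iteration.)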
> 
> With the proposed fix, the average migration time for an idle guest
> with 1TB maximum memory and 64 vCPUs
>  - reduces from ~1200 secs to ~600 secs, with guest memory ballooned
>    down to 128GB (~10% of 1TB).
>  - reduces from ~1300 secs to ~1200 secs (7%), with guest memory
>    ballooned down to 896GB (~90% of 1TB).
>  - with no ballooning configured, we don't expect to see any impact
>    on total migration time.
> 
> The optimization is temporarily disabled while a balloon operation is
> in progress. Since the optimization skips scanning and migrating
> control information for ballooned-out pages, we might otherwise skip
> guest ram pages in cases where the guest balloon driver has freed a
> ram page to the guest but not yet informed the host/qemu about it
> (VIRTIO_BALLOON_F_MUST_TELL_HOST). In such a case the optimization
> could skip migrating ram pages which the guest is using. Since this
> problem is specific to a balloon leak, the balloon-operation-in-
> progress check could be restricted to leak operations only.
> 
> The optimization also gets permanently disabled (for all subsequent
> migrations) in case any migration uses the postcopy capability. With
> postcopy the balloon bitmap would need to be sent after vm_stop, which
> would have a significant impact on the downtime. Moreover, since
> applications in the guest won't actually fault on ram pages which are
> already ballooned out, the proposed optimization would not improve
> migration time during postcopy anyway.
> 
> Signed-off-by: Jitendra Kolhe <address@hidden>
> ---
> Changed in v2:
>  - Resolved compilation issue for qemu-user binaries in exec.c
>  - Localize balloon bitmap test to save_zero_page().
>  - Updated version string for newly added migration capability to 2.7.
>  - Made minor modifications to patch commit text.
> 
>  balloon.c                          | 253 ++++++++++++++++++++++++++++++++++++-
>  exec.c                             |   3 +
>  hw/virtio/virtio-balloon.c         |  35 ++++-
>  include/hw/virtio/virtio-balloon.h |   1 +
>  include/migration/migration.h      |   1 +
>  include/sysemu/balloon.h           |  15 ++-
>  migration/migration.c              |   9 ++
>  migration/ram.c                    |  31 ++++-
>  qapi-schema.json                   |   5 +-
>  9 files changed, 341 insertions(+), 12 deletions(-)
> 
> diff --git a/balloon.c b/balloon.c
> index f2ef50c..1c2d228 100644
> --- a/balloon.c
> +++ b/balloon.c
> @@ -33,11 +33,34 @@
>  #include "qmp-commands.h"
>  #include "qapi/qmp/qerror.h"
>  #include "qapi/qmp/qjson.h"
> +#include "exec/ram_addr.h"
> +#include "migration/migration.h"
> +
> +#define BALLOON_BITMAP_DISABLE_FLAG -1UL
> +
> +typedef enum {
> +    BALLOON_BITMAP_DISABLE_NONE = 1, /* Enabled */
> +    BALLOON_BITMAP_DISABLE_CURRENT,
> +    BALLOON_BITMAP_DISABLE_PERMANENT,
> +} BalloonBitmapDisableState;
>  
>  static QEMUBalloonEvent *balloon_event_fn;
>  static QEMUBalloonStatus *balloon_stat_fn;
> +static QEMUBalloonInProgress *balloon_in_progress_fn;
>  static void *balloon_opaque;
>  static bool balloon_inhibited;
> +static unsigned long balloon_bitmap_pages;
> +static unsigned int  balloon_bitmap_pfn_shift;
> +static QemuMutex balloon_bitmap_mutex;
> +static bool balloon_bitmap_xfered;
> +static unsigned long balloon_min_bitmap_offset;
> +static unsigned long balloon_max_bitmap_offset;
> +static BalloonBitmapDisableState balloon_bitmap_disable_state;
> +
> +static struct BitmapRcu {
> +    struct rcu_head rcu;
> +    unsigned long *bmap;
> +} *balloon_bitmap_rcu;
>  
>  bool qemu_balloon_is_inhibited(void)
>  {
> @@ -49,6 +72,21 @@ void qemu_balloon_inhibit(bool state)
>      balloon_inhibited = state;
>  }
>  
> +void qemu_mutex_lock_balloon_bitmap(void)
> +{
> +    qemu_mutex_lock(&balloon_bitmap_mutex);
> +}
> +
> +void qemu_mutex_unlock_balloon_bitmap(void)
> +{
> +    qemu_mutex_unlock(&balloon_bitmap_mutex);
> +}
> +
> +void qemu_balloon_reset_bitmap_data(void)
> +{
> +    balloon_bitmap_xfered = false;
> +}
> +
>  static bool have_balloon(Error **errp)
>  {
>      if (kvm_enabled() && !kvm_has_sync_mmu()) {
> @@ -65,9 +103,12 @@ static bool have_balloon(Error **errp)
>  }
>  
>  int qemu_add_balloon_handler(QEMUBalloonEvent *event_func,
> -                             QEMUBalloonStatus *stat_func, void *opaque)
> +                             QEMUBalloonStatus *stat_func,
> +                             QEMUBalloonInProgress *in_progress_func,
> +                             void *opaque, int pfn_shift)
>  {
> -    if (balloon_event_fn || balloon_stat_fn || balloon_opaque) {
> +    if (balloon_event_fn || balloon_stat_fn ||
> +        balloon_in_progress_fn || balloon_opaque) {
>          /* We're already registered one balloon handler.  How many can
>           * a guest really have?
>           */
> @@ -75,17 +116,39 @@ int qemu_add_balloon_handler(QEMUBalloonEvent *event_func,
>      }
>      balloon_event_fn = event_func;
>      balloon_stat_fn = stat_func;
> +    balloon_in_progress_fn = in_progress_func;
>      balloon_opaque = opaque;
> +
> +    qemu_mutex_init(&balloon_bitmap_mutex);
> +    balloon_bitmap_disable_state = BALLOON_BITMAP_DISABLE_NONE;
> +    balloon_bitmap_pfn_shift = pfn_shift;
> +    balloon_bitmap_pages = (last_ram_offset() >> balloon_bitmap_pfn_shift);
> +    balloon_bitmap_rcu = g_new0(struct BitmapRcu, 1);
> +    balloon_bitmap_rcu->bmap = bitmap_new(balloon_bitmap_pages);
> +    bitmap_clear(balloon_bitmap_rcu->bmap, 0, balloon_bitmap_pages);
> +
>      return 0;
>  }
>  
> +static void balloon_bitmap_free(struct BitmapRcu *bmap)
> +{
> +    g_free(bmap->bmap);
> +    g_free(bmap);
> +}
> +
>  void qemu_remove_balloon_handler(void *opaque)
>  {
> +    struct BitmapRcu *bitmap = balloon_bitmap_rcu;
>      if (balloon_opaque != opaque) {
>          return;
>      }
> +    atomic_rcu_set(&balloon_bitmap_rcu, NULL);
> +    if (bitmap) {
> +        call_rcu(bitmap, balloon_bitmap_free, rcu);
> +    }
>      balloon_event_fn = NULL;
>      balloon_stat_fn = NULL;
> +    balloon_in_progress_fn = NULL;
>      balloon_opaque = NULL;
>  }
>  
> @@ -116,3 +179,189 @@ void qmp_balloon(int64_t target, Error **errp)
>      trace_balloon_event(balloon_opaque, target);
>      balloon_event_fn(balloon_opaque, target);
>  }
> +
> +/* Handle Ram hotplug case, only called in case old < new */
> +int qemu_balloon_bitmap_extend(ram_addr_t old, ram_addr_t new)
> +{
> +    struct BitmapRcu *old_bitmap = balloon_bitmap_rcu, *bitmap;
> +    unsigned long old_offset, new_offset;
> +
> +    if (!balloon_bitmap_rcu) {
> +        return -1;
> +    }
> +
> +    old_offset = (old >> balloon_bitmap_pfn_shift);
> +    new_offset = (new >> balloon_bitmap_pfn_shift);
> +
> +    bitmap = g_new(struct BitmapRcu, 1);
> +    bitmap->bmap = bitmap_new(new_offset);
> +
> +    qemu_mutex_lock_balloon_bitmap();
> +    bitmap_clear(bitmap->bmap, 0,
> +                 balloon_bitmap_pages + new_offset - old_offset);
> +    bitmap_copy(bitmap->bmap, old_bitmap->bmap, old_offset);
> +
> +    atomic_rcu_set(&balloon_bitmap_rcu, bitmap);
> +    balloon_bitmap_pages += new_offset - old_offset;
> +    qemu_mutex_unlock_balloon_bitmap();
> +    call_rcu(old_bitmap, balloon_bitmap_free, rcu);
> +
> +    return 0;
> +}
> +
> +/* Should be called with balloon bitmap mutex lock held */
> +int qemu_balloon_bitmap_update(ram_addr_t addr, int deflate)
> +{
> +    unsigned long *bitmap;
> +    unsigned long offset = 0;
> +
> +    if (!balloon_bitmap_rcu) {
> +        return -1;
> +    }
> +    offset = (addr >> balloon_bitmap_pfn_shift);
> +    if (balloon_bitmap_xfered) {
> +        if (offset < balloon_min_bitmap_offset) {
> +            balloon_min_bitmap_offset = offset;
> +        }
> +        if (offset > balloon_max_bitmap_offset) {
> +            balloon_max_bitmap_offset = offset;
> +        }
> +    }
> +
> +    rcu_read_lock();
> +    bitmap = atomic_rcu_read(&balloon_bitmap_rcu)->bmap;
> +    if (deflate == 0) {
> +        set_bit(offset, bitmap);
> +    } else {
> +        clear_bit(offset, bitmap);
> +    }
> +    rcu_read_unlock();
> +    return 0;
> +}
> +
> +void qemu_balloon_bitmap_setup(void)
> +{
> +    if (migrate_postcopy_ram()) {
> +        balloon_bitmap_disable_state = BALLOON_BITMAP_DISABLE_PERMANENT;
> +    } else if ((!balloon_bitmap_rcu || !migrate_skip_balloon()) &&
> +               (balloon_bitmap_disable_state !=
> +                BALLOON_BITMAP_DISABLE_PERMANENT)) {
> +        balloon_bitmap_disable_state = BALLOON_BITMAP_DISABLE_CURRENT;
> +    }
> +}
> +
> +int qemu_balloon_bitmap_test(RAMBlock *rb, ram_addr_t addr)
> +{
> +    unsigned long *bitmap;
> +    ram_addr_t base;
> +    unsigned long nr = 0;
> +    int ret = 0;
> +
> +    if (balloon_bitmap_disable_state == BALLOON_BITMAP_DISABLE_CURRENT ||
> +        balloon_bitmap_disable_state == BALLOON_BITMAP_DISABLE_PERMANENT) {
> +        return 0;
> +    }
> +    balloon_in_progress_fn(balloon_opaque, &ret);
> +    if (ret == 1) {
> +        return 0;
> +    }
> +
> +    rcu_read_lock();
> +    bitmap = atomic_rcu_read(&balloon_bitmap_rcu)->bmap;
> +    base = rb->offset >> balloon_bitmap_pfn_shift;
> +    nr = base + (addr >> balloon_bitmap_pfn_shift);
> +    if (test_bit(nr, bitmap)) {
> +        ret = 1;
> +    }
> +    rcu_read_unlock();
> +    return ret;
> +}
> +
> +int qemu_balloon_bitmap_save(QEMUFile *f)
> +{
> +    unsigned long *bitmap;
> +    unsigned long offset = 0, next = 0, len = 0;
> +    unsigned long tmpoffset = 0, tmplimit = 0;
> +
> +    if (balloon_bitmap_disable_state == BALLOON_BITMAP_DISABLE_PERMANENT) {
> +        qemu_put_be64(f, BALLOON_BITMAP_DISABLE_FLAG);
> +        return 0;
> +    }
> +
> +    qemu_mutex_lock_balloon_bitmap();
> +    if (balloon_bitmap_xfered) {
> +        tmpoffset = balloon_min_bitmap_offset;
> +        tmplimit  = balloon_max_bitmap_offset;
> +    } else {
> +        balloon_bitmap_xfered = true;
> +        tmpoffset = offset;
> +        tmplimit  = balloon_bitmap_pages;
> +    }
> +
> +    balloon_min_bitmap_offset = balloon_bitmap_pages;
> +    balloon_max_bitmap_offset = 0;
> +
> +    qemu_put_be64(f, balloon_bitmap_pages);
> +    qemu_put_be64(f, tmpoffset);
> +    qemu_put_be64(f, tmplimit);
> +    rcu_read_lock();
> +    bitmap = atomic_rcu_read(&balloon_bitmap_rcu)->bmap;
> +    while (tmpoffset < tmplimit) {
> +        unsigned long next_set_bit, start_set_bit;
> +        next_set_bit = find_next_bit(bitmap, balloon_bitmap_pages, tmpoffset);
> +        start_set_bit = next_set_bit;
> +        if (next_set_bit == balloon_bitmap_pages) {
> +            len = 0;
> +            next = start_set_bit;
> +            qemu_put_be64(f, next);
> +            qemu_put_be64(f, len);
> +            break;
> +        }
> +        next_set_bit = find_next_zero_bit(bitmap,
> +                                          balloon_bitmap_pages,
> +                                          ++next_set_bit);
> +        len = (next_set_bit - start_set_bit);
> +        next = start_set_bit;
> +        qemu_put_be64(f, next);
> +        qemu_put_be64(f, len);
> +        tmpoffset = next + len;
> +    }
> +    rcu_read_unlock();
> +    qemu_mutex_unlock_balloon_bitmap();
> +    return 0;
> +}
> +
> +int qemu_balloon_bitmap_load(QEMUFile *f)
> +{
> +    unsigned long *bitmap;
> +    unsigned long next = 0, len = 0;
> +    unsigned long tmpoffset = 0, tmplimit = 0;
> +
> +    if (!balloon_bitmap_rcu) {
> +        return -1;
> +    }
> +
> +    qemu_mutex_lock_balloon_bitmap();
> +    balloon_bitmap_pages = qemu_get_be64(f);
> +    if (balloon_bitmap_pages == BALLOON_BITMAP_DISABLE_FLAG) {
> +        balloon_bitmap_disable_state = BALLOON_BITMAP_DISABLE_PERMANENT;
> +        qemu_mutex_unlock_balloon_bitmap();
> +        return 0;
> +    }
> +    tmpoffset = qemu_get_be64(f);
> +    tmplimit  = qemu_get_be64(f);
> +    rcu_read_lock();
> +    bitmap = atomic_rcu_read(&balloon_bitmap_rcu)->bmap;
> +    while (tmpoffset < tmplimit) {
> +        next = qemu_get_be64(f);
> +        len  = qemu_get_be64(f);
> +        if (len == 0) {
> +            break;
> +        }
> +        bitmap_set(bitmap, next, len);
> +        tmpoffset = next + len;
> +    }
> +    rcu_read_unlock();
> +    qemu_mutex_unlock_balloon_bitmap();
> +    return 0;
> +}
> diff --git a/exec.c b/exec.c
> index f398d21..7a448e5 100644
> --- a/exec.c
> +++ b/exec.c
> @@ -43,6 +43,7 @@
>  #else /* !CONFIG_USER_ONLY */
>  #include "sysemu/xen-mapcache.h"
>  #include "trace.h"
> +#include "sysemu/balloon.h"
>  #endif
>  #include "exec/cpu-all.h"
>  #include "qemu/rcu_queue.h"
> @@ -1610,6 +1611,8 @@ static void ram_block_add(RAMBlock *new_block, Error **errp)
>      if (new_ram_size > old_ram_size) {
>          migration_bitmap_extend(old_ram_size, new_ram_size);
>          dirty_memory_extend(old_ram_size, new_ram_size);
> +        qemu_balloon_bitmap_extend(old_ram_size << TARGET_PAGE_BITS,
> +                                   new_ram_size << TARGET_PAGE_BITS);
>      }
>      /* Keep the list sorted from biggest to smallest block.  Unlike QTAILQ,
>       * QLIST (which has an RCU-friendly variant) does not have insertion at
> diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
> index 22ad25c..9f3a4c8 100644
> --- a/hw/virtio/virtio-balloon.c
> +++ b/hw/virtio/virtio-balloon.c
> @@ -27,6 +27,7 @@
>  #include "qapi/visitor.h"
>  #include "qapi-event.h"
>  #include "trace.h"
> +#include "migration/migration.h"
>  
>  #if defined(__linux__)
>  #include <sys/mman.h>
> @@ -214,11 +215,13 @@ static void virtio_balloon_handle_output(VirtIODevice *vdev, VirtQueue *vq)
>      VirtQueueElement *elem;
>      MemoryRegionSection section;
>  
> +    qemu_mutex_lock_balloon_bitmap();
>      for (;;) {
>          size_t offset = 0;
>          uint32_t pfn;
>          elem = virtqueue_pop(vq, sizeof(VirtQueueElement));
>          if (!elem) {
> +            qemu_mutex_unlock_balloon_bitmap();
>              return;
>          }
>  
> @@ -242,6 +245,7 @@ static void virtio_balloon_handle_output(VirtIODevice *vdev, VirtQueue *vq)
>              addr = section.offset_within_region;
>              balloon_page(memory_region_get_ram_ptr(section.mr) + addr,
>                           !!(vq == s->dvq));
> +            qemu_balloon_bitmap_update(addr, !!(vq == s->dvq));
>              memory_region_unref(section.mr);
>          }
>  
> @@ -249,6 +253,7 @@ static void virtio_balloon_handle_output(VirtIODevice *vdev, VirtQueue *vq)
>          virtio_notify(vdev, vq);
>          g_free(elem);
>      }
> +    qemu_mutex_unlock_balloon_bitmap();
>  }
>  
>  static void virtio_balloon_receive_stats(VirtIODevice *vdev, VirtQueue *vq)
> @@ -303,6 +308,16 @@ out:
>      }
>  }
>  
> +static void virtio_balloon_migration_state_changed(Notifier *notifier,
> +                                                   void *data)
> +{
> +    MigrationState *mig = data;
> +
> +    if (migration_has_failed(mig)) {
> +        qemu_balloon_reset_bitmap_data();
> +    }
> +}
> +
>  static void virtio_balloon_get_config(VirtIODevice *vdev, uint8_t *config_data)
>  {
>      VirtIOBalloon *dev = VIRTIO_BALLOON(vdev);
> @@ -382,6 +397,16 @@ static void virtio_balloon_stat(void *opaque, BalloonInfo *info)
>                                               VIRTIO_BALLOON_PFN_SHIFT);
>  }
>  
> +static void virtio_balloon_in_progress(void *opaque, int *status)
> +{
> +    VirtIOBalloon *dev = VIRTIO_BALLOON(opaque);
> +    if (cpu_to_le32(dev->actual) != cpu_to_le32(dev->num_pages)) {
> +        *status = 1;
> +        return;
> +    }
> +    *status = 0;
> +}
> +
>  static void virtio_balloon_to_target(void *opaque, ram_addr_t target)
>  {
>      VirtIOBalloon *dev = VIRTIO_BALLOON(opaque);
> @@ -409,6 +434,7 @@ static void virtio_balloon_save_device(VirtIODevice *vdev, QEMUFile *f)
>  
>      qemu_put_be32(f, s->num_pages);
>      qemu_put_be32(f, s->actual);
> +    qemu_balloon_bitmap_save(f);
>  }
>  
>  static int virtio_balloon_load(QEMUFile *f, void *opaque, int version_id)
> @@ -426,6 +452,7 @@ static int virtio_balloon_load_device(VirtIODevice *vdev, QEMUFile *f,
>  
>      s->num_pages = qemu_get_be32(f);
>      s->actual = qemu_get_be32(f);
> +    qemu_balloon_bitmap_load(f);
>      return 0;
>  }
>  
> @@ -439,7 +466,9 @@ static void virtio_balloon_device_realize(DeviceState *dev, Error **errp)
>                  sizeof(struct virtio_balloon_config));
>  
>      ret = qemu_add_balloon_handler(virtio_balloon_to_target,
> -                                   virtio_balloon_stat, s);
> +                                   virtio_balloon_stat,
> +                                   virtio_balloon_in_progress, s,
> +                                   VIRTIO_BALLOON_PFN_SHIFT);
>  
>      if (ret < 0) {
>          error_setg(errp, "Only one balloon device is supported");
> @@ -453,6 +482,9 @@ static void virtio_balloon_device_realize(DeviceState *dev, Error **errp)
>  
>      reset_stats(s);
>  
> +    s->migration_state_notifier.notify = virtio_balloon_migration_state_changed;
> +    add_migration_state_change_notifier(&s->migration_state_notifier);
> +
>      register_savevm(dev, "virtio-balloon", -1, 1,
>                      virtio_balloon_save, virtio_balloon_load, s);
>  }
> @@ -462,6 +494,7 @@ static void virtio_balloon_device_unrealize(DeviceState *dev, Error **errp)
>      VirtIODevice *vdev = VIRTIO_DEVICE(dev);
>      VirtIOBalloon *s = VIRTIO_BALLOON(dev);
>  
> +    remove_migration_state_change_notifier(&s->migration_state_notifier);
>      balloon_stats_destroy_timer(s);
>      qemu_remove_balloon_handler(s);
>      unregister_savevm(dev, "virtio-balloon", s);
> diff --git a/include/hw/virtio/virtio-balloon.h b/include/hw/virtio/virtio-balloon.h
> index 35f62ac..1ded5a9 100644
> --- a/include/hw/virtio/virtio-balloon.h
> +++ b/include/hw/virtio/virtio-balloon.h
> @@ -43,6 +43,7 @@ typedef struct VirtIOBalloon {
>      int64_t stats_last_update;
>      int64_t stats_poll_interval;
>      uint32_t host_features;
> +    Notifier migration_state_notifier;
>  } VirtIOBalloon;
>  
>  #endif
> diff --git a/include/migration/migration.h b/include/migration/migration.h
> index ac2c12c..6c1d1af 100644
> --- a/include/migration/migration.h
> +++ b/include/migration/migration.h
> @@ -267,6 +267,7 @@ void migrate_del_blocker(Error *reason);
>  
>  bool migrate_postcopy_ram(void);
>  bool migrate_zero_blocks(void);
> +bool migrate_skip_balloon(void);
>  
>  bool migrate_auto_converge(void);
>  
> diff --git a/include/sysemu/balloon.h b/include/sysemu/balloon.h
> index 3f976b4..5325c38 100644
> --- a/include/sysemu/balloon.h
> +++ b/include/sysemu/balloon.h
> @@ -15,14 +15,27 @@
>  #define _QEMU_BALLOON_H
>  
>  #include "qapi-types.h"
> +#include "migration/qemu-file.h"
>  
>  typedef void (QEMUBalloonEvent)(void *opaque, ram_addr_t target);
>  typedef void (QEMUBalloonStatus)(void *opaque, BalloonInfo *info);
> +typedef void (QEMUBalloonInProgress) (void *opaque, int *status);
>  
>  int qemu_add_balloon_handler(QEMUBalloonEvent *event_func,
> -                          QEMUBalloonStatus *stat_func, void *opaque);
> +                             QEMUBalloonStatus *stat_func,
> +                             QEMUBalloonInProgress *progress_func,
> +                             void *opaque, int pfn_shift);
>  void qemu_remove_balloon_handler(void *opaque);
>  bool qemu_balloon_is_inhibited(void);
>  void qemu_balloon_inhibit(bool state);
> +void qemu_mutex_lock_balloon_bitmap(void);
> +void qemu_mutex_unlock_balloon_bitmap(void);
> +void qemu_balloon_reset_bitmap_data(void);
> +void qemu_balloon_bitmap_setup(void);
> +int qemu_balloon_bitmap_extend(ram_addr_t old, ram_addr_t new);
> +int qemu_balloon_bitmap_update(ram_addr_t addr, int deflate);
> +int qemu_balloon_bitmap_test(RAMBlock *rb, ram_addr_t addr);
> +int qemu_balloon_bitmap_save(QEMUFile *f);
> +int qemu_balloon_bitmap_load(QEMUFile *f);
>  
>  #endif
> diff --git a/migration/migration.c b/migration/migration.c
> index 034a918..cb86307 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -1200,6 +1200,15 @@ int migrate_use_xbzrle(void)
>      return s->enabled_capabilities[MIGRATION_CAPABILITY_XBZRLE];
>  }
>  
> +bool migrate_skip_balloon(void)
> +{
> +    MigrationState *s;
> +
> +    s = migrate_get_current();
> +
> +    return s->enabled_capabilities[MIGRATION_CAPABILITY_SKIP_BALLOON];
> +}
> +
>  int64_t migrate_xbzrle_cache_size(void)
>  {
>      MigrationState *s;
> diff --git a/migration/ram.c b/migration/ram.c
> index 704f6a9..161ab73 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -40,6 +40,7 @@
>  #include "trace.h"
>  #include "exec/ram_addr.h"
>  #include "qemu/rcu_queue.h"
> +#include "sysemu/balloon.h"
>  
>  #ifdef DEBUG_MIGRATION_RAM
>  #define DPRINTF(fmt, ...) \
> @@ -65,6 +66,7 @@ static uint64_t bitmap_sync_count;
>  #define RAM_SAVE_FLAG_XBZRLE   0x40
>  /* 0x80 is reserved in migration.h start with 0x100 next */
>  #define RAM_SAVE_FLAG_COMPRESS_PAGE    0x100
> +#define RAM_SAVE_FLAG_BALLOON  0x200
>  
>  static const uint8_t ZERO_TARGET_PAGE[TARGET_PAGE_SIZE];
>  
> @@ -702,13 +704,17 @@ static int save_zero_page(QEMUFile *f, RAMBlock *block, ram_addr_t offset,
>  {
>      int pages = -1;
>  
> -    if (is_zero_range(p, TARGET_PAGE_SIZE)) {
> -        acct_info.dup_pages++;
> -        *bytes_transferred += save_page_header(f, block,
> +    if (qemu_balloon_bitmap_test(block, offset) != 1) {
> +        if (is_zero_range(p, TARGET_PAGE_SIZE)) {
> +            acct_info.dup_pages++;
> +            *bytes_transferred += save_page_header(f, block,
>                                                 offset | RAM_SAVE_FLAG_COMPRESS);
> -        qemu_put_byte(f, 0);
> -        *bytes_transferred += 1;
> -        pages = 1;
> +            qemu_put_byte(f, 0);
> +            *bytes_transferred += 1;
> +            pages = 1;
> +        }
> +    } else {
> +        pages = 0;
>      }
>  
>      return pages;
> @@ -773,7 +779,7 @@ static int ram_save_page(QEMUFile *f, PageSearchStatus *pss,
>               * page would be stale
>               */
>              xbzrle_cache_zero_page(current_addr);
> -        } else if (!ram_bulk_stage && migrate_use_xbzrle()) {
> +        } else if (pages != 0 && !ram_bulk_stage && migrate_use_xbzrle()) {

Is this test the right way around - don't you want to try xbzrle if
you've NOT sent a page?

>              pages = save_xbzrle_page(f, &p, current_addr, block,
>                                       offset, last_stage, bytes_transferred);
>              if (!last_stage) {
> @@ -1355,6 +1361,9 @@ static int ram_find_and_save_block(QEMUFile *f, bool last_stage,
>          }
>  
>          if (found) {
> +            /* skip saving ram host page if the corresponding guest page
> +             * is ballooned out
> +             */
>              pages = ram_save_host_page(ms, f, &pss,
>                                         last_stage, bytes_transferred,
>                                         dirty_ram_abs);
> @@ -1959,6 +1968,7 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
>  
>      rcu_read_unlock();
>  
> +    qemu_balloon_bitmap_setup();
>      ram_control_before_iterate(f, RAM_CONTROL_SETUP);
>      ram_control_after_iterate(f, RAM_CONTROL_SETUP);
>  
> @@ -1984,6 +1994,9 @@ static int ram_save_iterate(QEMUFile *f, void *opaque)
>  
>      ram_control_before_iterate(f, RAM_CONTROL_ROUND);
>  
> +    qemu_put_be64(f, RAM_SAVE_FLAG_BALLOON);
> +    qemu_balloon_bitmap_save(f);
> +
>      t0 = qemu_clock_get_ns(QEMU_CLOCK_REALTIME);
>      i = 0;
>      while ((ret = qemu_file_rate_limit(f)) == 0) {
> @@ -2493,6 +2506,10 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
>              }
>              break;
>  
> +        case RAM_SAVE_FLAG_BALLOON:
> +            qemu_balloon_bitmap_load(f);
> +            break;
> +
>          case RAM_SAVE_FLAG_COMPRESS:
>              ch = qemu_get_byte(f);
>              ram_handle_compressed(host, ch, TARGET_PAGE_SIZE);
> diff --git a/qapi-schema.json b/qapi-schema.json
> index 7f8d799..38163ca 100644
> --- a/qapi-schema.json
> +++ b/qapi-schema.json
> @@ -544,11 +544,14 @@
>  #          been migrated, pulling the remaining pages along as needed. NOTE: If
>  #          the migration fails during postcopy the VM will fail.  (since 2.6)
>  #
> +# @skip-balloon: Skip scanning ram pages released by virtio-balloon driver.
> +#          (since 2.7)
> +#
>  # Since: 1.2
>  ##
>  { 'enum': 'MigrationCapability',
>    'data': ['xbzrle', 'rdma-pin-all', 'auto-converge', 'zero-blocks',
> -           'compress', 'events', 'postcopy-ram'] }
> +           'compress', 'events', 'postcopy-ram', 'skip-balloon'] }
>  
>  ##
>  # @MigrationCapabilityStatus
> -- 
> 1.8.3.1
> 
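For anyone wanting to test this: assuming the capability name lands as
'skip-balloon', it gets toggled like any other migration capability,
e.g. over QMP on the source before starting the migration:

    { "execute": "migrate-set-capabilities",
      "arguments": { "capabilities": [
          { "capability": "skip-balloon", "state": true } ] } }

or 'migrate_set_capability skip-balloon on' from HMP.
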
--
Dr. David Alan Gilbert / address@hidden / Manchester, UK


