From: Lai Jiangshan
Subject: Re: [Qemu-devel] [PATCH V4] migration: add capability to bypass the shared memory
Date: Thu, 12 Apr 2018 10:34:27 +0800

On Tue, Apr 10, 2018 at 1:30 AM, Dr. David Alan Gilbert
<address@hidden> wrote:
> Hi,
>
> * Lai Jiangshan (address@hidden) wrote:
>> 1) What's this
>>
>> When the migration capability 'bypass-shared-memory'
>> is set, shared memory is bypassed during migration.
>>
>> It is the key capability needed to enable several advanced
>> features in qemu, such as qemu-local-migration, qemu-live-update,
>> extremely-fast-save-restore, vm-template, vm-fast-live-clone,
>> yet-another-post-copy-migration, etc.
>>
>> The philosophy behind this capability, and the advanced features
>> built on it, is that part of the memory management is separated
>> out from qemu, so that other toolkits such as libvirt,
>> kata-containers (https://github.com/kata-containers),
>> runv (https://github.com/hyperhq/runv/), or multiple cooperating
>> qemu processes can directly access it, manage it, and provide
>> features on top of it.
>>
>> 2) Status in real world
>>
>> The hyperhq project (http://hyper.sh, http://hypercontainer.io/)
>> introduced the vm-template (vm-fast-live-clone) feature
>> to hyper containers several years ago, and it works perfectly
>> (see https://github.com/hyperhq/runv/pull/297).
>>
>> The vm-template feature lets containers (VMs) start in
>> 130ms and saves 80M of memory per container (VM), making
>> hyper containers as fast and high-density as normal
>> containers.
>>
>> The kata-containers project (https://github.com/kata-containers),
>> which was launched by hyper, intel and friends and which descended
>> from runv (and clear-container), should have this feature enabled.
>> Unfortunately, due to code conflicts between runv and cc,
>> the feature was temporarily disabled; it is being brought
>> back by the hyper and intel teams.
>>
>> 3) How to use it and bring up the advanced features.
>>
>> On the current qemu command line, shared memory has
>> to be configured via a memory backend object.
>>
>> a) feature: qemu-local-migration, qemu-live-update
>> Set the mem-path on a tmpfs and set share=on for it when
>> starting the vm. Example:
>> -object \
>> memory-backend-file,id=mem,size=128M,mem-path=/dev/shm/memory,share=on \
>> -numa node,nodeid=0,cpus=0-7,memdev=mem
>>
>> When you want to migrate the vm locally (after fixing a security bug
>> in the qemu binary, or for some other reason), you can start a new qemu
>> with the same command line plus -incoming, then migrate the
>> vm from the old qemu to the new qemu with the migration capability
>> 'bypass-shared-memory' set. The migration transfers the device state
>> *ONLY*; the memory stays the original memory backed by the tmpfs file.
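>>
>> For example, start the new qemu with the same command line plus
>> -incoming, then drive the old qemu from its monitor (a sketch;
>> the socket path is illustrative):
>>
>> (qemu) migrate_set_capability bypass-shared-memory on
>> (qemu) migrate unix:/tmp/qemu-local-migration.sock
>>
>> or the equivalent via QMP:
>>
>> { "execute": "migrate-set-capabilities",
>>   "arguments": { "capabilities": [
>>     { "capability": "bypass-shared-memory", "state": true } ] } }
>> { "execute": "migrate",
>>   "arguments": { "uri": "unix:/tmp/qemu-local-migration.sock" } }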
>>
>> b) feature: extremely-fast-save-restore
>> The same as above, but with the mem-path on a persistent file system.
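>>
>> For example (the path is illustrative):
>>
>> -object \
>> memory-backend-file,id=mem,size=128M,mem-path=/var/lib/qemu/memory,share=on \
>> -numa node,nodeid=0,cpus=0-7,memdev=mem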
>>
>> c) feature: vm-template, vm-fast-live-clone
>> The template vm is started as in a), and paused when the guest reaches
>> the template point (for example, when the guest app is ready); then the
>> template vm's state is saved. (The qemu process of the template can be
>> killed now, because we only need the memory and device state files,
>> both in tmpfs.)
>>
>> Then we can launch one or more VMs based on the template vm state.
>> The new VMs are started without "share=on"; they all share the
>> initial memory from the memory file, which saves a lot of memory.
>> All the new VMs start from the template point, so the guest app can
>> get to work quickly.
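>>
>> A rough sketch of the whole flow (file names are illustrative):
>>
>> # on the template vm's monitor: save the device state once
>> (qemu) stop
>> (qemu) migrate_set_capability bypass-shared-memory on
>> (qemu) migrate "exec:cat > /dev/shm/template-state"
>>
>> # each clone maps the same memory file privately (no share=on)
>> # and restores the saved device state
>> qemu-system-x86_64 ... \
>>   -object memory-backend-file,id=mem,size=128M,mem-path=/dev/shm/memory \
>>   -numa node,nodeid=0,cpus=0-7,memdev=mem \
>>   -incoming "exec:cat /dev/shm/template-state"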
>
> How do you handle the storage in this case, or giving each VM its own
> MAC address?

The user or the upper-layer tools can copy/clone the storage
(on xfs, btrfs, ceph...), and can likewise handle the
interface MAC themselves; this patch just focuses on memory.

hyper/runv clones the vm before the interfaces are inserted;
vm-template is often used along with hotplugging.

>
>> A new VM booted from a template vm can't become a template again;
>> if you need this unusual chained-template feature, you can write
>> a cloneable-tmpfs kernel module for it.
>>
>> The libvirt toolkit can't manage vm-template currently; in
>> hyperhq/runv we use a qemu wrapper script to do it. I hope someone
>> adds a "libvirt managed template" feature to libvirt.
>
>> d) feature: yet-another-post-copy-migration
>> It is a possible feature, but no toolkit can do it well yet.
>> Using an nbd server/client on the memory file is barely workable
>> but inconvenient. A special feature for tmpfs might be needed to
>> complete it fully.
>> Nobody needs yet another post-copy migration method,
>> but it is possible should some crazy person want it.
>
> As the crazy person who did the existing postcopy; one is enough!
>

Very true. This part of the comments just shows how much
potential there is in such a simple migration capability.


> Some minor fix requests below, but this looks nice and simple.
>

Will do soon. Thanks for your review.

> Shared memory is interesting because there are lots of different uses;
> e.g. your uses, but also vhost-user which is sharing for a completely
> different reason.
>
>> Cc: Samuel Ortiz <address@hidden>
>> Cc: Sebastien Boeuf <address@hidden>
>> Cc: James O. D. Hunt <address@hidden>
>> Cc: Xu Wang <address@hidden>
>> Cc: Peng Tao <address@hidden>
>> Cc: Xiao Guangrong <address@hidden>
>> Cc: Xiao Guangrong <address@hidden>
>> Signed-off-by: Lai Jiangshan <address@hidden>
>> ---
>>
>> Changes in V4:
>>  fixes checkpatch.pl errors
>>
>> Changes in V3:
>>  rebased on upstream master
>>  update the available version of the capability to
>>  v2.13
>>
>> Changes in V2:
>>  rebased on 2.11.1
>>
>>  migration/migration.c | 14 ++++++++++++++
>>  migration/migration.h |  1 +
>>  migration/ram.c       | 27 ++++++++++++++++++---------
>>  qapi/migration.json   |  6 +++++-
>>  4 files changed, 38 insertions(+), 10 deletions(-)
>>
>> diff --git a/migration/migration.c b/migration/migration.c
>> index 52a5092add..6a63102d7f 100644
>> --- a/migration/migration.c
>> +++ b/migration/migration.c
>> @@ -1509,6 +1509,20 @@ bool migrate_release_ram(void)
>>      return s->enabled_capabilities[MIGRATION_CAPABILITY_RELEASE_RAM];
>>  }
>>
>> +bool migrate_bypass_shared_memory(void)
>> +{
>> +    MigrationState *s;
>> +
>> +    /* it is not workable with postcopy yet. */
>> +    if (migrate_postcopy_ram()) {
>> +        return false;
>> +    }
>
> Please change this to work in the same way as the check for
> postcopy+compress in migration.c migrate_caps_check.
>
>> +    s = migrate_get_current();
>> +
>> +    return s->enabled_capabilities[MIGRATION_CAPABILITY_BYPASS_SHARED_MEMORY];
>> +}
>> +
>>  bool migrate_postcopy_ram(void)
>>  {
>>      MigrationState *s;
>> diff --git a/migration/migration.h b/migration/migration.h
>> index 8d2f320c48..cfd2513ef0 100644
>> --- a/migration/migration.h
>> +++ b/migration/migration.h
>> @@ -206,6 +206,7 @@ MigrationState *migrate_get_current(void);
>>
>>  bool migrate_postcopy(void);
>>
>> +bool migrate_bypass_shared_memory(void);
>>  bool migrate_release_ram(void);
>>  bool migrate_postcopy_ram(void);
>>  bool migrate_zero_blocks(void);
>> diff --git a/migration/ram.c b/migration/ram.c
>> index 0e90efa092..bca170c386 100644
>> --- a/migration/ram.c
>> +++ b/migration/ram.c
>> @@ -780,6 +780,11 @@ unsigned long migration_bitmap_find_dirty(RAMState *rs, RAMBlock *rb,
>>      unsigned long *bitmap = rb->bmap;
>>      unsigned long next;
>>
>> +    /* when this ramblock is requested bypassing */
>> +    if (!bitmap) {
>> +        return size;
>> +    }
>> +
>>      if (rs->ram_bulk_stage && start > 0) {
>>          next = start + 1;
>>      } else {
>> @@ -850,7 +855,9 @@ static void migration_bitmap_sync(RAMState *rs)
>>      qemu_mutex_lock(&rs->bitmap_mutex);
>>      rcu_read_lock();
>>      RAMBLOCK_FOREACH(block) {
>> -        migration_bitmap_sync_range(rs, block, 0, block->used_length);
>> +        if (!migrate_bypass_shared_memory() || !qemu_ram_is_shared(block)) {
>> +            migration_bitmap_sync_range(rs, block, 0, block->used_length);
>> +        }
>>      }
>>      rcu_read_unlock();
>>      qemu_mutex_unlock(&rs->bitmap_mutex);
>> @@ -2132,18 +2139,12 @@ static int ram_state_init(RAMState **rsp)
>>      qemu_mutex_init(&(*rsp)->src_page_req_mutex);
>>      QSIMPLEQ_INIT(&(*rsp)->src_page_requests);
>>
>> -    /*
>> -     * Count the total number of pages used by ram blocks not including any
>> -     * gaps due to alignment or unplugs.
>> -     */
>> -    (*rsp)->migration_dirty_pages = ram_bytes_total() >> TARGET_PAGE_BITS;
>> -
>>      ram_state_reset(*rsp);
>>
>>      return 0;
>>  }
>>
>> -static void ram_list_init_bitmaps(void)
>> +static void ram_list_init_bitmaps(RAMState *rs)
>>  {
>>      RAMBlock *block;
>>      unsigned long pages;
>> @@ -2151,9 +2152,17 @@ static void ram_list_init_bitmaps(void)
>>      /* Skip setting bitmap if there is no RAM */
>>      if (ram_bytes_total()) {
>
> I think you need to add here a :
>    rs->migration_dirty_pages = 0;
>
> I don't see anywhere else that initialises it, and there is the case of
> a migration that fails, followed by a 2nd attempt.
>
>>          QLIST_FOREACH_RCU(block, &ram_list.blocks, next) {
>> +            if (migrate_bypass_shared_memory() && qemu_ram_is_shared(block)) {
>> +                continue;
>> +            }
>>              pages = block->max_length >> TARGET_PAGE_BITS;
>>              block->bmap = bitmap_new(pages);
>>              bitmap_set(block->bmap, 0, pages);
>> +            /*
>> +             * Count the total number of pages used by ram blocks not
>> +             * including any gaps due to alignment or unplugs.
>> +             */
>> +            rs->migration_dirty_pages += pages;
>>              if (migrate_postcopy_ram()) {
>>                  block->unsentmap = bitmap_new(pages);
>>                  bitmap_set(block->unsentmap, 0, pages);
>> @@ -2169,7 +2178,7 @@ static void ram_init_bitmaps(RAMState *rs)
>>      qemu_mutex_lock_ramlist();
>>      rcu_read_lock();
>>
>> -    ram_list_init_bitmaps();
>> +    ram_list_init_bitmaps(rs);
>>      memory_global_dirty_log_start();
>>      migration_bitmap_sync(rs);
>>
>> diff --git a/qapi/migration.json b/qapi/migration.json
>> index 9d0bf82cf4..45326480bd 100644
>> --- a/qapi/migration.json
>> +++ b/qapi/migration.json
>> @@ -357,13 +357,17 @@
>>  # @dirty-bitmaps: If enabled, QEMU will migrate named dirty bitmaps.
>>  #                 (since 2.12)
>>  #
>> +# @bypass-shared-memory: the shared memory region will be bypassed on migration.
>> +#          This feature allows the memory region to be reused by new qemu(s)
>> +#          or be migrated separately. (since 2.13)
>> +#
>>  # Since: 1.2
>>  ##
>>  { 'enum': 'MigrationCapability',
>>    'data': ['xbzrle', 'rdma-pin-all', 'auto-converge', 'zero-blocks',
>>             'compress', 'events', 'postcopy-ram', 'x-colo', 'release-ram',
>>             'block', 'return-path', 'pause-before-switchover', 'x-multifd',
>> -           'dirty-bitmaps' ] }
>> +           'dirty-bitmaps', 'bypass-shared-memory' ] }
>>
>>  ##
>>  # @MigrationCapabilityStatus:
>> --
>> 2.14.3 (Apple Git-98)
>>
> --
> Dr. David Alan Gilbert / address@hidden / Manchester, UK


