From: Stefan Hajnoczi
Subject: Re: [Qemu-devel] [PATCH v3] XBZRLE delta for live migration of large memory apps
Date: Tue, 2 Aug 2011 19:05:54 +0100
User-agent: Mutt/1.5.21 (2010-09-15)

On Tue, Aug 02, 2011 at 03:45:56PM +0200, Shribman, Aidan wrote:
> Subject: [PATCH v3] XBZRLE delta for live migration of large memory apps
> From: Aidan Shribman <address@hidden>
>
> By using XBZRLE (Xor Binary Zero Run-Length-Encoding) we can reduce VM
> downtime and total live-migration time for VMs running memory-write-intensive
> workloads typical of large enterprise applications such as SAP ERP systems
> and, more generally, for any application with a sparse memory update
> pattern.
>
> On the sender side XBZRLE is used as a compact delta encoding of page updates,
> retrieving the old page content from an LRU cache (default size of 64 MB). The
> receiving side uses the existing page content and XBZRLE to decode the new
> page content.
>
> The work was originally based on research results published at VEE 2011:
> "Evaluation of Delta Compression Techniques for Efficient Live Migration of
> Large Virtual Machines" by Benoit, Svard, Tordsson and Elmroth. The XBRLE
> delta encoder described there was further improved into XBZRLE.
>
> XBZRLE has a sustained bandwidth of 1.5-2.2 GB/s for typical workloads,
> making it ideal for in-line, real-time encoding such as is needed for
> live migration.
What is the CPU cost of xbzrle live migration on the source host? I'm
thinking about a graph showing CPU utilization (e.g. from mpstat(1))
that has two datasets: migration without xbzrle and migration with
xbzrle.
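For readers unfamiliar with the scheme in the patch description, here is a minimal sketch of XOR plus zero-run-length delta encoding. It assumes 4 KiB pages and uses a hypothetical record format ([zero-run length][non-zero-run length][XOR bytes], 16-bit little-endian lengths) and hypothetical function names; it is not the wire format of this patch. The sender XORs the cached old page with the new one and transmits only the non-zero bytes; the receiver XORs them back into its existing copy of the page.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096 /* assumption: 4 KiB target pages */

/* Store/load a 16-bit run length in little-endian byte order. */
static void put_u16(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t)(v & 0xff);
    p[1] = (uint8_t)(v >> 8);
}

static uint16_t get_u16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Hypothetical record layout (not the patch's wire format):
 *   [zrun: 2 bytes][nzrun: 2 bytes][nzrun bytes of old ^ curr]
 * Returns the encoded length, or -1 if it would exceed out_len. */
int xbzrle_sketch_encode(const uint8_t *old, const uint8_t *curr,
                         uint8_t *out, size_t out_len)
{
    size_t i = 0, pos = 0;

    while (i < SKETCH_PAGE_SIZE) {
        size_t zrun = 0, nzrun = 0;

        /* Unchanged bytes XOR to zero: only the run length is sent. */
        while (i + zrun < SKETCH_PAGE_SIZE && old[i + zrun] == curr[i + zrun]) {
            zrun++;
        }
        i += zrun;
        /* Changed bytes XOR to non-zero values, which are sent verbatim. */
        while (i + nzrun < SKETCH_PAGE_SIZE && old[i + nzrun] != curr[i + nzrun]) {
            nzrun++;
        }
        if (4 + nzrun > out_len - pos) {
            return -1;
        }
        put_u16(out + pos, (uint16_t)zrun);
        put_u16(out + pos + 2, (uint16_t)nzrun);
        for (size_t k = 0; k < nzrun; k++) {
            out[pos + 4 + k] = old[i + k] ^ curr[i + k];
        }
        pos += 4 + nzrun;
        i += nzrun;
    }
    return (int)pos;
}

/* Applies a delta in place to the receiver's copy of the old page,
 * using old ^ (old ^ new) == new. Returns 0 on success, -1 on bad input. */
int xbzrle_sketch_decode(uint8_t *page, const uint8_t *in, size_t in_len)
{
    size_t i = 0, pos = 0;

    while (pos < in_len) {
        if (in_len - pos < 4) {
            return -1;
        }
        size_t zrun = get_u16(in + pos);
        size_t nzrun = get_u16(in + pos + 2);
        pos += 4;
        if (zrun + nzrun > SKETCH_PAGE_SIZE - i || nzrun > in_len - pos) {
            return -1;
        }
        i += zrun; /* these bytes are already correct on the receiver */
        for (size_t k = 0; k < nzrun; k++) {
            page[i + k] ^= in[pos + k];
        }
        i += nzrun;
        pos += nzrun;
    }
    return i == SKETCH_PAGE_SIZE ? 0 : -1;
}
```

Returning -1 when the delta does not fit lets a caller, for example, fall back to sending the full page, similar in spirit to the max_compressed_len check in the patch's xbzrle_encode().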
> @@ -128,28 +288,35 @@ static int ram_save_block(QEMUFile *f)
> current_addr + TARGET_PAGE_SIZE,
> MIGRATION_DIRTY_FLAG);
>
> - p = block->host + offset;
> + if (arch_mig_state.use_xbrle) {
> + p = qemu_mallocz(TARGET_PAGE_SIZE);
qemu_malloc(); zeroing with qemu_mallocz() is unnecessary if the buffer is
fully overwritten before use.
> +static uint8_t count_hash_bits(uint64_t v)
> +{
> +    uint8_t bits = 0;
> +
> +    while (!(v & 1)) {
> +        v = v >> 1;
> +        bits++;
> +    }
> +    return bits;
> +}
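See ffs(3). ffsll() does what you need: for non-zero v, count_hash_bits(v)
equals ffsll(v) - 1. Note that the loop above never terminates when v == 0.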
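A sketch of count_hash_bits() rewritten with ffsll() along the lines suggested. It assumes a libc that declares ffsll() (glibc does, in <strings.h>, or via <string.h> with _GNU_SOURCE on older versions), and the assert() is an added assumption: ffsll(0) returns 0, so there is no meaningful trailing-zero count for v == 0.

```c
#define _GNU_SOURCE /* older glibc declares ffsll() only with this */
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <strings.h>

/* Number of trailing zero bits in v. ffsll() numbers bits from 1 and
 * returns 0 when no bit is set, so subtract one and require v != 0
 * (the original loop never terminates for v == 0). */
static uint8_t count_hash_bits(uint64_t v)
{
    assert(v != 0);
    return (uint8_t)(ffsll((long long)v) - 1);
}
```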
> +static uint8_t xor_buf[TARGET_PAGE_SIZE];
> +static uint8_t xbzrle_buf[TARGET_PAGE_SIZE * 2];
Do these need to be static globals? It should be fine to define them as
local variables inside the functions that need them; there is enough
stack space.
> +
> +int xbzrle_encode(uint8_t *xbzrle, const uint8_t *old, const uint8_t *curr,
> +                  const size_t max_compressed_len)
> +{
> +    int compressed_len;
> +
> +    xor_encode_word(xor_buf, old, curr);
> +    compressed_len = rle_encode((uint64_t *)xor_buf,
> +                                sizeof(xor_buf)/sizeof(uint64_t), xbzrle_buf,
> +                                sizeof(xbzrle_buf));
> +    if (compressed_len > max_compressed_len) {
> +        return -1;
> +    }
> +    memcpy(xbzrle, xbzrle_buf, compressed_len);
Why the intermediate xbzrle_buf buffer and the memcpy()? Encode directly into
the caller's buffer:

    return rle_encode((uint64_t *)xor_buf, sizeof(xor_buf) / sizeof(uint64_t),
                      xbzrle, max_compressed_len);
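A sketch of xbzrle_encode() with both review comments applied: the XOR buffer lives on the stack, and the run-length encoder writes straight into the caller's buffer, returning -1 once max_compressed_len would be exceeded. The helpers xor_words() and zrle_encode_words() and their record format are hypothetical stand-ins for the patch's xor_encode_word() and rle_encode(), whose bodies are not shown in this thread; the 4 KiB TARGET_PAGE_SIZE is also an assumption.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TARGET_PAGE_SIZE 4096 /* assumption: 4 KiB target pages */
#define PAGE_WORDS (TARGET_PAGE_SIZE / sizeof(uint64_t))

/* Hypothetical stand-in for xor_encode_word(): dst = old ^ curr, per word. */
static void xor_words(uint64_t *dst, const uint8_t *old, const uint8_t *curr)
{
    for (size_t i = 0; i < PAGE_WORDS; i++) {
        uint64_t a, b;

        memcpy(&a, old + i * sizeof(a), sizeof(a)); /* no alignment assumed */
        memcpy(&b, curr + i * sizeof(b), sizeof(b));
        dst[i] = a ^ b;
    }
}

/* Hypothetical stand-in for rle_encode(). Records are
 * [zero-word count][non-zero-word count] (uint16_t, host byte order)
 * followed by the non-zero words. Writes straight into out and returns
 * -1 as soon as out_len would be exceeded. */
static int zrle_encode_words(const uint64_t *in, size_t nwords,
                             uint8_t *out, size_t out_len)
{
    size_t i = 0, pos = 0;

    while (i < nwords) {
        uint16_t zrun = 0, nzrun = 0;
        size_t start, need;

        while (i < nwords && in[i] == 0) {
            zrun++;
            i++;
        }
        start = i;
        while (i < nwords && in[i] != 0) {
            nzrun++;
            i++;
        }
        need = 2 * sizeof(uint16_t) + nzrun * sizeof(uint64_t);
        if (need > out_len - pos) {
            return -1;
        }
        memcpy(out + pos, &zrun, sizeof(zrun));
        memcpy(out + pos + sizeof(zrun), &nzrun, sizeof(nzrun));
        if (nzrun) {
            memcpy(out + pos + 2 * sizeof(uint16_t), in + start,
                   nzrun * sizeof(uint64_t));
        }
        pos += need;
    }
    return (int)pos;
}

int xbzrle_encode(uint8_t *xbzrle, const uint8_t *old, const uint8_t *curr,
                  const size_t max_compressed_len)
{
    uint64_t xor_buf[PAGE_WORDS]; /* 4 KiB on the stack, no shared state */

    xor_words(xor_buf, old, curr);
    return zrle_encode_words(xor_buf, PAGE_WORDS, xbzrle, max_compressed_len);
}
```

Because the bound check lives inside the encoder, neither the 2 * TARGET_PAGE_SIZE scratch buffer nor the final memcpy() is needed, and keeping the buffer local also makes the function safe to call from more than one thread.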
Stefan