From: Paolo Bonzini
Subject: Re: [Qemu-devel] [RFC PATCH RDMA support v1: 10/13] introduce new command migrate_check_for_zero
Date: Thu, 11 Apr 2013 17:08:10 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/20130311 Thunderbird/17.0.4

On 11/04/2013 16:57, Michael R. Hines wrote:
> We have hardware already with front side bus speeds of 13 GB/s.
> 
> We also already have 5 GB/s RDMA hardware, and we will likely
> have even faster RDMA hardware in the future.
> 
> This analysis is not factoring into account the cycles it takes to
> map the pages before they are checked for duplicate bytes,

Do you mean the TLB misses?

> regardless whether or not very little of the page is actually
> cached on the processor.
> 
> This analysis is also not taking into account the possibility that the
> VM may be CPU-bound at the same time that QEMU is competing
> to execute is_dup_page().

is_dup_page() is memory-bound, not CPU-bound.  Note that for a typical
page that is not a duplicate, is_dup_page only needs about 1% of the
bandwidth it scans: it can stop after the first cache line (32 bytes
out of 4096 bytes/page).  Scanning 30 GB/s then only requires reading
about 250 MB/s from memory over the FSB.
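
To make the arithmetic concrete, here is a rough sketch.  The helper
name and the decimal GB-to-MB conversion are assumptions made for
illustration; the 32-byte and 4096-byte figures are the ones quoted
above, and the model assumes every page is rejected after its first
32 bytes.

```c
#include <assert.h>

/* Figures from the message: 32 bytes touched per 4096-byte page. */
#define SKETCH_PAGE_BYTES     4096.0
#define SKETCH_BYTES_TOUCHED    32.0

/*
 * Hypothetical helper, not a QEMU function: memory traffic in MB/s
 * needed to "scan" guest RAM at scan_gb_per_s, assuming each page is
 * rejected after its first SKETCH_BYTES_TOUCHED bytes.
 * Uses decimal units (1 GB = 1000 MB).
 */
static double required_read_mb_per_s(double scan_gb_per_s)
{
    assert(scan_gb_per_s >= 0.0);

    /* Fraction of each page actually read: 32 / 4096 = 0.0078125 */
    double fraction = SKETCH_BYTES_TOUCHED / SKETCH_PAGE_BYTES;

    return scan_gb_per_s * 1000.0 * fraction;
}
```

Under these assumptions, required_read_mb_per_s(30.0) comes out a
little under 235 MB/s, the same order as the 250 MB/s figure above.
Pages that really are all zero (or all one byte) have to be read in
full, so this is the best case rather than a bound.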

> Thus, as you mentioned, a worst-case 5 GB/s memory bandwidth
> for is_dup_page() could be very easily reached given the right
> conditions - and we do have many workloads both HPC and Multi-tier
> which can easily cause QEMU's zero scanning performance to suffer.

These are the real world scenarios that I was talking about.  Do you
have profiles of these, with the latest QEMU code, that show
is_dup_page() to be expensive?

We could try prefetching the first cache line *of the next page* before
running is_dup_page.  There are a lot of things to test before giving up
and inventing a new API.
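
Something along these lines is what I have in mind for the prefetch;
it is an untested sketch, not a patch.  page_is_dup() here is a
simplified byte-wise stand-in for the real is_dup_page(), the names
and the 4096-byte page size are assumptions, and __builtin_prefetch
is the GCC/Clang builtin.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096   /* assumed page size */

/* Simplified stand-in for is_dup_page(): 1 if all bytes are equal. */
static int page_is_dup(const uint8_t *page)
{
    uint8_t first = page[0];

    for (size_t i = 1; i < SKETCH_PAGE_SIZE; i++) {
        if (page[i] != first) {
            return 0;           /* early exit on first mismatch */
        }
    }
    return 1;
}

/*
 * Hypothetical scan loop: before checking page n, hint the CPU to
 * fetch the first cache line of page n + 1, so that the early-exit
 * comparison on the next iteration is less likely to stall on memory.
 */
static size_t count_dup_pages(const uint8_t *buf, size_t npages)
{
    size_t dup = 0;

    assert(buf != NULL || npages == 0);

    for (size_t n = 0; n < npages; n++) {
        const uint8_t *page = buf + n * SKETCH_PAGE_SIZE;

        if (n + 1 < npages) {
            /* read access (0), keep in all cache levels (3) */
            __builtin_prefetch(page + SKETCH_PAGE_SIZE, 0, 3);
        }
        dup += page_is_dup(page);
    }
    return dup;
}
```

The prefetch is only a hint and never changes the result, so whether
it helps at all is exactly the kind of thing that needs profiling.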

Paolo