From: Li Zhijian
Subject: Re: [Qemu-devel] [PATCH COLO-Frame (Base) v21 00/17] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service (FT)
Date: Wed, 26 Oct 2016 18:17:21 +0800
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.3.0



On 10/26/2016 05:53 PM, Li Zhijian wrote:


On 10/26/2016 04:26 PM, Amit Shah wrote:
On (Wed) 26 Oct 2016 [14:43:30], Hailiang Zhang wrote:
Hi Amit,

On 2016/10/26 14:09, Amit Shah wrote:
Hello,

On (Tue) 18 Oct 2016 [20:09:56], zhanghailiang wrote:
This is the 21st version of the COLO frame series.

Rebase to the latest master.

I've reviewed the patchset, have some minor comments, but overall it
looks good.  The changes are contained, and common code / existing
code paths are not affected much.  We can still target to merge this
for 2.8.


I really appreciate your help ;) I will fix all the issues later
and send v22. I hope we can still make the 2.8 deadline.

Do you have any test results on how much the VM slows down / how much
downtime is incurred during checkpoints?


Yes, we tested that a long time ago; it all depends.
The downtime is determined by the time spent transferring the dirty pages
and the time spent flushing RAM from the RAM buffer.
But we do have methods to reduce the downtime.

One method is to reduce the amount of data (mainly dirty pages) to be sent
at checkpoint time, by transferring dirty pages asynchronously while the PVM
and SVM are running (i.e. not during the checkpoint itself). Besides, we can
re-use the existing migration capabilities, such as compression, etc.
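For example, the multi-thread compression support that normal live migration
already has could simply be switched on from the monitor before starting COLO
(just an illustrative sketch; the thread count is arbitrary):

    (qemu) migrate_set_capability compress on
    (qemu) migrate_set_parameter compress-threads 4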
Another method is to reduce the time of flushing RAM by using the userfaultfd
API to turn the RAM copy into bitmap marking. We can also flush the RAM buffer
with multiple threads, as Dave advised ...
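To make the userfaultfd idea concrete, here is a rough standalone sketch (not
the actual COLO code; guest_ram, ram_buffer and the sizes are made up, and
error handling is omitted): the SVM RAM range is registered as missing, and
each page is copied out of the checkpoint buffer only when it is first
touched, instead of being copied wholesale at flush time.

/*
 * Minimal sketch, assuming a checkpoint buffer already received from the
 * PVM.  Build with: gcc -pthread demo.c
 */
#include <fcntl.h>
#include <pthread.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <linux/userfaultfd.h>

static long page_size;
static char *guest_ram;   /* SVM RAM that would normally be flushed        */
static char *ram_buffer;  /* checkpoint data already received from the PVM */

static void *fault_handler(void *arg)
{
    int uffd = (int)(intptr_t)arg;
    struct uffd_msg msg;

    while (read(uffd, &msg, sizeof(msg)) == sizeof(msg)) {
        if (msg.event != UFFD_EVENT_PAGEFAULT) {
            continue;
        }
        /* Copy only the faulting page out of the checkpoint buffer. */
        uint64_t addr = msg.arg.pagefault.address & ~((uint64_t)page_size - 1);
        struct uffdio_copy copy = {
            .dst = addr,
            .src = (uintptr_t)ram_buffer + (addr - (uintptr_t)guest_ram),
            .len = page_size,
        };
        ioctl(uffd, UFFDIO_COPY, &copy);
    }
    return NULL;
}

int main(void)
{
    size_t ram_size = 64 * 1024 * 1024;
    page_size = sysconf(_SC_PAGESIZE);

    guest_ram = mmap(NULL, ram_size, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    ram_buffer = mmap(NULL, ram_size, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    memset(ram_buffer, 0x5a, ram_size);   /* pretend this came from the PVM */

    int uffd = syscall(__NR_userfaultfd, O_CLOEXEC);
    struct uffdio_api api = { .api = UFFD_API };
    ioctl(uffd, UFFDIO_API, &api);

    /* The "flush" is now just registering the range as missing; the actual
     * copy happens page by page, on first access, in fault_handler().      */
    struct uffdio_register reg = {
        .range = { .start = (uintptr_t)guest_ram, .len = ram_size },
        .mode  = UFFDIO_REGISTER_MODE_MISSING,
    };
    ioctl(uffd, UFFDIO_REGISTER, &reg);

    pthread_t tid;
    pthread_create(&tid, NULL, fault_handler, (void *)(intptr_t)uffd);

    /* First touch of any page triggers the on-demand copy above. */
    printf("byte 0 = 0x%x\n", (unsigned char)guest_ram[0]);
    return 0;
}

The point is that the O(RAM size) copy during the checkpoint pause becomes
per-page work spread over normal execution.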

Yes, I understand that as with any migration numbers, this too depends
on what the guest is doing.  However, can you just pick some standard
workload - kernel compile or something like that - and post a few
observations?

Sure, we collected some performance data with a previous COLO version a few months ago.

-------------------------+----------+-----------+--------------+-------------------+-------------+
benchmark                | guest    | case      | native       | COLO              | performance |
-------------------------+----------+-----------+--------------+-------------------+-------------+
webbench (bytes/sec)     | 2vCPU 2G | 50 client | 105358952    | 99396093.3333333  | 94.34%      |
-------------------------+----------+-----------+--------------+-------------------+-------------+

Sorry for the noise.
Please refer to another mail.

Thanks





Also, can you tell how you arrived at the default checkpoint
interval?


Er, for this value, we referred to Remus on the Xen platform. ;)
But after we implement COLO with the COLO proxy, this interval will be changed
to a bigger one (10s), and we will make it configurable too. Besides, we will
add another configurable value to control the minimum interval between checkpoints.
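For reference, once that lands the period should be tunable at run time from
the monitor, something like the following (assuming the x-checkpoint-delay
migration parameter added by this series; the value is in milliseconds):

    (qemu) migrate_set_parameter x-checkpoint-delay 10000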

OK - is there a typical value that strikes a good balance between COLO keeping
the network busy / the guest paused, and the guest making progress?  Again,
this is workload-dependent, but I guess you have typical numbers from a
network-bound workload?

Thanks,

        Amit





--
Best regards.
Li Zhijian (8555)




