
[Qemu-devel] Re: [RFC] KVM Fault Tolerance: Kemari for KVM


From: Yoshiaki Tamura
Subject: [Qemu-devel] Re: [RFC] KVM Fault Tolerance: Kemari for KVM
Date: Tue, 17 Nov 2009 23:06:01 +0900

2009/11/17 Avi Kivity <address@hidden>:
> On 11/17/2009 01:04 PM, Yoshiaki Tamura wrote:
>>>
>>> What I mean is:
>>>
>>> - choose synchronization point A
>>> - start copying memory for synchronization point A
>>>  - output is delayed
>>> - choose synchronization point B
>>> - copy memory for A and B
>>>   if guest touches memory not yet copied for A, COW it
>>> - once A copying is complete, release A output
>>> - continue copying memory for B
>>> - choose synchronization point C
>>>
>>> by keeping two synchronization points active, you don't have any pauses.
>>>  The cost is maintaining copy-on-write so we can copy dirty pages for A
>>> while keeping execution.
>>
>>
>> The overall idea seems good, but if I'm understanding correctly, we need a
>> buffer for copying memory locally, and when it gets full, or when we COW the
>> memory for B, we still have to pause the guest to prevent it from overwriting.
>> Correct?
>
> Yes.  During COW the guest would not be able to access the page, but if
> other vcpus access other pages, they can still continue.  So generally
> synchronization would be pauseless.

Understood.
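
To check that we mean the same thing, here is a minimal sketch of the copy-on-write idea for a single synchronization point. It is a toy model only, not Kemari or KVM code: guest memory is a plain Python list, and the names `CowSnapshot`, `guest_write` and `copy_one` are made up for illustration.

```python
class CowSnapshot:
    """Toy model of one synchronization point (hypothetical names throughout).

    Dirty pages are copied out in the background while the guest keeps
    running; a page is preserved (COW) only if the guest writes to it
    before the copier has reached it.
    """

    def __init__(self, memory, dirty_pages):
        self.memory = memory             # live guest memory, one entry per page
        self.pending = set(dirty_pages)  # pages not yet copied for this sync point
        self.saved = {}                  # COW copies: page -> contents at the sync point
        self.sent = {}                   # what the secondary receives

    def guest_write(self, page, value):
        # Preserve the sync-point contents before the first overwrite
        # of a page that has not been copied yet.
        if page in self.pending and page not in self.saved:
            self.saved[page] = self.memory[page]
        self.memory[page] = value

    def copy_one(self):
        # Background copier: send one pending page, preferring the COW copy.
        page = min(self.pending)
        self.pending.remove(page)
        self.sent[page] = self.saved.pop(page, self.memory[page])

    def complete(self):
        # Once nothing is pending, the output held for this sync point can go out.
        return not self.pending


memory = ["a0", "b0", "c0", "d0"]
snap = CowSnapshot(memory, dirty_pages=[1, 2])
snap.copy_one()             # page 1 is copied before the guest touches it
snap.guest_write(2, "c1")   # page 2 still pending: its old contents are saved
snap.guest_write(1, "b1")   # page 1 already copied: plain write, no COW
snap.copy_one()             # page 2 is sent from the saved copy
```

In this model the guest is never stopped; a real implementation would still stall the faulting vcpu while the page is being saved, and would have to pause the guest if the buffer holding the saved copies fills up.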

>> To make things simple, we would like to start with the synchronous
>> transmission first, and tackle asynchronous transmission later.
>
> Of course.  I'm just worried that realistic workloads will drive the latency
> beyond acceptable limits.

We're paying attention to this issue too, and would like to do more advanced
stuff once there is a toy that runs on KVM.

>>>>> How many pages do you copy per synchronization point for reasonably
>>>>> difficult workloads?
>>>>
>>>> That is very workload-dependent, but if you take a look at the examples
>>>> below you can get a feeling of how Kemari behaves.
>>>>
>>>> IOzone            Kemari sync interval[ms]  dirtied pages
>>>> ---------------------------------------------------------
>>>> buffered + fsync                       400           3000
>>>> O_SYNC                                  10             80
>>>>
>>>> In summary, if the guest executes few I/O operations, the interval
>>>> between Kemari synchronization points will increase and the number of
>>>> dirtied pages will grow accordingly.
>>>
>>> In the example above, the externally observed latency grows to 400 ms,
>>> yes?
>>
>> Not exactly.  The sync interval refers to the interval of synchronization
>> points captured when the workload is running.  In the example above, when
>> the observed sync interval is 400ms, it takes about 150ms to sync VMs with
>> 3000 dirtied pages.  Kemari resumes I/O operations immediately once the
>> synchronization is finished, and thus, the externally observed latency is
>> 150ms in this case.
>
> Not sure I understand.
>
> If a packet is output from a guest immediately after a synchronization
> point, doesn't it need to be delayed until the next synchronization point?

Kemari triggers synchronization in an event-driven manner.
The packet output itself is captured as a synchronization point,
so synchronization starts immediately.

>  So it's not just the guest pause time that matters, but also the interval
> between sync points?

It does matter, and in the case of Kemari, the interval between sync points
varies depending on what kind of workload is running.

In the IOzone example above, two types of workloads are demonstrated.
Buffered writes w/ fsync create fewer sync points, which leads to a longer sync
interval and more dirtied pages.  On the other hand, O_SYNC writes create
more sync points, which leads to a shorter sync interval and fewer dirtied pages.
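
The figures quoted in this thread can be put side by side with a little arithmetic. The intervals, page counts and the 150ms sync time come from the numbers above; the 4 KiB page size and the assumption that sync time scales linearly with the number of dirtied pages are mine, so the last two results are rough estimates rather than measurements.

```python
# IOzone figures quoted in this thread.
workloads = {
    "buffered + fsync": {"interval_ms": 400, "dirtied_pages": 3000},
    "O_SYNC": {"interval_ms": 10, "dirtied_pages": 80},
}

# Pages dirtied per millisecond of sync interval, for each workload.
dirty_rate = {
    name: w["dirtied_pages"] / w["interval_ms"] for name, w in workloads.items()
}

# Observed: syncing 3000 dirtied pages takes about 150 ms.
sync_ms = 150
per_page_us = sync_ms * 1000 / 3000

# ASSUMPTION: 4 KiB pages (not stated in the thread).
PAGE_SIZE = 4096
throughput_mb_s = 3000 * PAGE_SIZE / (sync_ms / 1000) / 1e6

# ASSUMPTION: sync time scales linearly with the number of dirtied pages.
o_sync_estimate_ms = workloads["O_SYNC"]["dirtied_pages"] * per_page_us / 1000

for name, rate in dirty_rate.items():
    print(f"{name}: {rate:.1f} pages dirtied per ms of interval")
print(f"sync cost: {per_page_us:.0f} us/page, about {throughput_mb_s:.0f} MB/s")
print(f"estimated O_SYNC sync time: {o_sync_estimate_ms:.0f} ms")
```

Both workloads dirty memory at a similar rate (7.5 and 8 pages per ms), so the difference in dirtied pages comes almost entirely from how often a sync point is captured.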

The benefit of the event-driven approach is that you don't have to start
synchronization until there is a specific event to be captured, no matter how
many pages the guest may have dirtied.
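
A rough sketch of that behaviour, as a toy model rather than the actual Kemari implementation (the class and method names are hypothetical, and transferring pages to the secondary is reduced to clearing a set):

```python
class EventDrivenSync:
    """Toy model: only an output event starts a synchronization."""

    def __init__(self):
        self.dirty = set()    # pages dirtied since the last sync point
        self.sync_log = []    # pages transferred at each sync point
        self.released = []    # outputs released to the outside world

    def guest_write(self, page):
        # Dirtying memory never starts a synchronization by itself.
        self.dirty.add(page)

    def guest_output(self, packet):
        # The output event is the sync point: sync first, then release.
        self.sync_log.append(len(self.dirty))
        self.dirty.clear()    # stand-in for sending the pages to the secondary
        self.released.append(packet)


vm = EventDrivenSync()
for page in range(100):
    vm.guest_write(page)      # 100 dirtied pages, still no synchronization
syncs_before_output = len(vm.sync_log)

vm.guest_output("pkt1")       # first sync point transfers all 100 pages
for page in range(3):
    vm.guest_write(page)
vm.guest_output("pkt2")       # second sync point transfers only 3 pages
```

The cost is that a workload with few output events accumulates more dirtied pages per sync point, as in the buffered + fsync case.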

Thanks,

Yoshi



