RE: The issues about architecture of the COLO checkpoint
From: Zhanghailiang
Subject: RE: The issues about architecture of the COLO checkpoint
Date: Wed, 12 Feb 2020 03:18:03 +0000
Hi,
Thank you Dave,
I'll reply here directly.
-----Original Message-----
From: Dr. David Alan Gilbert [mailto:address@hidden]
Sent: Wednesday, February 12, 2020 1:48 AM
To: Daniel Cho <address@hidden>; address@hidden; Zhanghailiang <address@hidden>
Cc: address@hidden
Subject: Re: The issues about architecture of the COLO checkpoint
cc'ing in COLO people:
* Daniel Cho (address@hidden) wrote:
> Hi everyone,
> We have some issues with setting up the COLO feature. We hope somebody
> can give us some advice.
>
> Issue 1:
> We set up the COLO feature dynamically for a PVM (2 cores, 16G memory),
> but the Primary VM pauses for a long time (depending on memory size)
> while waiting for the SVM to start. Is there any way to reduce the pause time?
>
Yes, we do have some ideas for optimizing this downtime.
The main problem with the current version is that, for each checkpoint, we
have to send all of the PVM's pages to the SVM and then copy the whole VM
state from the ram cache into the SVM. Both VMs must stay paused during
this process. As you said, the downtime depends on the memory size.

So first, we need to reduce the amount of data sent during a checkpoint.
We can migrate part of the PVM's dirty pages in the background while both
VMs are running, and load those pages temporarily into the ram cache
(backup memory) on the SVM side. At checkpoint time, we only send the
PVM's remaining dirty pages to the secondary side and then copy the ram
cache into the SVM. Furthermore, we don't have to send all of the PVM's
pages; we only need to send the pages dirtied by the PVM or the SVM
between two checkpoints. (If a page is dirtied by neither the PVM nor the
SVM, its contents stay the same in the SVM, the PVM, and the backup
memory.) This reduces the time spent sending data.
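
To make the page selection concrete, here is a minimal sketch in Python. It is
only a model, not the QEMU code: memory is a plain list of page contents, the
dirty sets are sets of page indices, and the names pages_to_send and
send_checkpoint are made up for this example.

```python
def pages_to_send(pvm_dirty, svm_dirty):
    """Pages dirtied by the PVM or the SVM since the last checkpoint."""
    return sorted(set(pvm_dirty) | set(svm_dirty))


def send_checkpoint(pvm_mem, ram_cache, pvm_dirty, svm_dirty):
    """Copy only the selected pages from PVM memory into the ram cache.

    Pages dirtied by neither side are skipped, because they are already
    identical in the PVM, the SVM and the ram cache.
    """
    sent = pages_to_send(pvm_dirty, svm_dirty)
    for page in sent:
        ram_cache[page] = pvm_mem[page]
    return sent


# Hypothetical 4-page guest: the PVM dirtied page 0, the SVM dirtied page 2.
pvm_mem = ["a1", "b0", "c0", "d0"]
ram_cache = ["a0", "b0", "c0", "d0"]  # contents as of the previous checkpoint
sent = send_checkpoint(pvm_mem, ram_cache, pvm_dirty={0}, svm_dirty={2})
print(sent, ram_cache == pvm_mem)
```

Background migration between checkpoints is not modelled here; it would only
shrink the set of pages still left to send while both VMs are paused.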

For the second problem, we can reduce the memory copy in two ways. First,
we don't have to copy every page in the ram cache; we only need to copy
the pages dirtied by the PVM or the SVM since the last checkpoint. Second,
we can use the userfault (userfaultfd) missing-page function to reduce the
time spent on the memory copy. (With the second method, in theory, we can
reduce the memory copy time to the millisecond level.)
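
The first method can be sketched the same way. Again this is a simplified
Python model under assumed names (flush_ram_cache is hypothetical, not the
QEMU function), with memory as a list of pages and dirty sets as page
indices. The userfault-based method is not modelled.

```python
def flush_ram_cache(ram_cache, svm_mem, pvm_dirty, svm_dirty):
    """Copy only the pages dirtied by either side into SVM memory.

    Returns the number of pages copied, which is what the pause time
    scales with in this model (instead of the total memory size).
    """
    to_copy = set(pvm_dirty) | set(svm_dirty)
    for page in sorted(to_copy):
        svm_mem[page] = ram_cache[page]
    return len(to_copy)


# Hypothetical state just after the checkpoint data has been received:
# the ram cache holds the PVM's state, the SVM has diverged on page 2.
ram_cache = ["a1", "b0", "c0", "d0"]
svm_mem = ["a0", "b0", "cX", "d0"]
copied = flush_ram_cache(ram_cache, svm_mem, pvm_dirty={0}, svm_dirty={2})
print(copied, svm_mem == ram_cache)
```

Only 2 of the 4 pages are copied here, yet the SVM ends up identical to the
ram cache, which is the point of the optimization.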

You can find the first optimization in the attachment. It is based on an
old QEMU version (qemu-2.6), but it should not be difficult to rebase it
onto master or your version. Please feel free to send the new version to
the community if you want ;)
>
> Issue 2:
> In
> https://github.com/qemu/qemu/blob/master/migration/colo.c#L503,
> could we move start_vm() before Line 488? Because at the first checkpoint
> the PVM waits for the SVM's reply, which causes the PVM to stop for a while.
>
No, that makes no sense, because even if the PVM runs first, it still needs
to wait for the SVM's network packets, which are compared with its own
before anything is sent to the client side.
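
A rough Python sketch of why starting the PVM earlier does not help. The
class CompareGate and its methods are invented for this example, and what
happens on a mismatch (here: request a checkpoint, then release the PVM's
packet) is an assumption of the sketch rather than something stated above.

```python
from collections import deque


class CompareGate:
    """Holds PVM output until the matching SVM packet has arrived."""

    def __init__(self):
        self.pvm_queue = deque()
        self.svm_queue = deque()
        self.released = []  # packets actually sent to the client
        self.checkpoint_requested = False

    def on_pvm_packet(self, packet):
        self.pvm_queue.append(packet)
        self._compare()

    def on_svm_packet(self, packet):
        self.svm_queue.append(packet)
        self._compare()

    def _compare(self):
        # Nothing leaves the gate until both sides have produced a packet.
        while self.pvm_queue and self.svm_queue:
            pvm_packet = self.pvm_queue.popleft()
            svm_packet = self.svm_queue.popleft()
            if pvm_packet != svm_packet:
                # Assumption: a mismatch asks for a new checkpoint.
                self.checkpoint_requested = True
            self.released.append(pvm_packet)


gate = CompareGate()
gate.on_pvm_packet(b"reply-1")   # PVM is already running and answers first
held = list(gate.released)       # still empty: the SVM has not answered yet
gate.on_svm_packet(b"reply-1")   # SVM catches up, packet is released
print(held, gate.released)
```

In this model the client sees nothing until the SVM is running too, so an
earlier start_vm() on the primary side does not shorten the visible stall.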
Thanks,
Hailiang
> We enable the COLO feature on a running VM, so we hope the running VM
> can continue to serve users.
> Do you have any suggestions for those issues?
>
> Best regards,
> Daniel Cho
--
Dr. David Alan Gilbert / address@hidden / Manchester, UK
Attachment: 0001-COLO-Migrate-dirty-pages-during-the-gap-of-checkpoin.patch
Attachment: 0001-COLO-Optimize-memory-back-up-process.patch