Re: [Qemu-devel] Migration dirty bitmap: should only mark pages as dirty


From: Dr. David Alan Gilbert
Subject: Re: [Qemu-devel] Migration dirty bitmap: should only mark pages as dirty after they have been sent
Date: Fri, 14 Oct 2016 12:15:48 +0100
User-agent: Mutt/1.7.1 (2016-10-04)

* Chunguang Li (address@hidden) wrote:
> 
> 
> 
> > -----Original Message-----
> > From: "Amit Shah" <address@hidden>
> > Sent: Friday, 30 September 2016
> > To: "Chunguang Li" <address@hidden>
> > Cc: "Dr. David Alan Gilbert" <address@hidden>, address@hidden, 
> > address@hidden, address@hidden, address@hidden
> > Subject: Re: Re: [Qemu-devel] Migration dirty bitmap: should only mark pages as 
> > dirty after they have been sent
> > 
> > On (Mon) 26 Sep 2016 [22:55:01], Chunguang Li wrote:
> > > 
> > > 
> > > 
> > > > -----Original Message-----
> > > > From: "Dr. David Alan Gilbert" <address@hidden>
> > > > Sent: Monday, 26 September 2016
> > > > To: "Chunguang Li" <address@hidden>
> > > > Cc: address@hidden, address@hidden, address@hidden, address@hidden, 
> > > > address@hidden
> > > > Subject: Re: [Qemu-devel] Migration dirty bitmap: should only mark pages as 
> > > > dirty after they have been sent
> > > > 
> > > > * Chunguang Li (address@hidden) wrote:
> > > > > Hi all!
> > > > > I have some confusion about the dirty bitmap during migration. I have 
> > > > > dug into the code and found that every now and then during 
> > > > > migration, the dirty bitmap is fetched from kernel space 
> > > > > through ioctl(KVM_GET_DIRTY_LOG) and then used to update qemu's 
> > > > > dirty bitmap. However, I think this mechanism leads to resending 
> > > > > some NON-dirty pages.
> > > > > 
> > > > > Take the first iteration of precopy for instance, during which all 
> > > > > the pages will be sent. Before that during the migration setup, the 
> > > > > ioctl(KVM_GET_DIRTY_LOG) is called once, so the kernel begins to 
> > > > > produce the dirty bitmap from this moment. When the pages "that 
> > > > > haven't been sent" are written, the kernel space marks them as dirty. 
> > > > > However I don't think this is correct, because these pages will be 
> > > > > sent during this and the next iterations with the same content (if 
> > > > > they are not written again after they are sent). It only makes sense 
> > > > > to mark the pages which have already been sent during one iteration 
> > > > > as dirty when they are written.
> > > > > 
> > > > > 
> > > > > Am I right about this consideration? If I am right, is there some 
> > > > > advice to improve this?
> > > > 
> > > > I think you're right that this can happen; to clarify I think the
> > > > case you're talking about is:
> > > > 
> > > >   Iteration 1
> > > >     sync bitmap
> > > >     start sending pages
> > > >     page 'n' is modified - but hasn't been sent yet
> > > >     page 'n' gets sent
> > > >   Iteration 2
> > > >     sync bitmap
> > > >        'page n is shown as modified'
> > > >     send page 'n' again
> > > >
> > > 
> > > Yes, this is exactly the case I am talking about.
> > >  
> > > > So you're right that is wasteful; I guess it's more wasteful
> > > > on big VMs with slow networks where the length of each iteration
> > > > is large.
> > > 
> > > I think this is "very" wasteful. Assume the workload dirties pages 
> > > randomly within the guest address space, and the transfer speed is 
> > > constant. Intuitively, I think nearly half of the dirty pages produced in 
> > > Iteration 1 are not really dirty. This means Iteration 2 takes twice as 
> > > long as it would if only the really dirty pages were sent.
> > 
> > It makes sense, can you get some perf numbers to show what kinds of
> > workloads get impacted the most?  That would also help us to figure
> > out what kinds of speed improvements we can expect.
> > 
> > 
> >             Amit
> 
> I picked 6 workloads and collected the following statistics for 
> every iteration (except the last stop-copy one) during precopy.
> These numbers were obtained with basic precopy migration, without 
> capabilities such as xbzrle or compression. The migration uses a 
> dedicated network, with a separate network for the workloads. 
> Both are gigabit ethernet. I use qemu-2.5.1.
> 
> Three (booting, idle, web server) of them converged to the stop-copy phase, 
> with the given bandwidth and default downtime (300ms), while the other
> three (kernel compilation, zeusmp, memcached) did not.
> 
> A page is "not-really-dirty" if it is written first and sent later
> (and not written again after that) during one iteration. I guess this 
> happens less often in the later iterations than in the 1st 
> iteration, because all the pages of the VM are sent to the dest node during 
> the 1st iteration, while during the others only part of the pages are sent. 
> So I think the "not-really-dirty" pages should be produced mainly during 
> the 1st iteration, and maybe very few during the other iterations.
> 
> If we could avoid resending the "not-really-dirty" pages, intuitively, I
> think the time spent on Iteration 2 would be halved. This is a chain reaction,
> because the number of dirty pages produced during Iteration 2 is then halved,
> which in turn halves the time spent on Iteration 3, then Iteration 4, 5...

Yes; these numbers don't show how many of them are false dirty though.
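
For a feel of where the "nearly half" intuition comes from, here is a toy
simulation. It is only a sketch under the assumptions stated in the quoted
mail (writes spread uniformly over the guest address space, constant transfer
speed), plus two more of my own: pages are sent in address order, and every
write is counted independently, so pages written more than once are not
handled specially. None of this is measured QEMU behaviour, and the function
name is made up for illustration.

```python
import random

def false_dirty_fraction(num_pages=100_000, num_writes=50_000, seed=0):
    """Share of first-iteration writes that land before their page is sent.

    Toy model: iteration 1 sends pages in address order at a constant rate
    over the time interval [0, 1); each write hits a uniformly random page
    at a uniformly random time.
    """
    rng = random.Random(seed)
    false_dirty = 0
    for _ in range(num_writes):
        page = rng.randrange(num_pages)
        write_time = rng.random()
        send_time = page / num_pages  # moment iteration 1 reaches this page
        if write_time < send_time:
            # Written before being sent: the copy on the wire already holds
            # the write, yet the next bitmap sync still reports the page.
            false_dirty += 1
    return false_dirty / num_writes

fraction = false_dirty_fraction()
print(f"false-dirty share of writes: {fraction:.3f}")
```

Under this model the share comes out close to one half, which is the same
estimate as in the quoted mail; a real workload with hot pages would differ.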

One problem is thinking about pages that have been redirtied, if the page is
dirtied after the sync but before the network write then it's the false-dirty
that you're describing.

However, if the page is being written a few times, and so it would have been
written after the network write then it isn't a false-dirty.
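
The two cases can be written down as a small classifier. This is a sketch
only: the timestamps, the function name and the three labels are hypothetical,
and it assumes you already have per-page write times for one iteration (which
is exactly the information the numbers above don't contain).

```python
def classify_page(sync_time, send_time, write_times):
    """Classify one page for the iteration that starts at sync_time.

    sync_time   -- when the dirty bitmap was synced for this iteration
    send_time   -- when the page was written to the network
    write_times -- guest write times for this page during the iteration
    """
    writes = [t for t in write_times if t >= sync_time]
    if not writes:
        return "clean"
    if max(writes) < send_time:
        # Every write happened before the network write, so the sent copy
        # is current even though the next sync will show the page as dirty.
        return "false-dirty"
    # At least one write came at or after the network write.
    return "really-dirty"

# Dirtied once after the sync, then sent.
print(classify_page(sync_time=0, send_time=5, write_times=[2]))
# Written a few times, the last one after the network write.
print(classify_page(sync_time=0, send_time=5, write_times=[2, 7]))
```

A write at exactly the send time is treated as really dirty here, which is the
conservative choice.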

You might be able to figure that out with some kernel tracing of when the
dirtying happens, but it might be easier to write the fix!

Dave

> So I think "booting" and "kernel compilation" should benefit a lot from this
> improvement. The reason "kernel compilation" would benefit is that some
> iterations take around 600ms, and if they are halved to 300ms, then the
> precopy may have the chance to step into the stop-and-copy phase.
> 
> On the other hand, "idle" and "web server" would not benefit a lot, because
> most of the time is spent on the 1st iteration and little on the others.
> 
> As for "zeusmp" and "memcached", although the time spent on every iteration
> except the 1st may be halved, they still could not converge to stop and copy
> with the 300ms downtime.
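
As a rough cross-check of the 300ms reasoning, the sketch below works out how
many pages fit into the downtime window and compares that with the remaining
dirty pages reported in this mail for the 1 GB runs. It assumes 4 KiB pages,
reads "32MB/s" as 32 MiB/s, and treats the stop-copy condition as simply
"remaining pages fit into bandwidth x downtime"; all three are assumptions of
the sketch, and zero pages (which are much cheaper to send) are ignored.

```python
PAGE_SIZE = 4096                   # assumed guest page size in bytes
BANDWIDTH = 32 * 1024 * 1024       # "32MB/s" read as 32 MiB/s (assumption)
MAX_DOWNTIME_S = 0.3               # default downtime of 300 ms

def stop_copy_page_budget(bandwidth=BANDWIDTH, downtime_s=MAX_DOWNTIME_S,
                          page_size=PAGE_SIZE):
    """Number of whole pages that fit into the downtime window."""
    return int(bandwidth * downtime_s) // page_size

# Remaining dirty pages after the last real iteration of each 1 GB run,
# copied from the statistics in this mail.
remaining = {
    "booting": 2430,
    "idle": 1849,
    "web server": 2167,
    "kernel compilation": 9382,
    "cpu2006.zeusmp": 124338,
}

budget = stop_copy_page_budget()
fits = {name: pages <= budget for name, pages in remaining.items()}
for name, ok in fits.items():
    print(f"{name:20s} remaining={remaining[name]:7d} budget={budget} fits={ok}")
```

With these assumptions the budget is 2457 pages, which is consistent with the
three workloads that converged and with the two that did not.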
> 
> --------------------1 vcpu, 1 GB ram, default bandwidth (32MB/s):------------------
> 
> 1. booting : begin to migrate when the VM is booting
> 
> Iteration   1, duration:   6997 ms , transferred pages:   266450 (n:    57269, d:   209181 ) , new dirty pages:    56414 , remaining dirty pages:    56414
> Iteration   2, duration:   6497 ms , transferred pages:    54008 (n:    52701, d:     1307 ) , new dirty pages:    48053 , remaining dirty pages:    50459
> Iteration   3, duration:   5800 ms , transferred pages:    48232 (n:    47444, d:      788 ) , new dirty pages:     9129 , remaining dirty pages:    11356
> Iteration   4, duration:   1100 ms , transferred pages:     9091 (n:     8998, d:       93 ) , new dirty pages:      165 , remaining dirty pages:     2430
> Iteration   5, duration:      1 ms , transferred pages:        0 (n:        0, d:        0 ) , new dirty pages:        0 , remaining dirty pages:     2430
> (note: When the workload does converge, the output of the last iteration is 
> "fake". It just indicates that the precopy steps into stop-copy phase now.
>        "n" means "normal pages" and "d" means "duplicate (zero) pages".)
> 
> 2. idle
> 
> Iteration   1, duration:  14496 ms , transferred pages:   266450 (n:   118980, d:   147470 ) , new dirty pages:    17398 , remaining dirty pages:    17398
> Iteration   2, duration:   1896 ms , transferred pages:    14953 (n:    14854, d:       99 ) , new dirty pages:     1849 , remaining dirty pages:     4294
> Iteration   3, duration:    300 ms , transferred pages:     2454 (n:     2454, d:        0 ) , new dirty pages:        9 , remaining dirty pages:     1849
> Iteration   4, duration:      1 ms , transferred pages:        0 (n:        0, d:        0 ) , new dirty pages:        0 , remaining dirty pages:     1849
> 
> 3. kernel compilation (can not converge)
> 
> Iteration   1, duration:  20700 ms , transferred pages:   266450 (n:   169778, d:    96672 ) , new dirty pages:    40067 , remaining dirty pages:    40067
> Iteration   2, duration:   4696 ms , transferred pages:    38401 (n:    37787, d:      614 ) , new dirty pages:     8852 , remaining dirty pages:    10518
> Iteration   3, duration:   1000 ms , transferred pages:     8642 (n:     8180, d:      462 ) , new dirty pages:     6331 , remaining dirty pages:     8207
> Iteration   4, duration:    700 ms , transferred pages:     6110 (n:     5726, d:      384 ) , new dirty pages:     5242 , remaining dirty pages:     7339
> Iteration   5, duration:    600 ms , transferred pages:     5007 (n:     4908, d:       99 ) , new dirty pages:     4868 , remaining dirty pages:     7200
> Iteration   6, duration:    600 ms , transferred pages:     5226 (n:     4908, d:      318 ) , new dirty pages:     6142 , remaining dirty pages:     8116
> Iteration   7, duration:    700 ms , transferred pages:     5985 (n:     5726, d:      259 ) , new dirty pages:     5902 , remaining dirty pages:     8033
> Iteration   8, duration:    701 ms , transferred pages:     5893 (n:     5726, d:      167 ) , new dirty pages:     7502 , remaining dirty pages:     9642
> Iteration   9, duration:    900 ms , transferred pages:     7623 (n:     7362, d:      261 ) , new dirty pages:     6408 , remaining dirty pages:     8427
> Iteration  10, duration:    700 ms , transferred pages:     6008 (n:     5726, d:      282 ) , new dirty pages:     8312 , remaining dirty pages:    10731
> Iteration  11, duration:   1000 ms , transferred pages:     8353 (n:     8180, d:      173 ) , new dirty pages:     6874 , remaining dirty pages:     9252
> Iteration  12, duration:    899 ms , transferred pages:     7477 (n:     7362, d:      115 ) , new dirty pages:     5573 , remaining dirty pages:     7348
> Iteration  13, duration:    601 ms , transferred pages:     5099 (n:     4908, d:      191 ) , new dirty pages:     7671 , remaining dirty pages:     9920
> Iteration  14, duration:    900 ms , transferred pages:     7586 (n:     7362, d:      224 ) , new dirty pages:     7359 , remaining dirty pages:     9693
> Iteration  15, duration:    900 ms , transferred pages:     7682 (n:     7362, d:      320 ) , new dirty pages:     7371 , remaining dirty pages:     9382
> 
> 4. cpu2006.zeusmp (can not converge)
> 
> Iteration   1, duration:  21603 ms , transferred pages:   266450 (n:   176660, d:    89790 ) , new dirty pages:   145625 , remaining dirty pages:   145625
> Iteration   2, duration:   8696 ms , transferred pages:   144389 (n:    70862, d:    73527 ) , new dirty pages:   125124 , remaining dirty pages:   126360
> Iteration   3, duration:   6301 ms , transferred pages:   124057 (n:    51379, d:    72678 ) , new dirty pages:   122528 , remaining dirty pages:   124831
> Iteration   4, duration:   6400 ms , transferred pages:   124330 (n:    52196, d:    72134 ) , new dirty pages:   124267 , remaining dirty pages:   124768
> Iteration   5, duration:   6703 ms , transferred pages:   124034 (n:    54656, d:    69378 ) , new dirty pages:   124151 , remaining dirty pages:   124885
> Iteration   6, duration:   6703 ms , transferred pages:   124357 (n:    54658, d:    69699 ) , new dirty pages:   124106 , remaining dirty pages:   124634
> Iteration   7, duration:   6602 ms , transferred pages:   124568 (n:    53838, d:    70730 ) , new dirty pages:   133828 , remaining dirty pages:   133894
> Iteration   8, duration:   7600 ms , transferred pages:   133030 (n:    62021, d:    71009 ) , new dirty pages:   126612 , remaining dirty pages:   127476
> Iteration   9, duration:   7299 ms , transferred pages:   126511 (n:    59569, d:    66942 ) , new dirty pages:   122727 , remaining dirty pages:   123692
> Iteration  10, duration:   6609 ms , transferred pages:   123692 (n:    54539, d:    69153 ) , new dirty pages:   122727 , remaining dirty pages:   122727
> Iteration  11, duration:   6995 ms , transferred pages:   120347 (n:    56423, d:    63924 ) , new dirty pages:   121430 , remaining dirty pages:   123810
> Iteration  12, duration:   6703 ms , transferred pages:   123040 (n:    54657, d:    68383 ) , new dirty pages:   122043 , remaining dirty pages:   122813
> Iteration  13, duration:   7006 ms , transferred pages:   122353 (n:    57121, d:    65232 ) , new dirty pages:   133869 , remaining dirty pages:   134329
> Iteration  14, duration:   8209 ms , transferred pages:   132325 (n:    66932, d:    65393 ) , new dirty pages:   126914 , remaining dirty pages:   128918
> Iteration  15, duration:   7802 ms , transferred pages:   126931 (n:    63671, d:    63260 ) , new dirty pages:   122351 , remaining dirty pages:   124338
> 
> 5. web server : An apache web server. The client is configured with 50 concurrent connections.
> 
> Iteration   1, duration:  30697 ms , transferred pages:   266450 (n:   251215, d:    15235 ) , new dirty pages:    30628 , remaining dirty pages:    30628
> Iteration   2, duration:   3496 ms , transferred pages:    28859 (n:    28513, d:      346 ) , new dirty pages:     5805 , remaining dirty pages:     7574
> Iteration   3, duration:    701 ms , transferred pages:     5746 (n:     5726, d:       20 ) , new dirty pages:     3433 , remaining dirty pages:     5261
> Iteration   4, duration:    400 ms , transferred pages:     3281 (n:     3272, d:        9 ) , new dirty pages:     1539 , remaining dirty pages:     3519
> Iteration   5, duration:    199 ms , transferred pages:     1653 (n:     1636, d:       17 ) , new dirty pages:      301 , remaining dirty pages:     2167
> Iteration   6, duration:      1 ms , transferred pages:        0 (n:        0, d:        0 ) , new dirty pages:        0 , remaining dirty pages:     2167
> 
> --------------------6 vcpu, 6 GB ram, max bandwidth (941.08 mbps):------------------
> 
> 6. memcached : 4 GB cache, memaslap: all write, concurrency = 5  (can not converge)
> 
> Iteration   1, duration:  42486 ms , transferred pages:  1568087 (n:  1216079, d:   352008 ) , new dirty pages:   571940 , remaining dirty pages:   581023
> Iteration   2, duration:  19774 ms , transferred pages:   571700 (n:   567416, d:     4284 ) , new dirty pages:   331690 , remaining dirty pages:   341013
> Iteration   3, duration:  11589 ms , transferred pages:   332187 (n:   332095, d:       92 ) , new dirty pages:   222725 , remaining dirty pages:   231551
> Iteration   4, duration:   7790 ms , transferred pages:   223571 (n:   223499, d:       72 ) , new dirty pages:   157658 , remaining dirty pages:   165638
> Iteration   5, duration:   5518 ms , transferred pages:   158056 (n:   157998, d:       58 ) , new dirty pages:   128130 , remaining dirty pages:   135712
> Iteration   6, duration:   4442 ms , transferred pages:   127764 (n:   127701, d:       63 ) , new dirty pages:   104839 , remaining dirty pages:   112787
> Iteration   7, duration:   3649 ms , transferred pages:   104581 (n:   104523, d:       58 ) , new dirty pages:   100736 , remaining dirty pages:   108942
> Iteration   8, duration:   3532 ms , transferred pages:   101379 (n:   101315, d:       64 ) , new dirty pages:    87869 , remaining dirty pages:    95432
> Iteration   9, duration:   3030 ms , transferred pages:    86841 (n:    86786, d:       55 ) , new dirty pages:    77505 , remaining dirty pages:    86096
> Iteration  10, duration:   2709 ms , transferred pages:    77875 (n:    77814, d:       61 ) , new dirty pages:    77197 , remaining dirty pages:    85418
> Iteration  11, duration:   2696 ms , transferred pages:    77107 (n:    77044, d:       63 ) , new dirty pages:    65010 , remaining dirty pages:    73321
> Iteration  12, duration:   2308 ms , transferred pages:    66540 (n:    66484, d:       56 ) , new dirty pages:    64388 , remaining dirty pages:    71169
> Iteration  13, duration:   2198 ms , transferred pages:    62953 (n:    62897, d:       56 ) , new dirty pages:    62773 , remaining dirty pages:    70989
> Iteration  14, duration:   2214 ms , transferred pages:    63466 (n:    63411, d:       55 ) , new dirty pages:    67538 , remaining dirty pages:    75061
> Iteration  15, duration:   2329 ms , transferred pages:    66924 (n:    66875, d:       49 ) , new dirty pages:    63580 , remaining dirty pages:    71717
> Iteration  16, duration:   2252 ms , transferred pages:    64554 (n:    64539, d:       15 ) , new dirty pages:    63094 , remaining dirty pages:    70257
> Iteration  17, duration:   2188 ms , transferred pages:    62697 (n:    62641, d:       56 ) , new dirty pages:    63016 , remaining dirty pages:    70576
> Iteration  18, duration:   2171 ms , transferred pages:    62377 (n:    62322, d:       55 ) , new dirty pages:    56764 , remaining dirty pages:    64963
> Iteration  19, duration:   2003 ms , transferred pages:    57382 (n:    57324, d:       58 ) , new dirty pages:    65307 , remaining dirty pages:    72888
> Iteration  20, duration:   2240 ms , transferred pages:    64426 (n:    64364, d:       62 ) , new dirty pages:    61585 , remaining dirty pages:    70047
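
If anyone wants to post-process the per-iteration lines above, a small parser
along these lines might do. The regular expression is inferred only from the
lines shown in this mail (they are not a standard QEMU output format as far as
I know), and the field names are made up for the sketch. Matching on
whitespace rather than exact spacing means a line that got wrapped by a mail
client still parses once the quote markers are stripped.

```python
import re

# Pattern inferred from the statistics lines quoted in this thread.
LINE_RE = re.compile(
    r"Iteration\s+(?P<iteration>\d+),\s*duration:\s*(?P<duration_ms>\d+)\s*ms\s*,"
    r"\s*transferred pages:\s*(?P<transferred>\d+)\s*"
    r"\(n:\s*(?P<normal>\d+),\s*d:\s*(?P<duplicate>\d+)\s*\)\s*,"
    r"\s*new dirty pages:\s*(?P<new_dirty>\d+)\s*,"
    r"\s*remaining dirty pages:\s*(?P<remaining>\d+)"
)

def parse_iteration(text):
    """Return the numeric fields of one statistics line as a dict of ints."""
    match = LINE_RE.search(text)
    if match is None:
        raise ValueError("not an iteration statistics line")
    return {name: int(value) for name, value in match.groupdict().items()}

sample = (
    "Iteration   1, duration:   6997 ms , transferred pages:   266450 "
    "(n:    57269, d:   209181 ) , new dirty pages:    56414 , "
    "remaining dirty pages:    56414"
)
row = parse_iteration(sample)
# Normal plus duplicate (zero) pages should add up to the transferred count.
print(row["normal"] + row["duplicate"] == row["transferred"])
```

The final check is just a sanity test on the first "booting" line; it says
nothing about how many of the new dirty pages are false dirty.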
> 
> 
> --
> Chunguang Li, Ph.D. Candidate
> Wuhan National Laboratory for Optoelectronics (WNLO)
> Huazhong University of Science & Technology (HUST)
> Wuhan, Hubei Prov., China
> 
> 
> 
--
Dr. David Alan Gilbert / address@hidden / Manchester, UK


