From: Dr. David Alan Gilbert
Subject: Re: [Qemu-devel] Abnormal observation during migration: too many "write-not-dirty" pages
Date: Wed, 15 Nov 2017 10:11:37 +0000
User-agent: Mutt/1.9.1 (2017-09-22)

* Chunguang Li (address@hidden) wrote:
> Hi all!
> I have made a very abnormal observation during VM migration. I found that
> many pages marked as dirty during migration are "not really dirty", that is,
> their content is the same as the old version.
> I did the migration experiment like this:
> During the setup phase of migration, I first suspended the VM. Then I copied
> all the pages within the guest physical address space to a memory buffer as
> large as the guest memory. After that, dirty tracking began and I resumed
> the VM. In addition, at the end of each iteration, I suspended the VM
> temporarily. During the suspension, I compared the content of every page
> marked as dirty in this iteration byte-by-byte with its former copy inside
> the buffer. If the content of a page was the same as its former copy, I
> recorded it as a "write-not-dirty" page (the page was written with exactly
> the same content as the old version). Otherwise, I replaced this page in the
> buffer with the new content, for possible comparison in the future. After
> resetting the dirty bitmap, I resumed the VM. Thus, I obtained the proportion
> of write-not-dirty pages among all the pages marked as dirty for each
> pre-copy iteration.
> I repeated this experiment with 15 workloads: 11 CPU2006 benchmarks, a
> Memcached server, kernel compilation, video playback, and an idle VM. The
> CPU2006 benchmarks and Memcached are write-intensive workloads, so almost
> all of them did not converge to stop-copy.
> Startlingly, the proportions of write-not-dirty pages are quite high.
> Memcached and three CPU2006 benchmarks (zeusmp, mcf and bzip2) have the
> highest proportions: 45%-80% of all their dirty pages are write-not-dirty.
> The proportions for the other workloads are about 5%-20%, which is also
> abnormal. My intuition is that the proportion of write-not-dirty pages
> should be far lower than these numbers. I think it should be quite a rare
> case that a page is written with exactly the same content as the former
> data.
> Besides, zero pages are not counted in any of the results, because I think
> code like memset() may zero large areas of pages that were already zero
> pages before.
> I ruled out unknown causes in the machine hardware, because I repeated the
> experiments on two sets of different machines. Then I guessed it might be
> related to the huge page feature. However, the result was the same when I
> turned the huge page feature off in the OS.
> Now there are only two possible reasons in my opinion.
> First, there are some bugs in the KVM kernel dirty tracking mechanism. It
> may mark as dirty some pages that did not receive any write request.
> Second, there are some bugs in the OS running inside the VM. It may issue
> some unnecessary write requests.
> What do you think about this abnormal phenomenon? Any advice or possible 
> reasons or even guesses? I appreciate any responses, because it has confused 
> me for a long time. Thank you.
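
The measurement described in the quoted message can be sketched roughly as
below. This is only an illustration under assumptions, not the code used in
the experiment: the function name, the flat byte-buffer view of guest memory,
the 4 KiB page size, and the exact handling of zero pages (skipped from the
counts, but still copied into the buffer) are all hypothetical.

```python
PAGE_SIZE = 4096  # assumed page size; not stated in the message


def count_write_not_dirty(guest_mem, shadow, dirty_pages, page_size=PAGE_SIZE):
    """Compare each dirty page with its saved copy in `shadow`.

    guest_mem   -- bytes-like snapshot of guest memory (VM suspended)
    shadow      -- bytearray holding the previously saved copy of every page
    dirty_pages -- page indices marked dirty in this iteration
    Returns (write_not_dirty, counted), with zero pages left out of both.
    """
    zero_page = bytes(page_size)
    write_not_dirty = 0
    counted = 0
    for pfn in dirty_pages:
        start = pfn * page_size
        end = start + page_size
        current = bytes(guest_mem[start:end])
        if current == zero_page:
            # Assumption: zero pages are excluded from the statistics,
            # but the saved copy is still refreshed.
            shadow[start:end] = current
            continue
        counted += 1
        if current == shadow[start:end]:
            # Marked dirty, yet byte-for-byte identical to the old copy.
            write_not_dirty += 1
        else:
            # Really dirty: keep the new content for later comparisons.
            shadow[start:end] = current
    return write_not_dirty, counted
```

The per-iteration proportion is then write_not_dirty / counted, computed
before the dirty bitmap is reset and the VM is resumed.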

Wasn't it you who pointed out last year the other possibility? - The
problem of false positives due to sync'ing the whole of memory and then
writing the data out, but some of the dirty pages were already written?
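
One way to read that false-positive case is the toy model below. It is a
sketch of my interpretation, not QEMU code: it assumes the dirty bitmap is
synced once at the start of an iteration, the pages are then sent in a fixed
order, and a guest write that lands before its page is sent is already
included in the transmitted copy even though the page will be flagged dirty
again at the next sync. The function name and the time-step convention are
made up for illustration.

```python
def classify_writes(send_order, writes):
    """Split pages written during an iteration into two sets.

    send_order -- page ids in the order they are sent after the sync
    writes     -- (time, page) guest writes; a write at time t lands just
                  before the page at position t of send_order is sent
    Returns (redundant, needed): pages whose next resend carries no new
    data, and pages whose resend is really required.
    """
    send_time = {page: t for t, page in enumerate(send_order)}
    last_write = {}
    for t, page in writes:
        last_write[page] = max(t, last_write.get(page, -1))

    redundant, needed = set(), set()
    for page, t in last_write.items():
        if page in send_time and t <= send_time[page]:
            # Last write happened before the page went out, so the copy
            # already sent is current; the next dirty mark is a false positive.
            redundant.add(page)
        else:
            # Written after being sent (or not sent at all this iteration).
            needed.add(page)
    return redundant, needed
```

In this model, with send_order [10, 11, 12] and writes at (0, 12), (2, 10)
and (1, 99), page 12 is redundant while pages 10 and 99 need a real resend.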


> --
> Chunguang Li, Ph.D. Candidate
> Wuhan National Laboratory for Optoelectronics (WNLO)
> Huazhong University of Science & Technology (HUST)
> Wuhan, Hubei Prov., China
Dr. David Alan Gilbert / address@hidden / Manchester, UK
