qemu-devel

[Qemu-devel] Re: [PATCH 02/10] Add buffered_file_internal constant


From: Anthony Liguori
Subject: [Qemu-devel] Re: [PATCH 02/10] Add buffered_file_internal constant
Date: Tue, 30 Nov 2010 14:23:26 -0600
User-agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.1.15) Gecko/20101027 Lightning/1.0b1 Thunderbird/3.0.10

On 11/30/2010 01:15 PM, Juan Quintela wrote:
Anthony Liguori <address@hidden> wrote:
On 11/30/2010 12:04 PM, Juan Quintela wrote:
Anthony Liguori <address@hidden> wrote:

On 11/30/2010 10:32 AM, Juan Quintela wrote:

"Michael S. Tsirkin" <address@hidden> wrote:


On Tue, Nov 30, 2010 at 04:40:41PM +0100, Juan Quintela wrote:


Basically our bitmap handling code is "exponential" on memory size,


I didn't realize this. What makes it exponential?


Well, first of all, it is "exponential" as you measure it.

stalls by default are:

1-2GB: milliseconds
2-4GB: 100-200ms
4-8GB: 1s
64GB: 59s
400GB: 24m (yes, minutes)

That sounds really exponential.


How are you measuring stalls btw?

At the end of the ram_save_live().  This was the reason that I put the
information there.

For the 24-minute stall (I don't have that machine anymore) I had less
"exact" measurements.  It was the amount that migration "decided" to send
in the last non-live part of memory migration.  With the stalls & zero-page
accounting, we just got to the point where we had basically infinite speed.

That's not quite guest visible.
Humm, the guest doesn't answer for 24 minutes,
the monitor doesn't answer for 24 minutes,
ping doesn't answer for 24 minutes.

Are you sure that this is not visible?  The bug report said the guest had
just died; it was me who waited to see that it took 24 minutes to finish.

I'm extremely sceptical that any of your patches would address this problem. Even if you had to scan every page in a 400GB guest, it would not take 24 minutes. Something is not quite right here.

24 minutes suggests that there's another problem that is yet to be identified.

Regards,

Anthony Liguori

It is only a "stall" if the guest is trying to access device emulation
and acquiring the qemu_mutex.  A more accurate measurement would be
something that measured guest availability.  For instance, a tight
loop of while (1) { usleep(100); gettimeofday(); } that then recorded
periods of unavailability > X.
This is better, and this is what the qemu_mutex change should fix.

Of course, it's critically important that a working version of pvclock
be available in the guest for this to be accurate.
If the problem is 24 minutes, we don't need such an "exact" version O:-)

Later, Juan.



