[Top][All Lists]

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH][RFC][0/2] REF+/REF- optimization

From: Frediano Ziglio
Subject: Re: [Qemu-devel] [PATCH][RFC][0/2] REF+/REF- optimization
Date: Tue, 13 Sep 2011 15:36:16 +0200

2011/9/13 Kevin Wolf <address@hidden>:
> Am 13.09.2011 09:53, schrieb Frediano Ziglio:
>> These patches try to trade-off between leaks and speed for clusters
>> refcounts.
>> Refcount increments (REF+ or refp) are handled in a different way from
>> decrements (REF- or refm). The reason it that posting or not flushing
>> a REF- cause "just" a leak while posting a REF+ cause a corruption.
>> To optimize REF- I just used an array to store offsets then when a
>> flush is requested or array reach a limit (currently 1022) the array
>> is sorted and written to disk. I use an array with offset instead of
>> ranges to support compression (an offset could appear multiple times
>> in the array).
>> I consider this patch quite ready.
> Ok, first of all let's clarify what this optimises. I don't think it
> changes anything at all for the writeback cache modes, because these
> already do most operations in memory only. So this must be about
> optimising some operations with cache=writethrough. REF- isn't about
> normal cluster allocation, it is about COW with internal snapshots or
> bdrv_discard. Do you have benchmarks for any of them?
> I strongly disagree with your approach for REF-. We already have a
> cache, and introducing a second one sounds like a bad idea. I think we
> could get a very similar effect if we introduced a
> qcow2_cache_entry_mark_dirty_wb() that marks a given refcount block as
> dirty, but at the same time tells the cache that even in write-through
> mode it can still treat this block as write-back. This should require
> much less code changes.

Yes, mainly optimize for writethrough. I did not test with writeback
but should improve even this (I think here you have some flush to keep
I'll try to write a qcow2_cache_entry_mark_dirty_wb patch and test it.

> But let's measure the effects first, I suspect that for cluster
> allocation it doesn't help much because every REF- comes with a REF+.

That's 50% of effort if REF- clusters are far from REF+ :)

>> To optimize REF+ I mark a range as allocated and use this range to
>> get new ones (avoiding writing refcount to disk). When a flush is
>> requested or in some situations (like snapshot) this cache is disabled
>> and flushed (written as REF-).
>> I do not consider this patch ready, it works and pass all io-tests
>> but for instance I would avoid allocating new clusters for refcount
>> during preallocation.
> The only question here is if improving cache=writethrough cluster
> allocation performance is worth the additional complexity in the already
> complex refcounting code.

I didn't see this optimization as a second level cache, but yes, for
REF- is a second cache.

> The alternative that was discussed before is the dirty bit approach that
> is used in QED and would allow us to use writeback for all refcount
> blocks, regardless of REF- or REF+. It would be an easier approach
> requiring less code changes, but it comes with the cost of requiring an
> fsck after a qemu crash.

I was thinking about changing the header magic first time we change
refcount in order to mark image as dirty so newer Qemu recognize the
flag while former one does not recognize image. Obviously reverting
magic on image close.

>> End speed up is quite visible allocating clusters (more then 20%).
> What benchmark do you use for testing this?
> Kevin

Currently I'm using bonnie++ but I noted similar improves with iozone.
The test script format an image then launch a Linux machine which run
a script and save result to a file.
The test image is seems by this virtual machine as a separate disk.
The file on hist reside in a separate LV.
I got quite consistent results (of course not working on the machine
while testing, is not actually dedicated to this job).

Actually I'm running the test (added a test working in a snapshot image).


reply via email to

[Prev in Thread] Current Thread [Next in Thread]