[Top][All Lists]

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH][RFC][0/2] REF+/REF- optimization

From: Frediano Ziglio
Subject: Re: [Qemu-devel] [PATCH][RFC][0/2] REF+/REF- optimization
Date: Wed, 14 Sep 2011 11:52:08 +0200

2011/9/14 Kevin Wolf <address@hidden>:
> Am 13.09.2011 15:36, schrieb Frediano Ziglio:
>> 2011/9/13 Kevin Wolf <address@hidden>:
>>> Am 13.09.2011 09:53, schrieb Frediano Ziglio:
>>>> These patches try to trade-off between leaks and speed for clusters
>>>> refcounts.
>>>> Refcount increments (REF+ or refp) are handled in a different way from
>>>> decrements (REF- or refm). The reason it that posting or not flushing
>>>> a REF- cause "just" a leak while posting a REF+ cause a corruption.
>>>> To optimize REF- I just used an array to store offsets then when a
>>>> flush is requested or array reach a limit (currently 1022) the array
>>>> is sorted and written to disk. I use an array with offset instead of
>>>> ranges to support compression (an offset could appear multiple times
>>>> in the array).
>>>> I consider this patch quite ready.
>>> Ok, first of all let's clarify what this optimises. I don't think it
>>> changes anything at all for the writeback cache modes, because these
>>> already do most operations in memory only. So this must be about
>>> optimising some operations with cache=writethrough. REF- isn't about
>>> normal cluster allocation, it is about COW with internal snapshots or
>>> bdrv_discard. Do you have benchmarks for any of them?
>>> I strongly disagree with your approach for REF-. We already have a
>>> cache, and introducing a second one sounds like a bad idea. I think we
>>> could get a very similar effect if we introduced a
>>> qcow2_cache_entry_mark_dirty_wb() that marks a given refcount block as
>>> dirty, but at the same time tells the cache that even in write-through
>>> mode it can still treat this block as write-back. This should require
>>> much less code changes.
>> Yes, mainly optimize for writethrough. I did not test with writeback
>> but should improve even this (I think here you have some flush to keep
>> consistency).
>> I'll try to write a qcow2_cache_entry_mark_dirty_wb patch and test it.
> Great, thanks!

Don't expect however the patch too soon, I'm quite busy in these days.

>>> But let's measure the effects first, I suspect that for cluster
>>> allocation it doesn't help much because every REF- comes with a REF+.
>> That's 50% of effort if REF- clusters are far from REF+ :)
> I would expect that the next REF+ allocates exactly the REF- cluster.
> But you still have a point, we save the write on REF- and combine it
> with the REF+ write.

This is still a TODO for REF+ patch.

Oh... time ago looking at refcount code I realize that a single
deallocation could be reused in some cases only after Qemu restart.
For instance
- got a single cluster REF- which take refcount to 0
- free_cluster_index get decreased to this index
- we get a new cluster request for 2 clusters
- free_cluster_index get increased
we skip freed deallocation and if we don't get a new deallocation for
a cluster with index minor to our freed cluster this cluster is not
(I didn't test this behavior, no leak, no corruption, just image could
be larger then expected)

>>>> To optimize REF+ I mark a range as allocated and use this range to
>>>> get new ones (avoiding writing refcount to disk). When a flush is
>>>> requested or in some situations (like snapshot) this cache is disabled
>>>> and flushed (written as REF-).
>>>> I do not consider this patch ready, it works and pass all io-tests
>>>> but for instance I would avoid allocating new clusters for refcount
>>>> during preallocation.
>>> The only question here is if improving cache=writethrough cluster
>>> allocation performance is worth the additional complexity in the already
>>> complex refcounting code.
>> I didn't see this optimization as a second level cache, but yes, for
>> REF- is a second cache.
>>> The alternative that was discussed before is the dirty bit approach that
>>> is used in QED and would allow us to use writeback for all refcount
>>> blocks, regardless of REF- or REF+. It would be an easier approach
>>> requiring less code changes, but it comes with the cost of requiring an
>>> fsck after a qemu crash.
>> I was thinking about changing the header magic first time we change
>> refcount in order to mark image as dirty so newer Qemu recognize the
>> flag while former one does not recognize image. Obviously reverting
>> magic on image close.
> We've discussed this idea before and I think it wasn't considered a
> great idea to automagically change the header in an incompatible way.
> But we can always say that for improved performance you need to upgrade
> your image to qcow2 v3.

I don't understand why there is not a wiki page for detailed qcow3
changes. I saw your post on May. I follow this ML since August so I
think I missed a lot of discussion on qcow improves.

>>>> End speed up is quite visible allocating clusters (more then 20%).
>>> What benchmark do you use for testing this?
>>> Kevin
>> Currently I'm using bonnie++ but I noted similar improves with iozone.
>> The test script format an image then launch a Linux machine which run
>> a script and save result to a file.
>> The test image is seems by this virtual machine as a separate disk.
>> The file on hist reside in a separate LV.
>> I got quite consistent results (of course not working on the machine
>> while testing, is not actually dedicated to this job).
>> Actually I'm running the test (added a test working in a snapshot image).
> Okay. Let me guess the remaining variables: The image is on an ext4 host
> filesystem, you use cache=writethrough and virtio-blk. You don't use
> backing files, compression and encryption. For your tests with internal
> snapshots you have exactly one internal snapshot that is taken
> immediately before the benchmark. Oh, and not to forget, KVM is enabled.
> Are these assumptions correct?

change ext4 and put xfs and assumptions are ok. Yes I use internal
snapshots (REF- are useful only in this case). To produce "qcow2s" I
use these commands

$QEMU_IMG create -f qcow2 -o preallocation=metadata $FILE 15g
$QEMU_IMG snapshot -c test $FILE

> Kevin

I run again tests (yesterdays test was not that consistence, today I
stopped all unneeded services and avoid the X session tunneled by

without patched
run   raw    qcow2   qcow2s
1     22758  4206    1054
2     28401  21745   17997

with patches (ref-,ref+)
run   raw    qcow2   qcow2s
1     22563  4965    4382
2     23823  21043   20904

beside I still don't understand difference in second runs for raw
(about 20%!!!) this confirms the huge improves with snapshot.


reply via email to

[Prev in Thread] Current Thread [Next in Thread]