Re: [Qemu-devel] [PATCH][RFC][0/2] REF+/REF- optimization

From: Kevin Wolf
Subject: Re: [Qemu-devel] [PATCH][RFC][0/2] REF+/REF- optimization
Date: Wed, 14 Sep 2011 12:21:05 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:6.0.2) Gecko/20110906 Thunderbird/6.0.2

On 14.09.2011 11:52, Frediano Ziglio wrote:
> 2011/9/14 Kevin Wolf <address@hidden>:
>> On 13.09.2011 15:36, Frediano Ziglio wrote:
>>> 2011/9/13 Kevin Wolf <address@hidden>:
>>>> On 13.09.2011 09:53, Frediano Ziglio wrote:
>>>>> These patches trade off between leaks and speed for cluster
>>>>> refcounts.
>>>>> Refcount increments (REF+ or refp) are handled differently from
>>>>> decrements (REF- or refm). The reason is that postponing or not
>>>>> flushing a REF- causes "just" a leak, while postponing a REF+ causes
>>>>> corruption.
>>>>> To optimize REF- I just use an array to store offsets; when a flush
>>>>> is requested or the array reaches a limit (currently 1022 entries),
>>>>> the array is sorted and written to disk. I use an array of offsets
>>>>> instead of ranges to support compression (an offset can appear
>>>>> multiple times in the array).
>>>>> I consider this patch quite ready.
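
To make the scheme above concrete, it boils down to something like the
following rough standalone sketch (names like refm_cache and refm_add are
invented here; this is not the code from the actual patch):

#include <stdint.h>
#include <stdlib.h>

#define REFM_CACHE_SIZE 1022   /* the limit mentioned above */

/* Pending REF- offsets; duplicates are allowed so that compressed
 * clusters (several REF- hitting the same host cluster) still work. */
static uint64_t refm_cache[REFM_CACHE_SIZE];
static int refm_count;

static int cmp_u64(const void *a, const void *b)
{
    uint64_t x = *(const uint64_t *)a, y = *(const uint64_t *)b;
    return x < y ? -1 : x > y;
}

/* Sort the pending offsets and apply them to the on-disk refcounts in
 * one pass; sorting keeps the updates to each refcount block together. */
static void refm_flush(void)
{
    qsort(refm_cache, refm_count, sizeof(refm_cache[0]), cmp_u64);
    for (int i = 0; i < refm_count; i++) {
        /* here: decrement the refcount of refm_cache[i] on disk */
    }
    refm_count = 0;
}

/* Record a REF-; the disk is only touched when the array is full
 * (or when the caller asks for an explicit flush). */
static void refm_add(uint64_t cluster_offset)
{
    if (refm_count == REFM_CACHE_SIZE) {
        refm_flush();
    }
    refm_cache[refm_count++] = cluster_offset;
}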
>>>> Ok, first of all let's clarify what this optimises. I don't think it
>>>> changes anything at all for the writeback cache modes, because these
>>>> already do most operations in memory only. So this must be about
>>>> optimising some operations with cache=writethrough. REF- isn't about
>>>> normal cluster allocation, it is about COW with internal snapshots or
>>>> bdrv_discard. Do you have benchmarks for any of them?
>>>> I strongly disagree with your approach for REF-. We already have a
>>>> cache, and introducing a second one sounds like a bad idea. I think we
>>>> could get a very similar effect if we introduced a
>>>> qcow2_cache_entry_mark_dirty_wb() that marks a given refcount block as
>>>> dirty, but at the same time tells the cache that even in write-through
>>>> mode it can still treat this block as write-back. This should require
>>>> far fewer code changes.
>>> Yes, this mainly optimizes writethrough. I did not test with writeback,
>>> but it should improve that case too (I think you still need some flushes
>>> there to keep consistency).
>>> I'll try to write a qcow2_cache_entry_mark_dirty_wb patch and test it.
>> Great, thanks!
> Don't expect the patch too soon, however; I'm quite busy these days.

Ok, no problem. It's not really urgent either.
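
For reference, the qcow2_cache_entry_mark_dirty_wb() idea above amounts to
roughly the following. This is only a sketch with made-up structures; the
real qcow2 cache code looks different and the _wb function does not exist
yet:

#include <stdbool.h>
#include <stdint.h>

typedef struct CacheEntry {
    int64_t offset;
    bool dirty;          /* must be written back eventually           */
    bool writeback_ok;   /* may be delayed even in write-through mode */
} CacheEntry;

/* The new call: mark the refcount block dirty, but tell the cache the
 * write may be postponed even though the cache is in write-through mode
 * (losing it can only leak clusters, never corrupt the image). */
static void cache_entry_mark_dirty_wb(CacheEntry *e)
{
    e->dirty = true;
    e->writeback_ok = true;
}

/* Write-through path: flush every dirty entry except the ones that were
 * explicitly allowed to stay write-back (the REF- updates). */
static void cache_write_through(CacheEntry *entries, int n)
{
    for (int i = 0; i < n; i++) {
        if (entries[i].dirty && !entries[i].writeback_ok) {
            /* here: write the entry to disk */
            entries[i].dirty = false;
        }
    }
}

/* An explicit flush (bdrv_flush) still writes everything, including the
 * postponed entries. */
static void cache_flush_all(CacheEntry *entries, int n)
{
    for (int i = 0; i < n; i++) {
        if (entries[i].dirty) {
            /* here: write the entry to disk */
            entries[i].dirty = false;
            entries[i].writeback_ok = false;
        }
    }
}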

>>>> But let's measure the effects first, I suspect that for cluster
>>>> allocation it doesn't help much because every REF- comes with a REF+.
>>> That's 50% of the effort if REF- clusters are far from the REF+ ones :)
>> I would expect that the next REF+ allocates exactly the REF- cluster.
>> But you still have a point, we save the write on REF- and combine it
>> with the REF+ write.
> This is still a TODO for the REF+ patch.

Actually, I was talking about the qcow2_cache_entry_mark_dirty_wb() case
without any other change. You get it automatically then.

> Oh... a while ago, looking at the refcount code, I realized that in some
> cases a single deallocated cluster can be reused only after a QEMU restart.
> For instance:
> - we get a single-cluster REF- which takes the refcount to 0
> - free_cluster_index is decreased to this index
> - we get a new request for 2 clusters
> - free_cluster_index is increased
> We skip the freed cluster, and if we never get a later deallocation of a
> cluster with a lower index than the freed one, that cluster is not
> reused.
> (I didn't test this behavior; no leak, no corruption, just the image could
> be larger than expected)

Yes, I'm aware of that. I'm not sure if it matters in practice.
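
To spell the scenario out, the allocation path behaves roughly like this toy
model (simplified, with invented names; it assumes the in_use[] array never
fills up):

#include <stdbool.h>
#include <stdint.h>

static bool     in_use[1024];          /* one flag per cluster */
static uint64_t free_cluster_index;    /* the scan starts here */

/* Find nb contiguous free clusters, scanning from free_cluster_index. */
static uint64_t alloc_clusters(uint64_t nb)
{
    uint64_t i = free_cluster_index;

    for (;;) {
        uint64_t run = 0;
        while (run < nb && !in_use[i + run]) {
            run++;
        }
        if (run == nb) {
            for (uint64_t j = 0; j < nb; j++) {
                in_use[i + j] = true;
            }
            /* The index jumps past this allocation. A single free
             * cluster skipped during the scan is left behind and only
             * becomes a candidate again if something with a lower index
             * is freed later (or after restart, when the index starts
             * from zero again). */
            free_cluster_index = i + nb;
            return i;
        }
        i += run + 1;
    }
}

static void free_cluster(uint64_t idx)
{
    in_use[idx] = false;
    if (idx < free_cluster_index) {
        free_cluster_index = idx;      /* pull the scan start back */
    }
}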

>>>>> To optimize REF+ I mark a whole range as allocated and hand out new
>>>>> clusters from this range (avoiding writing refcounts to disk). When a
>>>>> flush is requested, or in some situations (like snapshots), this cache
>>>>> is disabled and flushed (what is left unused is written back as REF-).
>>>>> I do not consider this patch ready; it works and passes all io-tests,
>>>>> but, for instance, I would like to avoid allocating new clusters for
>>>>> refcounts during preallocation.
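
Again just to illustrate the idea, with invented names: a batch of clusters
gets its refcounts bumped on disk in one go, allocations are served from that
batch in memory, and whatever is left over is given back as REF- when the
cache is disabled or flushed. A minimal sketch:

#include <stdint.h>

#define REFP_BATCH_CLUSTERS 64   /* arbitrary batch size for the sketch */

static uint64_t batch_start;   /* first cluster of the reserved batch */
static uint64_t batch_used;    /* clusters already handed out         */
static uint64_t batch_len;     /* clusters reserved on disk           */

/* Reserve a batch: a single REF+ pass on disk for the whole range
 * instead of one refcount update per allocated cluster. */
static void refp_reserve(uint64_t first_free_cluster)
{
    batch_start = first_free_cluster;
    batch_used = 0;
    batch_len = REFP_BATCH_CLUSTERS;
    /* here: increment the refcounts of the whole batch on disk */
}

/* Hand out the next cluster from the batch, with no disk I/O.
 * Assumes refp_reserve() has been called at least once. */
static uint64_t refp_alloc(void)
{
    if (batch_used == batch_len) {
        refp_reserve(batch_start + batch_len);
    }
    return batch_start + batch_used++;
}

/* On flush, snapshot, etc.: give back the unused tail of the batch,
 * which is simply a REF- on the clusters that were never handed out. */
static void refp_disable(void)
{
    /* here: decrement the refcounts of the unused tail on disk */
    batch_len = batch_used = 0;
}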
>>>> The only question here is whether improving cache=writethrough cluster
>>>> allocation performance is worth the additional complexity in the already
>>>> complex refcounting code.
>>> I didn't see this optimization as a second-level cache, but yes, for
>>> REF- it is a second cache.
>>>> The alternative that was discussed before is the dirty bit approach that
>>>> is used in QED and would allow us to use writeback for all refcount
>>>> blocks, regardless of REF- or REF+. It would be an easier approach
>>>> requiring fewer code changes, but it comes at the cost of requiring an
>>>> fsck after a qemu crash.
>>> I was thinking about changing the header magic the first time we change
>>> a refcount, in order to mark the image as dirty, so that a newer QEMU
>>> recognizes the flag while an older one does not recognize the image.
>>> Obviously the magic would be reverted on image close.
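
Purely as a sketch of that dirty-marking idea: the qcow2 header starts with
the 4-byte magic "QFI\xfb", so marking the image dirty could mean rewriting
those bytes on the first postponed refcount update and restoring them on
clean close. The alternative magic below is completely made up for
illustration:

#include <stdint.h>
#include <unistd.h>

static const uint8_t QCOW_MAGIC[4]  = { 'Q', 'F', 'I', 0xfb };
/* Hypothetical "dirty" marker, invented for this sketch only. */
static const uint8_t DIRTY_MAGIC[4] = { 'Q', 'F', 'I', 0xfc };

/* Before the first postponed refcount update: older binaries will refuse
 * to open the image, newer ones know they may have to check refcounts. */
static int image_mark_dirty(int fd)
{
    if (pwrite(fd, DIRTY_MAGIC, sizeof(DIRTY_MAGIC), 0) != 4) {
        return -1;
    }
    return fsync(fd);
}

/* On clean close, after all refcounts have been flushed. */
static int image_mark_clean(int fd)
{
    if (pwrite(fd, QCOW_MAGIC, sizeof(QCOW_MAGIC), 0) != 4) {
        return -1;
    }
    return fsync(fd);
}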
>> We've discussed this idea before and I think it wasn't considered a
>> great idea to automagically change the header in an incompatible way.
>> But we can always say that for improved performance you need to upgrade
>> your image to qcow2 v3.
> I don't understand why there is no wiki page detailing the qcow3
> changes. I saw your post in May. I have only followed this ML since
> August, so I think I missed a lot of the discussion on qcow improvements.

Unfortunately there have been almost no comments, so you can consider
RFC v2 as the current proposal.

>>>>> The resulting speedup is quite visible when allocating clusters (more than 20%).
>>>> What benchmark do you use for testing this?
>>>> Kevin
>>> Currently I'm using bonnie++, but I noted similar improvements with iozone.
>>> The test script formats an image, then launches a Linux machine which runs
>>> a script and saves the result to a file.
>>> The test image is seen by this virtual machine as a separate disk.
>>> The image file on the host resides in a separate LV.
>>> I got quite consistent results (of course I don't work on the machine
>>> while testing, although it is not actually dedicated to this job).
>>> Right now I'm re-running the tests (I added a test that works on a
>>> snapshotted image).
>> Okay. Let me guess the remaining variables: The image is on an ext4 host
>> filesystem, you use cache=writethrough and virtio-blk. You don't use
>> backing files, compression and encryption. For your tests with internal
>> snapshots you have exactly one internal snapshot that is taken
>> immediately before the benchmark. Oh, and not to forget, KVM is enabled.
>> Are these assumptions correct?
> Change ext4 to xfs and the assumptions are OK. Yes, I use internal
> snapshots (REF- is useful only in this case). To produce "qcow2s" I
> use these commands:
> $QEMU_IMG create -f qcow2 -o preallocation=metadata $FILE 15g
> $QEMU_IMG snapshot -c test $FILE

Ah, using preallocation for this is a nice trick. :-)

Anyway, thanks, I think I understand now what you're measuring.

> I ran the tests again (yesterday's results were not that consistent; today
> I stopped all unneeded services and avoided the X session tunneled over
> ssh).
> without patches
> run   raw    qcow2   qcow2s
> 1     22758  4206    1054
> 2     28401  21745   17997
> with patches (ref-,ref+)
> run   raw    qcow2   qcow2s
> 1     22563  4965    4382
> 2     23823  21043   20904
> Besides, I still don't understand the difference in the second runs for raw
> (about 20%!!!), but this confirms the huge improvement with snapshots.

Measuring with cache=writethrough means that you use the host page
cache, so I think that could be where you get the difference for raw.
Numbers should be more consistent for cache=none, but that's not what we
want to test...

Did you also test how each patch performs if applied on its own, without
the other one? It would be interesting to see how much REF- optimisation
contributes and how much REF+.

