On Tue 13 Dec 2016 09:02:34 AM CET, Max Reitz wrote:
I definitely like how simple your approach is, but from a design
standpoint it is not exactly optimal, because O(n) for a cluster
lookup is simply worse than O(1).
It's actually not O(n) anymore since I rewrote that code last year.
When I changed the algorithm to use LRU for cache eviction I also
optimized the lookup:
1) qcow2_cache_put() simply divides the table offset by the cluster
size and immediately obtains the index.
2) qcow2_cache_get() hashes the offset to do the lookup. In my tests
with random I/O and a 2048-entry cache, the average number of
comparisons was 2.5 (430 before the rewrite).
The cache was ~13% faster after the rewrite, but the improvement in
actual disk I/O was negligible, because that's not where the
bottleneck is. We even discussed how the hash function I was using was
weak, but we decided not to touch it because it was not worth the
effort.
But leaving that aside, would that improve anything? I don't think the
cache itself adds any significant overhead here; IIRC, even in your
presentation at KVM Forum 2015 qcow2 was comparable to raw as long as
all L2 tables were cached in memory.
I haven't compared CPU usage, though. That may have gone up quite a
bit, I don't know. For large enough images, it may even become a
bottleneck.
I don't know, I don't have any data suggesting that there's a problem
there. Particularly after last year's rewrite I think the cache is
reasonably fast (as long as it's big enough; and if it's not big
enough, the problem is going to be in the extra I/O).