Re: [Qemu-devel] [RFC] Proposed qcow2 extension: subcluster allocation

From: Denis V. Lunev
Subject: Re: [Qemu-devel] [RFC] Proposed qcow2 extension: subcluster allocation
Date: Thu, 13 Apr 2017 17:06:24 +0300
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.8.0

On 04/13/2017 04:36 PM, Alberto Garcia wrote:
> On Thu 13 Apr 2017 03:09:53 PM CEST, Denis V. Lunev wrote:
>>>> For today's SSDs we are facing problems elsewhere. Right now I
>>>> can achieve only 100k IOPS on an SSD capable of 350-550k. A 1 MB
>>>> block with preallocation and a fragmented L2 cache gives the same
>>>> 100k. Tests on an initially empty image give us around 80k.
>>> Preallocated images aren't particularly interesting to me. qcow2 is
>>> used mainly for two reasons. One of them is sparseness (initially
>>> small file size) mostly for desktop use cases with no serious I/O, so
>>> not that interesting either. The other one is snapshots, i.e. backing
>>> files, which doesn't work with preallocation (yet).
>>> Actually, preallocation with backing files is something that
>>> subclusters would automatically enable: You could already reserve the
>>> space for a cluster, but still leave all subclusters marked as
>>> unallocated.
>> I was talking about doing fallocate() for the entire cluster before
>> the actual write() on an originally empty image. This increases the
>> performance of 4k random writes more than 10 times. In that case we
>> can just write those 4k and do nothing else.
> You're talking about using fallocate() for filling a cluster with zeroes
> before writing data to it.
> As noted earlier in this thread, this works if the image is empty or if
> it doesn't have a backing file.
> And if the image is not empty you cannot guarantee that the cluster
> contains zeroes (you can use FALLOC_FL_ZERO_RANGE, but that won't work
> in all cases).
> Berto
Yes, I agree here.

But COW operations suffer more from the number of I/O operations
required than from the amount of data transferred. Assume that we
have a 64 KB cluster, represented as [--------]. With a 4 KB write
in the middle of the cluster we currently need 5 I/O operations to
perform it: read head, write head, write the 4 KB, read tail,
write tail. Ideally this should take 2 operations: read the entire
cluster (64 KB), then write the entire cluster (64 KB).
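The operation counts above can be sketched with a toy model (this is
illustrative only, not QEMU code; the function names and the 28 KB
write offset are made up for the example):

```python
# Toy model of the two COW strategies for a partial write into an
# unallocated cluster (illustrative only, not QEMU code).

CLUSTER = 64 * 1024     # 64 KB cluster
WRITE_OFF = 28 * 1024   # a 4 KB guest write in the middle of the cluster
WRITE_LEN = 4 * 1024

def cow_ops_split(off, length, cluster=CLUSTER):
    """Current scheme: head and tail are copied separately.
    Ops: read head, write head, write data, read tail, write tail."""
    ops = 1  # the guest data write itself
    if off % cluster != 0:
        ops += 2          # read + write the head
    if (off + length) % cluster != 0:
        ops += 2          # read + write the tail
    return ops

def cow_ops_whole(off, length, cluster=CLUSTER):
    """Alternative: read the whole cluster, patch it in memory,
    and write the whole cluster back. Always 2 ops."""
    return 2

print(cow_ops_split(WRITE_OFF, WRITE_LEN))  # 5
print(cow_ops_whole(WRITE_OFF, WRITE_LEN))  # 2
```

A cluster-aligned full-size write needs no COW at all in either model,
which is why preallocation helps so much for that case.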

With this approach a subsequent 64 KB read of the cluster will take
1 host I/O operation, since the file is contiguous from the host
file system's point of view.

With 8 KB subclusters we will have the same 1 read and 1 write after
this tuning. The only difference is the amount of data transferred:
an 8 KB read and an 8 KB write instead of a 64 KB read and a 64 KB
write.
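The trade-off can be put side by side with a quick sketch (sizes are
the ones from this thread; the helper is hypothetical, not QEMU code):

```python
# Bytes moved by COW for a small unaligned write, comparing COW
# granularities: a full 64 KB cluster vs. an 8 KB subcluster.
# Both cases cost 1 read + 1 write; only the transfer size differs.

def cow_transfer(granularity):
    """One read plus one write of the COW unit."""
    return {"ops": 2, "read": granularity, "write": granularity}

print(cow_transfer(64 * 1024))  # {'ops': 2, 'read': 65536, 'write': 65536}
print(cow_transfer(8 * 1024))   # {'ops': 2, 'read': 8192, 'write': 8192}
```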

Sure, the cluster size should be increased; personally I like 1 MB,
based on my past experience.

In my opinion, on a rotational drive there is no performance
difference at all between COW with a 1 MB cluster and COW with a
64 KB cluster without subclusters. Reading 1 MB and reading 64 KB
cost about the same (100-150 IOPS with 150 MB/s throughput, so the
seek dominates). Subsequent sequential reads will be much better
with 1 MB clusters and without subclusters.
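A back-of-the-envelope calculation supports this, using the figures
quoted above (~150 MB/s sequential throughput, 100-150 random IOPS);
the ~8 ms seek time is an assumption, taken as the midpoint of the
100-150 IOPS range:

```python
# Rough timing of one random COW read on a rotational drive.
MB = 1024 * 1024
THROUGHPUT = 150 * MB   # bytes per second, sequential (from the text)
SEEK = 0.008            # seconds per random seek (assumed, ~125 IOPS)

def random_read_time(size):
    """One random read: one seek plus the data transfer."""
    return SEEK + size / THROUGHPUT

t_64k = random_read_time(64 * 1024)   # seek-dominated
t_1mb = random_read_time(1 * MB)

# The seek dominates, so the 1 MB read is less than twice as slow
# as the 64 KB one, despite moving 16x the data.
print(round(t_64k * 1000, 1), round(t_1mb * 1000, 1))  # 8.4 14.7
```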

Yes, there is a difference on "average" SSD drives, which deliver
40k-100k IOPS: we will see a slowdown when reading 1 MB instead of
64 KB. But the difference is actually not that big. And top-notch
PCIe SSDs cannot currently be saturated by QEMU at all: I am able
to reach only 100k IOPS, against 300k-500k in the host, even when
the data is written to already allocated clusters. So with such
drives there will be no COW difference between a 1 MB cluster and
a 64 KB cluster.

So, in my opinion, a simple 1 MB cluster size together with a
fragmented L2 cache is very good from all points of view, even from
the COW point of view ;) Real-life performance will not be worse,
since we also avoid the additional metadata updates.

