Re: [Qemu-devel] [PATCH 0/7] qcow2: async handling of fragmented io


From: Vladimir Sementsov-Ogievskiy
Subject: Re: [Qemu-devel] [PATCH 0/7] qcow2: async handling of fragmented io
Date: Thu, 16 Aug 2018 16:58:46 +0300
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.6.0

16.08.2018 03:51, Max Reitz wrote:
On 2018-08-07 19:43, Vladimir Sementsov-Ogievskiy wrote:
Hi all!

Here is an asynchronous scheme for handling fragmented qcow2
reads and writes. Both the qcow2 read and write functions loop through
sequential portions of data. This series aims to parallelize the
iterations of these loops.

It improves performance for fragmented qcow2 images. I've tested it
as follows:

I have four 4G qcow2 images (with the default 64k cluster size) on my SSD:
t-seq.qcow2 - sequentially written qcow2 image
t-reverse.qcow2 - filled by writing 64k portions from the end to the start
t-rand.qcow2 - filled by writing 64k portions (aligned) in random order
t-part-rand.qcow2 - filled by shuffling the order of 64k writes within each 1m region
(see the image generation source code at the end for details)
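
As a rough illustration of the four layouts, here is a minimal sketch of the order in which the 64k portions might be written for each image. It is not the generation script referred to above; the function name write_order, the layout labels and the seed parameter are hypothetical, and the part-rand case assumes that "shuffling" means a random permutation of the 64k writes inside each consecutive 1m region.

```python
import random

CLUSTER = 64 * 1024      # size of one written portion (64k)
GROUP = 1024 * 1024      # region size assumed for the part-rand layout (1m)


def write_order(layout, image_size, seed=0):
    """Return the offsets of all 64k portions in the order they are written.

    Hypothetical helper: layout is one of "seq", "reverse", "rand",
    "part-rand", mirroring the four test images.
    """
    offsets = list(range(0, image_size, CLUSTER))
    rng = random.Random(seed)

    if layout == "seq":
        return offsets
    if layout == "reverse":
        return offsets[::-1]
    if layout == "rand":
        rng.shuffle(offsets)
        return offsets
    if layout == "part-rand":
        per_group = GROUP // CLUSTER
        ordered = []
        # Shuffle only within each 1m region; regions stay in order.
        for start in range(0, len(offsets), per_group):
            group = offsets[start:start + per_group]
            rng.shuffle(group)
            ordered.extend(group)
        return ordered
    raise ValueError("unknown layout: %s" % layout)
```

Under these assumptions, a sequential 1m request hits one contiguous run in t-seq, while in the other three images it can be split across many separate 64k portions on disk.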

The test performs sequential I/O in 1 MB chunks:

test write:
     for t in /ssd/t-*; \
         do sync; echo 1 > /proc/sys/vm/drop_caches; echo ===  $t  ===; \
         ./qemu-img bench -c 4096 -d 1 -f qcow2 -n -s 1m -t none -w $t; \
     done

test read (same, just drop -w parameter):
     for t in /ssd/t-*; \
         do sync; echo 1 > /proc/sys/vm/drop_caches; echo ===  $t  ===; \
         ./qemu-img bench -c 4096 -d 1 -f qcow2 -n -s 1m -t none $t; \
     done

short info about parameters:
   -w - do writes (otherwise do reads)
   -c - count of blocks
   -s - block size
   -t none - disable cache
   -n - native aio
   -d 1 - don't use parallel requests provided by qemu-img bench itself
Hm, actually, why not?  And how does a guest behave?

If parallel requests on an SSD perform better, wouldn't a guest issue
parallel requests to the virtual device and thus to qcow2 anyway?

The guest knows nothing about qcow2 fragmentation, so this kind of "asynchronization" can only be done at the qcow2 level. However, if the guest does async I/O and sends a lot of parallel requests, it behaves like qemu-img without the -d 1 option, and in that case parallel loop iterations in qcow2 make much less sense. Still, I think async parallel requests are better in general than sequential ones, because if the device has some unused capacity for parallelization, it will be utilized. We already use this approach in mirror and qemu-img convert. In Virtuozzo we also have a backup implementation that is improved by parallelizing the request loop. I think it would be good to have some general code for such things in the future.
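
To make the idea of "general code" for a parallelized request loop concrete, here is a minimal sketch in Python rather than QEMU's C coroutines. It is an illustration only, not the code from this series: run_parallel, handle and max_in_flight are hypothetical names, and the chunks stand in for the contiguous portions that one fragmented request is split into.

```python
import asyncio


async def run_parallel(chunks, handle, max_in_flight=8):
    """Run handle(chunk) for every chunk, at most max_in_flight at a time.

    Results are returned in the same order as the input chunks, so the
    caller sees the same outcome as with a plain sequential loop.
    """
    sem = asyncio.Semaphore(max_in_flight)

    async def worker(chunk):
        # The semaphore bounds how many chunk requests run concurrently.
        async with sem:
            return await handle(chunk)

    return await asyncio.gather(*(worker(c) for c in chunks))


if __name__ == "__main__":
    async def fake_read(chunk):
        # Stand-in for one I/O request to a contiguous portion.
        offset, length = chunk
        await asyncio.sleep(0.01)
        return length

    # One 1m request split into sixteen 64k portions (assumed worst case).
    portions = [(i * 65536, 65536) for i in range(16)]
    total = sum(asyncio.run(run_parallel(portions, fake_read, 4)))
    print(total)
```

The bound on in-flight requests is a design choice in this sketch: it lets the loop use spare device parallelism without issuing an unbounded number of requests when the caller is already sending many requests of its own.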


(I suppose the global qcow2 lock could be an issue here, but then your
benchmark should work even without -d 1.)

Max



--
Best regards,
Vladimir


