Subject: Re: [Qemu-block] [PATCH 0/7] qcow2: async handling of fragmented io
Date: Thu, 16 Aug 2018 16:58:46 +0300
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.6.0

16.08.2018 03:51, Max Reitz wrote:
> On 2018-08-07 19:43, Vladimir Sementsov-Ogievskiy wrote:
>> Hi all!
>>
>> Here is an asynchronous scheme for handling fragmented qcow2
>> reads and writes. Both the qcow2 read and write functions loop
>> through sequential portions of data. The series aims to parallelize
>> the iterations of these loops.
>>
>> It improves performance for fragmented qcow2 images. I've tested it
>> as follows:
>>
>> I have four 4G qcow2 images (with default 64k block size) on my ssd disk:
>> t-seq.qcow2 - sequentially written qcow2 image
>> t-reverse.qcow2 - filled by writing 64k portions from end to the start
>> t-rand.qcow2 - filled by writing 64k portions (aligned) in random order
>> t-part-rand.qcow2 - filled by shuffling order of 64k writes in 1m clusters
>> (see source code of image generation in the end for details)
>>
>> and the test (sequential io by 1mb chunks):
>>
>> test write:
>>     for t in /ssd/t-*; \
>>     do sync; echo 1 > /proc/sys/vm/drop_caches; echo === $t ===; \
>>     ./qemu-img bench -c 4096 -d 1 -f qcow2 -n -s 1m -t none -w $t; \
>>     done
>>
>> test read (same, just drop -w parameter):
>>     for t in /ssd/t-*; \
>>     do sync; echo 1 > /proc/sys/vm/drop_caches; echo === $t ===; \
>>     ./qemu-img bench -c 4096 -d 1 -f qcow2 -n -s 1m -t none $t; \
>>     done
>>
>> short info about parameters:
>>   -w - do writes (otherwise do reads)
>>   -c - count of blocks
>>   -s - block size
>>   -t none - disable cache
>>   -n - native aio
>>   -d 1 - don't use parallel requests provided by qemu-img bench itself
>
> Hm, actually, why not? And how does a guest behave?
>
> If parallel requests on an SSD perform better, wouldn't a guest issue
> parallel requests to the virtual device and thus to qcow2 anyway?

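For readers without the image generation script at hand, the four fill orders described in the quoted cover letter can be sketched roughly as below. This is not the script from the series: the function name `write_orders`, the `seed` parameter, and the reading of "1m clusters" as independent 1 MiB groups that are each shuffled internally are all assumptions made for illustration.

```python
import random

CLUSTER = 64 * 1024    # 64k write size, as stated in the thread
GROUP = 1024 * 1024    # assumed meaning of "1m clusters": 1 MiB groups

def write_orders(image_size, seed=0):
    """Return the 64k-aligned write offsets for each of the four fill patterns.

    Hypothetical helper: it only computes offsets, it does not touch any image.
    """
    rng = random.Random(seed)

    # t-seq: offsets in ascending order
    seq = list(range(0, image_size, CLUSTER))

    # t-rand: all offsets in one global random order
    rand = seq[:]
    rng.shuffle(rand)

    # t-part-rand: groups stay in order, writes inside each group are shuffled
    part_rand = []
    for start in range(0, image_size, GROUP):
        group = list(range(start, min(start + GROUP, image_size), CLUSTER))
        rng.shuffle(group)
        part_rand.extend(group)

    # t-reverse: offsets from the end of the image to the start
    return {"seq": seq, "reverse": seq[::-1], "rand": rand, "part-rand": part_rand}
```

Each list could then drive 64k writes at the given offsets (for example via qemu-io) to produce an image whose host layout differs from its guest layout to a different degree.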
The guest knows nothing about qcow2 fragmentation, so this kind of "asynchronization" can be done only at the qcow2 level.
However, if the guest does async I/O and sends a lot of parallel requests, it behaves like qemu-img without the -d 1 option, and in
that case parallel loop iterations in qcow2 bring much less benefit. Still, I think that async parallel requests are better in
general than sequential ones, because if the device has some unused capacity for parallelism, it will be utilized. We already
use this approach in mirror and qemu-img convert. In Virtuozzo we have a backup that is also improved by parallelizing the
request loop. I think it would be good to have some general code for such things in the future.

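To illustrate the idea only (this is not the QEMU code, which is C and coroutine-based in its own way), here is a minimal Python asyncio sketch of the difference between issuing the per-fragment requests one at a time and issuing them in parallel with a bounded number in flight. `read_chunk`, `read_sequential`, `read_parallel`, and the `max_in_flight` limit are hypothetical names chosen for this sketch, and the sleep merely stands in for device latency.

```python
import asyncio

async def read_chunk(offset, length, delay=0.01):
    """Hypothetical stand-in for one request to a contiguous host range."""
    await asyncio.sleep(delay)  # simulated device latency, not real I/O
    return (offset, length)

async def read_sequential(chunks):
    # One request at a time: the next fragment is not issued until the
    # previous one has completed.
    return [await read_chunk(off, ln) for off, ln in chunks]

async def read_parallel(chunks, max_in_flight=8):
    # All fragments are scheduled at once; the semaphore bounds how many
    # requests are outstanding at the same time.
    sem = asyncio.Semaphore(max_in_flight)

    async def one(off, ln):
        async with sem:
            return await read_chunk(off, ln)

    # gather() returns results in the order of its arguments, so the caller
    # still sees the fragments in request order.
    return await asyncio.gather(*(one(off, ln) for off, ln in chunks))
```

With sixteen 64k fragments, `asyncio.run(read_parallel(chunks))` returns the same results as the sequential variant, but the simulated latencies overlap instead of adding up. If the caller already keeps many requests in flight, as qemu-img bench does without -d 1, the extra overlap gained here is correspondingly smaller.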
> (I suppose the global qcow2 lock could be an issue here, but then your
> benchmark should work even without -d 1.)
>
> Max

-- 
Best regards,
Vladimir