|
From: | ein |
Subject: | Re: [Qemu-devel] Very poor IO performance which looks like some design problem. |
Date: | Sat, 11 Apr 2015 21:00:55 +0200 |
User-agent: | Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Icedove/31.4.0 |
Let me present some tests: Before every tests I did drop caches on the host and reboot the VM. 1. 8 cores:Coping large file inside VM from/to different physical disks (source is soft raid, destination is SSD in hardware RAID1):write read 1. 1 core:write read 3. 1 core, image @ block device:write read 3. 1 core, image @ block device + AIO=native,cache=none:Write Read 4. 8 cores, disk image @ lvm/xfs @ SSD (hw RAID1):
Copy & paste big file in VM from SSD to SSD (same logical volume). 5. 1 core, disk image @ lvm/xfs @ SSD (hw RAID1):Copy & paste big file in VM from SSD to SSD (same logical volume). As u can see there is significant correlation between vCPU number and throughput. Increasing vCPU number will decease throughput. On 04/11/2015 07:10 PM, ein wrote:
On 04/11/2015 03:09 PM, Paolo Bonzini wrote:On 10/04/2015 22:38, ein wrote:Qemu creates more than 70 threads and everyone of them tries to write to disk, which results in: 1. High I/O time. 2. Large latency. 2. Poor sequential read/write speeds. When I limited number of cores, I guess I limited number of threads as well. That's why I got better numbers. I've tried to combine AIO native and thread setting with deadline scheduler. Native AIO was much more worse. The final question, is there any way to prevent Qemu for making so large number of processes when VM does only one sequential R/W operation?Use "aio=native,cache=none". If that's not enough, you'll need to use XFS or a block device; ext4 suffers from spinlock contention on O_DIRECT I/O.Hello Paolo and thank you for reply. Firstly, I do use ext2 now, which gave me more MiB/s than XFS in the past. I've tried combination with XFS and block_device with NTFS (4KB) on it. I did tests with AIO=native,cache=none. Results in this workload was significantly worse. I don't have numbers on me right now but if somebody is interested, I'll redo the tests. From my experience I can say that disabling every software caches gives significant boost in sequential RW ops. I mean: Qemu cache, linux kernel dirty pages or even caching on VM itself. It makes somehow speed of data flow softer and more stable. Using cache creates hiccups. Firstly there's enormous speed for couple of seconds, more than hardware is capable of, then flush and no data flow at all (or very little) in few / over a dozen / seconds. |
0xF2C6EA10.asc
Description: application/pgp-keys
[Prev in Thread] | Current Thread | [Next in Thread] |