
Re: [Qemu-devel] virtio-blk performance regression and qemu-kvm


From: Dongsu Park
Subject: Re: [Qemu-devel] virtio-blk performance regression and qemu-kvm
Date: Wed, 22 Feb 2012 17:48:40 +0100
User-agent: Mutt/1.5.21 (2010-09-15)

Hi Stefan,

see below.

On 21.02.2012 17:27, Stefan Hajnoczi wrote:
> On Tue, Feb 21, 2012 at 3:57 PM, Dongsu Park
> <address@hidden> wrote:
...<snip>...
> I'm not sure if O_DIRECT and Linux AIO to /dev/ram0 is a good idea.
> At least with tmpfs O_DIRECT does not even work - which kind of makes
> sense there because tmpfs lives in the page cache.  My point here is
> that ramdisk does not follow the same rules or have the same
> performance characteristics as real disks do.  It's something to be
> careful about.  Did you run this test because you noticed a real-world
> regression?

That's a good point.
I agree with you. /dev/ram0 isn't a good choice in this case.
Of course I noticed real-world regressions, but not with /dev/ram0.
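
(Just to illustrate that point: O_DIRECT on tmpfs can be checked with
something like the following; the mount point is only an example, and
the open with O_DIRECT is expected to fail with EINVAL on these kernels.)

  # mount -t tmpfs tmpfs /mnt/tmpfs
  # dd if=/dev/zero of=/mnt/tmpfs/testfile bs=4k count=1 oflag=direct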

Therefore I tested again with a block device backed by a raw file image.
The result, however, was nearly the same: a regression since 0.15.
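
The setup was roughly along these lines; the image path and the
cache/aio options below are illustrative, not my exact command:

  # /usr/bin/kvm ... \
   -drive file=/path/to/test.img,if=none,id=drive0,format=raw,cache=none,aio=native \
   -device virtio-blk-pci,drive=drive0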

...<snip>...
> Try turning ioeventfd off for the virtio-blk device:
> 
> -device virtio-blk-pci,ioeventfd=off,...
> 
> You might see better performance since ramdisk I/O should be very
> low-latency.  The overhead of using ioeventfd might not make it
> worthwhile.  The ioeventfd feature was added post-0.14 IIRC.  Normally
> it helps avoid stealing vcpu time and also causing lock contention
> inside the guest - but if host I/O latency is extremely low it might
> be faster to issue I/O from the vcpu thread.

Thanks for the tip. I tried that too, but without success.
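
For completeness, the option goes on the -device line, e.g.
(same illustrative drive id as above):

  -device virtio-blk-pci,drive=drive0,ioeventfd=off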

However, today I observed an interesting phenomenon.
On the qemu-kvm command line, if I set -smp maxcpus to 32,
R/W bandwidth is boosted up to 100 MB/s.

# /usr/bin/kvm ...
 -smp 2,cores=1,maxcpus=32,threads=1 -numa mynode,mem=32G,nodeid=mynodeid

That looks weird, because my test machine has only 4 physical CPUs.
Setting maxcpus=4, however, yields only poor performance (< 30 MB/s).

Additionally, performance seems to decrease when more vCPUs are pinned.
In the libvirt XML, for example, "<vcpu cpuset='0-1'>2</vcpu>" causes
performance degradation, but "<vcpu cpuset='1'>2</vcpu>" is fine.
That doesn't look reasonable either.
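
To make the comparison concrete, these are the two variants, with the
rest of the domain XML unchanged:

  <!-- degrades performance -->
  <vcpu cpuset='0-1'>2</vcpu>

  <!-- performs fine -->
  <vcpu cpuset='1'>2</vcpu>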

Cheers,
Dongsu


