
From: Michael Tokarev
Subject: [Qemu-devel] Re: slow ext4 O_SYNC writes (why qemu qcow2 is so slow on ext4 vs ext3)
Date: Tue, 20 Jul 2010 17:41:33 +0300

20.07.2010 16:46, Jan Kara wrote:

On Fri 02-07-10 16:46:28, Michael Tokarev wrote:

I noticed that qcow2 images, esp. fresh ones (so that they
receive lots of metadata updates) are very slow on my
machine.  And on IRC (#kvm), Sheldon Hearn found that on
ext3, it is fast again.

So I tested different combinations for a bit, and observed
the following:

For a fresh qcow2 file, with default qemu cache settings,
copying a kernel source tree is about 10 times slower on
ext4 than on ext3.  A second copy (rewrite) is significantly
faster in both cases (as expected), but still ~20% slower
on ext4 than on ext3.

The default cache mode in qemu is writethrough, which
translates to the O_SYNC file open flag.

With cache=none, which translates to O_DIRECT, metadata-
intensive writes (fresh qcow2) are about as slow as with
O_SYNC on ext4, and rewrites are, as expected, faster, but
now there's _no_ difference in speed between ext3 and ext4.
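The mapping from qemu cache modes to open(2) flags described above can be sketched roughly like this (a toy illustration following the text, not qemu's actual code; the dictionary and function names are made up):

```python
import os
import tempfile

# Toy mapping of qemu cache modes to open(2) flags, per the text above;
# this is an illustration, not qemu's real implementation.
CACHE_FLAGS = {
    "writethrough": os.O_SYNC,    # default: each write reaches stable storage
    "none":         os.O_DIRECT,  # bypass the host page cache
    "writeback":    0,            # plain buffered writes
}

def open_image(path, cache="writethrough"):
    # Note: O_DIRECT additionally requires aligned buffers and offsets,
    # which this sketch does not handle.
    return os.open(path, os.O_RDWR | os.O_CREAT | CACHE_FLAGS[cache], 0o644)

fd = open_image(tempfile.mktemp(), cache="writethrough")
os.pwrite(fd, b"\0" * 512, 0)   # with O_SYNC this returns only after the disk has the data
os.close(fd)
```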

I did a series of straces of the writer processes -- the
time spent in pwrite() syscalls is significantly larger on
ext4 with O_SYNC than on ext3 with O_SYNC; the difference
is about 50x.
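That per-syscall time can also be measured without strace; a minimal sketch (write count and size are arbitrary, and the absolute numbers depend entirely on the filesystem and device it runs on):

```python
import os
import tempfile
import time

def time_pwrites(flags, n=100, size=4096):
    """Return total seconds spent inside pwrite() for n writes,
    with the file opened using the given extra open(2) flags."""
    path = tempfile.mktemp()
    fd = os.open(path, os.O_RDWR | os.O_CREAT | flags, 0o644)
    buf = b"\0" * size
    total = 0.0
    try:
        for i in range(n):
            t0 = time.monotonic()
            os.pwrite(fd, buf, i * size)
            total += time.monotonic() - t0
    finally:
        os.close(fd)
        os.unlink(path)
    return total

buffered = time_pwrites(0)
synced = time_pwrites(os.O_SYNC)
print(f"buffered: {buffered:.4f}s  O_SYNC: {synced:.4f}s")
```

On a real disk the O_SYNC total should dwarf the buffered one; how much depends on the filesystem, which is exactly the ext3-vs-ext4 gap discussed here.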

Also, with the slower I/O on ext4, qemu-kvm starts more
I/O threads, which, it seems, slows the whole thing down even
further -- I changed max_threads from the default 64 to 16,
and the speed improved slightly.  Here the difference is again
quite significant: on ext3 qemu spawns only 8 threads, while
on ext4 all 64 I/O threads are spawned almost immediately.

So I've two questions:

  1.  Why is ext4 with O_SYNC so slow compared with ext3 with
    O_SYNC?  This is observed on 2.6.32 and 2.6.34 kernels;
    barriers or data={writeback|ordered} made no difference.
    I tested the whole thing on a partition of a single drive,
    sheldonh used ext[34] on top of LVM on a RAID1 volume.
   Do I get it right that you have ext3/4 carrying fs images used by
KVM? What you describe is strange. Up to this moment it sounded to me like
a difference in barrier settings on the host, but you seem to have tried
that. Just stabbing in the dark -- could you try the nodelalloc mount
option of ext4?

Yes, exactly, a guest filesystem image stored on ext3 or
ext4.  And yes, I suspected barriers too, but immediately
ruled that out, since barrier vs. no barrier makes no
difference in this test.

I'll try nodelalloc, but I'm not sure when: right now I'm on
vacation, typing from a hotel, and my home machine with all
the guest images and the like is turned off and -- for some
reason -- I can't wake it up over ethernet; it seemingly ignores
WOL packets.  Too bad I don't have any guest image here on my

  2.  The number of threads spawned for I/O... this is a good
    question: how to find an adequate cap.  Different hardware has
    different capabilities, and we may have more users doing
    I/O at the same time...

   Maybe you could measure your total throughput over some period,
try increasing the number of threads in the next period, and if it
helps significantly, use the larger number; otherwise go back to a
smaller number?

Well, this is, again, a good question -- it's how qemu works right
now, spawning up to 64 I/O threads for all the I/O requests guests
submit.  The slower the I/O, the more threads get spawned.
Working that part out is a separate, difficult job.
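Jan's measure-and-probe idea above could look something like this toy sketch (all names and thresholds are invented for illustration; measure() stands for a hypothetical per-period throughput measurement):

```python
def next_cap(cap, measure, step=8, max_cap=64, gain=1.10):
    """Pick the I/O thread cap for the next period.

    measure(n) is a hypothetical callback returning the throughput
    observed when running one period with n threads.
    """
    base = measure(cap)
    probe = min(cap + step, max_cap)
    if probe > cap and measure(probe) > base * gain:
        return probe   # more threads helped significantly: keep the larger cap
    return cap         # otherwise go back to the smaller number

# Two fake workloads: one still scales with threads, one is saturated.
scaling = lambda n: n * 10.0    # throughput grows with more threads
saturated = lambda n: 100.0     # extra threads buy nothing

print(next_cap(16, scaling))    # 24: probing 16+8 threads helped
print(next_cap(16, saturated))  # 16: stay at the smaller cap
```

A real implementation would of course have to deal with noisy measurements and workloads that change between periods, which is part of why this is a hard problem.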

The main question here is why ext4 is so slow for O_[D]SYNC writes.

Besides, a quite similar topic was discussed meanwhile, in a different
thread titled "BTRFS: Unbelievably slow with kvm/qemu" -- see e.g.
http://marc.info/?t=127891236700003&r=1&w=2 .  In particular, this
message http://marc.info/?l=linux-kernel&m=127913696420974 shows
a comparison table for a few filesystems and qemu/kvm usage, but on
raw files instead of qcow2.

Different qemu/kvm guest fs image options are (partial list):

 raw disk image in a file on the host.  Either pre-allocated or
   (initially) sparse.  The pre-allocated case should - in
   theory - perform equally on all filesystems, while the sparse
   case should differ per filesystem, depending on how different
   filesystems allocate data.
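The sparse vs. pre-allocated distinction can be illustrated with a small sketch (the size is arbitrary, and the exact block accounting depends on the filesystem):

```python
import os
import tempfile

size = 16 * 1024 * 1024  # 16 MiB "image", illustrative only

# Sparse raw image: just set the length.  No blocks are allocated yet,
# so the filesystem allocates on first write -- and how it does that
# is exactly where filesystems differ.
sparse = tempfile.mktemp()
with open(sparse, "wb") as f:
    f.truncate(size)

# Pre-allocated raw image: actually write the data, so every block
# has a home on disk before the guest ever touches it.
prealloc = tempfile.mktemp()
with open(prealloc, "wb") as f:
    f.write(b"\0" * size)

# Same apparent size, but the sparse file occupies fewer (often zero) blocks.
print(os.stat(sparse).st_size, os.stat(sparse).st_blocks)
print(os.stat(prealloc).st_size, os.stat(prealloc).st_blocks)
```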

 qcow[2] image in a file on the host.  This one is never sparse,
  but unlike raw it also contains some qemu-specific metadata,
  such as which blocks are allocated and where, sorta like LVM.
  Initially it is created empty (with only a header), and when
  the guest performs writes, new blocks are allocated and the
  metadata gets updated.  This requires more writes than the
  guest performs, and quite a few syncs (with O_SYNC they're
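The allocation-metadata write amplification described above can be modelled with a toy counter (the class and field names are invented for illustration and do not reflect qemu's real qcow2 code):

```python
# Toy model of qcow2-style write amplification: writing into a fresh
# image costs a metadata update (recording the new block mapping) on
# top of the guest's own data write; a rewrite costs only the data write.
class ToyQcow:
    def __init__(self):
        self.allocated = set()   # blocks that already have a mapping entry
        self.host_writes = 0     # writes the host actually performs

    def guest_write(self, block):
        if block not in self.allocated:
            self.allocated.add(block)
            self.host_writes += 1   # metadata update for the new mapping
        self.host_writes += 1       # the data write itself

img = ToyQcow()
for b in range(100):
    img.guest_write(b)      # fresh image: every write allocates
print(img.host_writes)      # 200: twice the guest's 100 writes
for b in range(100):
    img.guest_write(b)      # rewrite: no new allocations
print(img.host_writes)      # 300: rewrites add only 100 more
```

Under O_SYNC each of those extra metadata writes is also synchronously flushed, which is why a fresh qcow2 image is so much more sensitive to the filesystem's O_SYNC cost than a rewrite or a raw image.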


