
Re: [Qemu-devel] Re: [RFC] Disk integrity in QEMU


From: Laurent Vivier
Subject: Re: [Qemu-devel] Re: [RFC] Disk integrity in QEMU
Date: Mon, 13 Oct 2008 22:21:46 +0200


Le 13 oct. 08 à 21:43, Ryan Harper a écrit :

* Laurent Vivier <address@hidden> [2008-10-13 13:52]:

Le 13 oct. 08 à 19:06, Ryan Harper a écrit :

* Anthony Liguori <address@hidden> [2008-10-09 12:00]:
Read performance should be unaffected by using O_DSYNC.  O_DIRECT will
significantly reduce read performance.  I think we should use O_DSYNC by
default and I have sent out a patch that contains that.  We will follow
up with benchmarks to demonstrate this.
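
For reference, the two modes differ in the flags requested at open() time;
a minimal, simplified C sketch of that difference, assuming a Linux host and
a hypothetical /dev/vg0/guest volume (error handling trimmed):

#define _GNU_SOURCE            /* needed for O_DIRECT on Linux */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    const char *path = "/dev/vg0/guest";        /* hypothetical LV */

    /* cache=on + O_DSYNC: writes still go through the host page cache,
     * but write() does not return until the data is on stable storage.
     * Reads can still be served from the host page cache. */
    int fd_dsync = open(path, O_RDWR | O_DSYNC);

    /* cache=off: O_DIRECT bypasses the host page cache for both reads
     * and writes; buffers, offsets and sizes must be block-aligned. */
    int fd_direct = open(path, O_RDWR | O_DIRECT);

    if (fd_dsync < 0 || fd_direct < 0) {
        perror("open");
        return 1;
    }
    close(fd_dsync);
    close(fd_direct);
    return 0;
}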


Hi Ryan,

as "cache=on" implies a factor (memory) shared by the whole system,
you must take into account the size of the host memory and run some
applications (several guests ?) to pollute the host cache, for
instance you can run 4 guest and run bench in each of them
concurrently, and you could reasonably limits the size of the host
memory to 5 x the size of the guest memory.
(for instance 4 guests with 128 MB on a host with 768 MB).

I'm not following you here; the only assumption I see is that we have 1 GB
of host memory free for caching the write.

Is this a realistic use case?



since O_DSYNC implies journal commits, you should run a benchmark on the
host ext3 file system concurrently with the benchmark in a guest to see
the impact of the commits on each one.

I understand the goal here, but what sort of host ext3 journaling load is
appropriate?  Additionally, when we're exporting block devices, I don't
believe the ext3 journal is an issue.

Yes, that comment applies to the last test case.
I think you can run the same benchmark on the host that you run in the guest.





baremetal baseline (1g dataset):
---------------------------+-------+-------+--------------+------------+
Test scenarios             | bandw | % CPU | ave submit   | ave compl  |
type, block size, iface    | MB/s  | usage | latency usec | latency ms |
---------------------------+-------+-------+--------------+------------+
write, 16k, lvm, direct=1  | 127.7 |  12   |   11.66      |    9.48    |
write, 64k, lvm, direct=1  | 178.4 |   5   |   13.65      |   27.15    |
write, 1M,  lvm, direct=1  | 186.0 |   3   |  163.75      |  416.91    |
---------------------------+-------+-------+--------------+------------+
read , 16k, lvm, direct=1  | 170.4 |  15   |   10.86      |    7.10    |
read , 64k, lvm, direct=1  | 199.2 |   5   |   12.52      |   24.31    |
read , 1M,  lvm, direct=1  | 202.0 |   3   |  133.74      |  382.67    |
---------------------------+-------+-------+--------------+------------+


Could you remind me which benchmark you used?

yeah:

fio --name=guestrun --filename=/dev/vda --rw=write --bs=${SIZE}
--ioengine=libaio --direct=1 --norandommap --numjobs=1 --group_reporting
--thread --size=1g --write_lat_log --write_bw_log --iodepth=74
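
For anyone reproducing this: with --ioengine=libaio and --direct=1, each
request fio issues amounts to roughly the following.  This is a sketch of a
single 16k write only (the iodepth=74 pipelining is omitted); the device
path and the 4k alignment are assumptions.  Link with -laio:

#define _GNU_SOURCE                      /* for O_DIRECT */
#include <fcntl.h>
#include <libaio.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    int fd = open("/dev/vda", O_RDWR | O_DIRECT);
    if (fd < 0) { perror("open"); return 1; }

    /* O_DIRECT requires an aligned buffer; 4k covers common block sizes */
    void *buf;
    if (posix_memalign(&buf, 4096, 16384)) return 1;
    memset(buf, 0xab, 16384);

    io_context_t ctx = 0;
    if (io_setup(1, &ctx)) { fprintf(stderr, "io_setup failed\n"); return 1; }

    struct iocb cb, *cbs[1] = { &cb };
    io_prep_pwrite(&cb, fd, buf, 16384, 0);      /* one 16k write at offset 0 */

    if (io_submit(ctx, 1, cbs) != 1) { fprintf(stderr, "io_submit failed\n"); return 1; }

    struct io_event ev;
    io_getevents(ctx, 1, 1, &ev, NULL);          /* wait for the completion */

    io_destroy(ctx);
    close(fd);
    free(buf);
    return 0;
}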


Thank you...


kvm write (1g dataset):
---------------------------+-------+-------+--------------+------------+
Test scenarios             | bandw | % CPU | ave submit   | ave compl  |
block size,iface,cache,sync| MB/s  | usage | latency usec | latency ms |
---------------------------+-------+-------+--------------+------------+
16k,virtio,off,none        | 135.0 |  94   |    9.1       |    8.71    |
16k,virtio,on ,none        | 184.0 | 100   |   63.69      |   63.48    |
16k,virtio,on ,O_DSYNC     | 150.0 |  35   |    6.63      |    8.31    |
---------------------------+-------+-------+--------------+------------+
64k,virtio,off,none        | 169.0 |  51   |   17.10      |   28.00    |
64k,virtio,on ,none        | 189.0 |  60   |   69.42      |   24.92    |
64k,virtio,on ,O_DSYNC     | 171.0 |  48   |   18.83      |   27.72    |
---------------------------+-------+-------+--------------+------------+
1M ,virtio,off,none        | 142.0 |  30   |  7176.00     |  523.00    |
1M ,virtio,on ,none        | 190.0 |  45   |  5332.63     |  392.35    |
1M ,virtio,on ,O_DSYNC     | 164.0 |  39   |  6444.48     |  471.20    |
---------------------------+-------+-------+--------------+------------+

Given the semantics, I don't understand how O_DSYNC can be better than
cache=off in this case...

I don't have a good answer either, but O_DIRECT and O_DSYNC are
different paths through the kernel.  This deserves a better reply, but
I don't have one off the top of my head.

The O_DIRECT kernel path should be more "direct" than the O_DSYNC one.
Perhaps an oprofile run could help us understand?  What is also strange is
the CPU usage with cache=off: it should be lower than the others.  Perhaps
an alignment issue due to the LVM?
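
If the alignment theory is worth checking (it is only a guess here), a small
sketch that asks the device for the logical block size it reports, again
assuming a hypothetical /dev/vg0/guest volume:

#include <fcntl.h>
#include <linux/fs.h>          /* BLKSSZGET */
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>

int main(void)
{
    const char *dev = "/dev/vg0/guest";          /* hypothetical LV */
    int fd = open(dev, O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    /* O_DIRECT transfers whose size or offset is not a multiple of this
     * (or whose buffer is misaligned) are typically rejected with EINVAL,
     * so a surprising value here would point at an alignment problem. */
    int ssz = 0;
    if (ioctl(fd, BLKSSZGET, &ssz) < 0) { perror("BLKSSZGET"); return 1; }
    printf("%s: logical block size %d bytes\n", dev, ssz);

    close(fd);
    return 0;
}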





kvm read (1g dataset):
---------------------------+-------+-------+--------------+------------+
Test scenarios             | bandw | % CPU | ave submit   | ave compl  |
block size,iface,cache,sync| MB/s  | usage | latency usec | latency ms |
---------------------------+-------+-------+--------------+------------+
16k,virtio,off,none        | 175.0 |  40   |   22.42      |    6.71    |
16k,virtio,on ,none        | 211.0 | 147   |   59.49      |    5.54    |
16k,virtio,on ,O_DSYNC     | 212.0 | 145   |   60.45      |    5.47    |
---------------------------+-------+-------+--------------+------------+
64k,virtio,off,none        | 190.0 |  64   |   16.31      |   24.92    |
64k,virtio,on ,none        | 546.0 | 161   |  111.06      |    8.54    |
64k,virtio,on ,O_DSYNC     | 520.0 | 151   |  116.66      |    8.97    |
---------------------------+-------+-------+--------------+------------+
1M ,virtio,off,none        | 182.0 |  32   | 5573.44      |  407.21    |
1M ,virtio,on ,none        | 750.0 | 127   | 1344.65      |   96.42    |
1M ,virtio,on ,O_DSYNC     | 768.0 | 123   | 1289.05      |   94.25    |
---------------------------+-------+-------+--------------+------------+

OK, but in this case the cache available to "cache=off" is only the guest
cache, whereas in the other cases it is the guest cache plus the host
cache; this is not a fair comparison...

It isn't supposed to be fair: cache=off is O_DIRECT, so we're reading from
the device.  We *want* to be able to lean on the host cache to read the
data, pay once, and benefit in other guests if possible.

OK, but if you want to go that way, I think you must run several guests
concurrently to see how the host cache helps each of them.  If you want,
I can try this tomorrow.  Is the O_DSYNC patch the one posted to the
mailing list?

Moreover, you should run an endurance test to see how the cache evolves
over time.
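
For such a multi-guest or endurance run, one way to start every iteration
from a comparable state would be to flush and drop the host page cache in
between; a minimal sketch, assuming a Linux host and root privileges:

#include <stdio.h>
#include <unistd.h>

int main(void)
{
    sync();                                      /* write back dirty pages first */

    /* "3" drops the page cache plus dentries and inodes (Linux >= 2.6.16) */
    FILE *f = fopen("/proc/sys/vm/drop_caches", "w");
    if (!f) { perror("drop_caches"); return 1; }
    fputs("3\n", f);
    fclose(f);
    return 0;
}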




--------------------------------------------------------------------------
exporting a file on an ext3 filesystem as a block device (1g)
--------------------------------------------------------------------------

kvm write (1g dataset):
---------------------------+-------+-------+--------------+------------+
Test scenarios             | bandw | % CPU | ave submit   | ave compl  |
block size,iface,cache,sync| MB/s  | usage | latency usec | latency ms |
---------------------------+-------+-------+--------------+------------+
16k,virtio,off,none        |  12.1 |  15   |    9.1       |    8.71    |
16k,virtio,on ,none        | 192.0 |  52   |   62.52      |    6.17    |
16k,virtio,on ,O_DSYNC     | 142.0 |  59   |   18.81      |    8.29    |
---------------------------+-------+-------+--------------+------------+
64k,virtio,off,none        |  15.5 |   8   |   21.10      |  311.00    |
64k,virtio,on ,none        | 454.0 | 130   |  113.25      |   10.65    |
64k,virtio,on ,O_DSYNC     | 154.0 |  48   |   20.25      |   30.75    |
---------------------------+-------+-------+--------------+------------+
1M ,virtio,off,none        |  24.7 |   5   | 41736.22     | 3020.08    |
1M ,virtio,on ,none        | 485.0 | 100   |  2052.09     |  149.81    |
1M ,virtio,on ,O_DSYNC     | 161.0 |  42   |  6268.84     |  453.84    |
---------------------------+-------+-------+--------------+------------+

What image format do you use (qcow2, raw)?

Raw.

No comment

Laurent
----------------------- Laurent Vivier ----------------------
"The best way to predict the future is to invent it."
- Alan Kay








