qemu-devel

Re: [Qemu-devel] [RFC] block-queue: Delay and batch metadata writes


From: Anthony Liguori
Subject: Re: [Qemu-devel] [RFC] block-queue: Delay and batch metadata writes
Date: Mon, 20 Sep 2010 10:40:33 -0500
User-agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.1.12) Gecko/20100826 Lightning/1.0b1 Thunderbird/3.0.7

On 09/20/2010 10:08 AM, Kevin Wolf wrote:
>> If you're comfortable with a writeback cache for metadata, then you
>> should also be comfortable with a writeback cache for data in which
>> case, cache=writeback is the answer.
> Well, there is a difference: We don't pollute the host page cache with
> guest data and we don't get a virtual "disk cache" as big as the host
> RAM, but only a very limited queue of metadata.
>
> Basically, in qemu we have three different types of caching:
>
> 1. O_DSYNC, everything is always synced without any explicit request.
>     This is cache=writethrough.

I actually think O_DSYNC is the wrong implementation of cache=writethrough. cache=writethrough should behave just like cache=none except that data goes through the page cache.

> 2. Nothing is ever synced. This is cache=unsafe.
>
> 3. We present a writeback disk cache to the guest and the guest needs
>     to explicitly flush to get its data safe on disk. This is
>     cache=writeback and cache=none.

We shouldn't tie the virtual disk cache to which cache= option is used in the host. cache=none means that all requests go directly to the disk. cache=writeback means the host acts as a writeback cache.

If your disk is in writethrough mode, exposing cache=none as a writeback disk cache is not correct.

> We're still lacking modes for O_DSYNC | O_DIRECT and unsafe | O_DIRECT,
> but they are entirely possible, because it's two different dimensions.
> (And I think Christoph was planning to actually make it two independent
> options)

I don't really think O_DSYNC | O_DIRECT makes much sense.
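
To make the "two different dimensions" point concrete, here is a minimal sketch of how the modes discussed above could be laid out on a grid. The names `Sync`, `CacheMode`, `MODES` and `missing_combinations` are made up for illustration and are not QEMU code; the mapping of cache=none to O_DIRECT and cache=writethrough to O_DSYNC follows the descriptions in this thread.

```python
from dataclasses import dataclass
from enum import Enum


class Sync(Enum):
    ALWAYS = "O_DSYNC"    # every write is synced without an explicit request
    ON_FLUSH = "flush"    # guest sees a writeback cache and must flush
    NEVER = "unsafe"      # nothing is ever synced


@dataclass(frozen=True)
class CacheMode:
    bypass_host_cache: bool   # True corresponds to opening with O_DIRECT
    sync: Sync

    def open_flags(self):
        """Symbolic open(2) flags implied by this mode (names only, not values)."""
        flags = []
        if self.bypass_host_cache:
            flags.append("O_DIRECT")
        if self.sync is Sync.ALWAYS:
            flags.append("O_DSYNC")
        return flags


# The four cache= modes as they are described in this thread.
MODES = {
    "writethrough": CacheMode(False, Sync.ALWAYS),
    "writeback": CacheMode(False, Sync.ON_FLUSH),
    "none": CacheMode(True, Sync.ON_FLUSH),
    "unsafe": CacheMode(False, Sync.NEVER),
}


def missing_combinations():
    """Return the (bypass_host_cache, sync) pairs that no existing mode covers."""
    covered = {(m.bypass_host_cache, m.sync) for m in MODES.values()}
    return [(b, s) for b in (False, True) for s in Sync if (b, s) not in covered]
```

Under these assumptions, `missing_combinations()` yields exactly the two pairs mentioned above: O_DIRECT with O_DSYNC, and O_DIRECT with unsafe.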

>> If it's a matter of batching, batching can't occur if you have a barrier
>> between steps 3 and 5.  The only way you can get batching is by doing a
>> writeback cache for the metadata such that you can complete your request
>> before the metadata is written.
>>
>> Am I misunderstanding the idea?
> No, I think you understand it right, but maybe you were not completely
> aware that cache=none doesn't mean writethrough.

No, cache=none means don't cache on the host.

In my mind, cache=none|cache=writethrough is specifically about eliminating the host from the cache hierarchy. This is not a correctness issue with respect to integrity but rather about data loss. If you have strong storage with battery-backed caches, then you can relax flushes. However, if you've got a cache in the host and the host isn't battery-backed, that's no longer safe to do.

So even with cache=none, if we added a writeback cache for metadata, it would really need to be an optional feature. Something like cache=none|writethrough|metadata|writeback.
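
As a rough illustration of the batching argument and of why I'd want it to be optional, here is a toy model, not the proposed block-queue code. `MetadataQueue`, its `writethrough` switch and `run` are hypothetical names. With a barrier after every metadata update nothing can be batched; with a writeback queue the updates are merged and written on the next flush, which also means they are lost if the host goes down before that flush.

```python
class MetadataQueue:
    """Toy model: metadata updates are queued and written out in batches."""

    def __init__(self, writethrough):
        self.writethrough = writethrough
        self.pending = {}       # offset -> latest data; rewrites of one offset merge
        self.flushes = 0        # number of barriers actually issued
        self.disk_writes = 0    # number of metadata writes sent to the "disk"

    def write(self, offset, data):
        self.pending[offset] = data
        if self.writethrough:
            # A barrier after every update, so nothing can be batched.
            self.flush()

    def flush(self):
        if not self.pending:
            return
        self.disk_writes += len(self.pending)
        self.pending.clear()
        self.flushes += 1


def run(writethrough):
    """Queue three metadata updates, then issue one guest-requested flush."""
    q = MetadataQueue(writethrough)
    for offset, data in [(0, b"a"), (8, b"b"), (0, b"c")]:
        q.write(offset, data)
    q.flush()
    return q.flushes, q.disk_writes
```

In this sketch `run(True)` issues three flushes for three writes, while `run(False)` issues a single flush covering two merged writes.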

Regards,

Anthony Liguori

> Kevin



