[Top][All Lists]

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH 1/4] block: add enable_write_cache flag

From: Christoph Hellwig
Subject: Re: [Qemu-devel] [PATCH 1/4] block: add enable_write_cache flag
Date: Tue, 1 Sep 2009 01:06:12 +0200
User-agent: Mutt/1.3.28i

On Mon, Aug 31, 2009 at 11:46:45PM +0100, Jamie Lokier wrote:
> > On Mon, Aug 31, 2009 at 11:09:50PM +0100, Jamie Lokier wrote:
> > > Right now, on a Linux host O_SYNC is unsafe with hardware that has a
> > > volatile write cache.  That might not be changed, but if it is than
> > > performance with cache=writethrough will plummet (due to issuing a
> > > CACHE FLUSH to the hardware after every write), while performance with
> > > cache=writeback will be reasonable.
> > 
> > Currenly all modes are more or less unsafe with volatile write caches
> > at least when using ext3 or raw block device accesses.  XFS is safe
> > two thirds due to doing the right thing and one third due to sheer
> > luck.
> Right, but now you've made it worse.  By not calling fdatasync at all,
> you've reduced the integrity.  Previously it would reach the drive's
> cache, and take whatever (short) time it took to reach the platter.
> Now you're leaving data in the host cache which can stay for much
> longer, and is vulnerable to host kernel crashes.

Your last comment is for data=writeback, which in Avi's proposal that
I implemented would indeed lost any guarantees and be for all pratical
matters unsafe.  It's not true for any of the other options.

> Oh, and QEMU could call whatever "hdparm -F" does when using raw block
> devices ;-)

Actually for ide/scsi implementing cache control is on my todo list.
Not sure about virtio yet.

> Well I'd like to start by pointing out your patch introduces a
> regression in the combination cache=writeback with emulated SCSI,
> because it effectively removes the fdatasync calls in that case :-)

Yes, you already pointed this out above.

> It goes to show no matter how hard we try, data integrity is a
> slippery thing where getting it wrong does not show up under normal
> circumstances, only during catastrophic system failures.

Honestly, it should not.  Digging through all this was a bit of work,
but I was extremly how carelessly most people that touched it before
were.  It's not rocket sciense and can be tested quite easily using
various tools - qemu beeing the easiest nowdays but scsi_debug or
an instrumented iscsi target would do the same thing.

> It failed with fsync, which
> is also important to applications, but filesystem integrity is the
> most important thing and it's been good at that for many years.

Users might disagree with that.  With my user hat on I couldn't care
less on what state the internal metadata is as long as I get back at
my data which the OS has guaranteed me to reach the disk after a
successfull fsync/fdatasync/O_SYNC write.

> > E.g. if you want to move your old SCO Unix box into a VM it's the
> > only safe option.
> I agree, and for that reason, cache=writethrough or cache=none are the
> only reasonable defaults.

despite the extremly misleading name cache=none is _NOT_ an alternative,
unless we make it open the image using O_DIRECT|O_SYNC.

reply via email to

[Prev in Thread] Current Thread [Next in Thread]