
Re: [Qemu-devel] [PATCH] bdrv_aio_flush


From: Jamie Lokier
Subject: Re: [Qemu-devel] [PATCH] bdrv_aio_flush
Date: Tue, 2 Sep 2008 19:22:15 +0100
User-agent: Mutt/1.5.13 (2006-08-11)

Ian Jackson wrote:
> Jens Axboe writes ("Re: [Qemu-devel] [PATCH] bdrv_aio_flush"):
> > On Tue, Sep 02 2008, Ian Jackson wrote:
> > > This is still not perfect because we unnecessarily flush some data
> > > thus delaying reporting completion of the WRITE FUA.  But there is at
> > > least no need to wait for _other_ writes to complete.
> > 
> > I don't see how the above works. There's no dependency on FUA and
> > non-FUA writes, in fact FUA writes tend to jump the device queue due to
> > certain other operating systems using it for conditions where that is
> > appropriate. So unless you do all writes using FUA, there's no way
> > around a flush for committing dirty data. Unfortunately we don't have a
> > FLUSH_RANGE command, it's just a big sledge hammer.
> 
> Yes, certainly you do aio_sync _some_ data that doesn't need to be.
> Without an O_FSYNC flag on aio_write that's almost inevitable.

Btw, in principle you can set O_SYNC or O_DSYNC on the file descriptor
just for a FUA write, either with fcntl() (though I'm not sure that
would be portable or really work) or by using two file descriptors.
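
A minimal sketch of the two-descriptor variant, assuming a POSIX host
where O_DSYNC is honoured by the underlying filesystem.  The names
dual_fd, dual_fd_open, dual_fd_write and dual_fd_close are made up for
illustration and are not qemu functions.  The same image is opened
twice, and only FUA writes go through the O_DSYNC descriptor:

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L  /* for pwrite() and O_DSYNC */
#endif
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper type: one image file opened twice. */
struct dual_fd {
    int fd_plain;  /* ordinary writes, may linger in the host cache */
    int fd_dsync;  /* opened with O_DSYNC, used only for FUA writes */
};

/* Open path twice.  Returns 0 on success, -1 on failure. */
static int dual_fd_open(struct dual_fd *d, const char *path)
{
    d->fd_plain = open(path, O_RDWR | O_CREAT, 0600);
    if (d->fd_plain < 0)
        return -1;

    d->fd_dsync = open(path, O_RDWR | O_DSYNC);
    if (d->fd_dsync < 0) {
        close(d->fd_plain);
        return -1;
    }
    return 0;
}

/* Route the write by its FUA bit: a FUA write returns only after its
 * own data is on stable storage, without flushing unrelated writes. */
static ssize_t dual_fd_write(struct dual_fd *d, const void *buf,
                             size_t len, off_t off, int fua)
{
    return pwrite(fua ? d->fd_dsync : d->fd_plain, buf, len, off);
}

static void dual_fd_close(struct dual_fd *d)
{
    close(d->fd_dsync);
    close(d->fd_plain);
}
```

This is synchronous for brevity; the same descriptor choice could be
made when filling in aio_fildes for an aio_write().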

> But if bdrv_aio_fsync also does a flush first then you're going to
> sync _even more_ unnecessarily: the difference between `bdrv_aio_fsync
> does flush first' and `bdrv_aio_fsync does not flush' only affects
> writes that are queued but not completed when bdrv_aio_fsync is called.
> 
> That is, non-FUA writes which were submitted after the FUA write.
> There is no need to fsync these and that's what I think qemu should
> do.

I agree, that's a clever reason to make bdrv_aio_fsync() guarantee
less rather than more.

(Who knows, that might be the reason SUS doesn't offer a stronger
guarantee either, although I doubt it: if that were a serious concern
they might have defined a more selective sync instead.)

It would be interesting to see if using aio_fsync(O_DSYNC) were slower
or faster than fdatasync() on a range of hosts - just in case the
former syncs previously submitted AIOs and the latter doesn't.

Btw, on Linux aio_fsync(O_DSYNC) does the equivalent of fsync(), not
fdatasync().  This is because glibc defines O_DSYNC to be the same as
O_SYNC.  To get fdatasync(), you have to use the Linux-AIO API and
IOCB_CMD_FDSYNC.
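
A rough sketch of what submitting IOCB_CMD_FDSYNC looks like, using the
raw Linux-AIO system calls so that no extra library is needed.  The
function names are made up for illustration.  Whether the kernel
accepts this opcode depends on the kernel version and filesystem, so
the sketch assumes it may be rejected and falls back to a blocking
fdatasync() in that case:

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE              /* for syscall() */
#endif
#include <fcntl.h>
#include <linux/aio_abi.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Queue one IOCB_CMD_FDSYNC on fd and wait for it to complete.
 * Returns 0 if the kernel completed it successfully, -1 otherwise
 * (including when the kernel does not support the opcode). */
static int linux_aio_fdsync(int fd)
{
    aio_context_t ctx = 0;
    struct iocb cb;
    struct iocb *list[1] = { &cb };
    struct io_event ev;
    int ret = -1;

    if (syscall(SYS_io_setup, 1L, &ctx) < 0)
        return -1;

    memset(&cb, 0, sizeof(cb));
    cb.aio_fildes = fd;
    cb.aio_lio_opcode = IOCB_CMD_FDSYNC;

    if (syscall(SYS_io_submit, ctx, 1L, list) == 1 &&
        syscall(SYS_io_getevents, ctx, 1L, 1L, &ev, NULL) == 1 &&
        ev.res == 0)
        ret = 0;

    syscall(SYS_io_destroy, ctx);
    return ret;
}

/* Hypothetical wrapper: try the Linux-AIO path first, otherwise
 * fall back to a blocking fdatasync(). */
static int sync_data(int fd)
{
    if (linux_aio_fdsync(fd) == 0)
        return 0;
    return fdatasync(fd);
}

/* Hypothetical usage: write a buffer to path and sync its data. */
static int write_durably(const char *path, const void *buf, size_t len)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    int ok;

    if (fd < 0)
        return -1;
    ok = write(fd, buf, len) == (ssize_t)len && sync_data(fd) == 0;
    close(fd);
    return ok ? 0 : -1;
}
```

The sketch waits for the completion straight away, which defeats the
point of AIO; a real caller would keep the context open and reap the
event later alongside its other requests.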

> Andrea was making some comments about scsi and virtio.  It's possible
> that these have different intended semantics and perhaps those device
> models (in hw/*) need to call flush explicitly before sync.

Or perhaps they would benefit from an async equivalent, so they don't
have to pause and can queue more requests?

-- Jamie



