
Re: [Qemu-devel] [PATCH] bdrv_aio_flush


From: Ian Jackson
Subject: Re: [Qemu-devel] [PATCH] bdrv_aio_flush
Date: Tue, 2 Sep 2008 11:46:38 +0100

Jamie Lokier writes ("Re: [Qemu-devel] [PATCH] bdrv_aio_flush"):
> Andrea thinks bdrv_aio_flush does guarantee that in flight operations
> are flushed, while bdrv_flush definitely does not (fsync doesn't).

I read Andrea as complaining that bdrv_aio_flush _should_ flush in
flight operations but does not.

> I vaguely recall from the discussion before, there was uncertainty
> about whether that is true, and therefore the right thing to do was
> wait for the in flight AIOs to complete _first_ and then issue an
> fsync or aio_fsync call.

Whether bdrv_aio_flush should do this is a question of the qemu
internal API.

> The Open Group text for aio_fsync says: "shall asynchronously force
> all I/O operations [...]  queued at the time of the call to aio_fsync
> [...]".

We discussed the meaning of the unix specs before.  I did a close
textual analysis of them here:

  http://lists.nongnu.org/archive/html/qemu-devel/2008-04/msg00046.html

> Since then, I've read the Open Group specifications more closely, and
> some other OS man pages, and they are consistent that _writes_ always
> occur in the order they are submitted to aio_write.

Can you give me a chapter-and-verse quote for that?  I'm looking at
SUSv3, e.g.
  http://www.opengroup.org/onlinepubs/009695399/nfindex.html

All I can see is this:
  If O_APPEND is set for the file descriptor, write operations append
  to the file in the same order as the calls were made.

If things are as you say and aio writes are always executed in the
order queued, there would be no need to specify that because it would
be implied by the behaviour of write(2) and O_APPEND.

> What _can_ be queued is a WRITE FUA command: meaning write some data
> and flush _this_ data to non-volatile storage.

In order to implement that interface without flushing other data
unnecessarily, we need to be able to

   submit other IO requests
   submit aio_write request for WRITE FUA
   asynchronously await completion of the aio_write for WRITE FUA
   submit and perhaps collect completion of other IO requests
   collect completion of aio_write for WRITE FUA
   submit and perhaps collect completion of other IO requests
   submit aio_fsync (for WRITE FUA)
   submit and perhaps collect completion of other IO requests
   collect aio_fsync (for WRITE FUA)

This is still not perfect because we unnecessarily flush some data,
thus delaying reporting completion of the WRITE FUA.  But there is at
least no need to wait for _other_ writes to complete.

So the semantics of bdrv_aio_flush should be `flush (at least) writes
which have already completed'.

Ian.



