From: Raghavendra Gowdappa
Subject: Re: [Qemu-devel] [Qemu-block] [PATCH for-2.6 2/2] block/gluster: prevent data loss after i/o error
Date: Mon, 11 Apr 2016 00:26:50 -0400 (EDT)

> +Raghavendra G who implemented this option in write-behind, to this
> upstream patch review discussion

Thanks Pranith. Kritika did inform us about the discussion. We have been working 
on solutions to the problems raised (it was also a long festive weekend in 
Bangalore).

Sorry for top-posting. I am trying to address two issues raised in this mail:
1. No way to identify whether setting an option succeeded in gfapi.
2. Why is retaining the cache after an fsync failure not the default behavior?

1. No way to identify whether setting an option succeeded in gfapi:

I've added Poornima and Raghavendra Talur who work on gfapi to assist on this.

2. Why is retaining the cache after an fsync failure not the default behavior?

It was mostly to avoid breaking synchronized applications that rely on fsync 
failures to retry. Details of that discussion can be found below. The other 
reason was that we could not figure out POSIX's take on the state of an earlier 
write after an fsync failure (other filesystems such as xfs and ext4 have 
non-uniform behavior). More specifically, the question was: "is it correct for a 
write issued before a failed fsync to succeed on the backend storage (i.e., 
persistence happens after the fsync failure)?". I've also added Vijay Bellur, 
who was involved in the earlier discussion, to the cc list.
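The POSIX question above can be made concrete with a small illustrative model of 
a write-behind cache (the class and its names are invented for this sketch; this 
is not gluster's write-behind translator). It shows the behavior under debate: a 
write buffered before a failed fsync can still reach the backend on a later, 
successful sync.

```python
# Illustrative model only -- not gluster's write-behind implementation.
class WriteBehindCache:
    def __init__(self, backend_fails_once=False):
        self.pending = []              # buffered (offset, data) writes
        self.backend = {}              # simulated backing store
        self._fail_next = backend_fails_once

    def write(self, offset, data):
        # Write-behind: buffer the write and report success immediately.
        self.pending.append((offset, data))

    def fsync(self):
        if self._fail_next:
            self._fail_next = False
            return -1                  # transient backend error; cache retained
        for offset, data in self.pending:
            self.backend[offset] = data
        self.pending = []
        return 0

cache = WriteBehindCache(backend_fails_once=True)
cache.write(0, b"old-data")
assert cache.fsync() == -1             # fsync fails...
assert cache.backend == {}             # ...and nothing has been persisted yet
assert cache.fsync() == 0              # a later sync succeeds
assert cache.backend[0] == b"old-data" # the pre-failure write hit the backend
```

The thread's question is whether this late persistence of the earlier write is 
POSIX-legal; the sketch only demonstrates the behavior being debated, not an 
answer.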

Following is the discussion we had earlier with Kevin, Jeff Cody and others on 
the same topic. I am quoting it verbatim below.


> > > > > I would actually argue that this setting would be right for everyone,
> > > > > not just qemu. Can you think of a case where keeping the data cached
> > > > > after a failed fsync would actively hurt any application? I can only
> > > > > think of cases where data is unnecessarily lost if data is dropped.
> > > > > 
> > > > 
> > > > I worry about use cases with concurrent writers to the same file and
> > > > how different applications would handle fsync() failures with our new
> > > > behavior.
> > > 
> > > Any specific scenario you're worried about?
> > > 
> > > > Keeping the known old behavior as the default will ensure that we do
> > > > not break anything once this is out. qemu/virt users with gluster are
> > > > accustomed to setting the virt group and hence no additional knobs
> > > > would need to be altered by them.
> > > 
> > > Not changing anything is a safe way to avoid regressions. But it's also
> > > a safe way to leave bugs unfixed. I would say that the current behaviour
> > > is at least borderline buggy and very hard for applications to handle
> > > correctly.
> > 
> > I tend to agree with Kevin on this. If we have an error handling strategy
> > that is posix-compliant, I don't think there is a need to add one more
> > option. Most of the time, options tend to be left at their default values,
> > which is equivalent to not providing an option at all. However, before
> > doing that, it's better we think through whether it can affect any existing
> > deployments adversely (even when they are not posix-compliant).
> > 
> One pattern that I can think of -
> Applications that operate on the same file from different clients through
> some locking or other co-ordination would have this pattern:
> lock (file), write (file), fsync (file), unlock (file);
> Now if the first fsync() fails, based on application logic the offset used
> for the failed write + fsync could be re-utilized by a co-ordinating
> application on another node to write out legitimate data. When control
> returns back to the application that received a failure, the subsequent
> write + fsync can cause data to be overwritten at the old offset along with
> new data being written at the new offset.
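The quoted hazard can be illustrated with a toy model (the node names and helper 
functions are invented for this sketch; this is not gluster code): if node A's 
failed write is silently retried after node B has legitimately reused the same 
offset under the lock, B's data is overwritten.

```python
# Toy model of two coordinated clients sharing one backend; not gluster code.
backend = {}
retry_queue = []                       # node A's writes kept after a failed fsync

def node_a_write_and_fsync(offset, data, fsync_fails):
    if fsync_fails:
        retry_queue.append((offset, data))   # cached for a later retry
        return False                         # application sees the failure
    backend[offset] = data
    return True

def node_b_write_and_fsync(offset, data):
    backend[offset] = data                   # node B's write + fsync succeed

def retry_failed_syncs():
    for offset, data in retry_queue:
        backend[offset] = data               # stale retry hits the backend
    retry_queue.clear()

# lock(file); write; fsync fails; unlock -- node A believes offset 0 is unused
node_a_write_and_fsync(0, b"A-stale", fsync_fails=True)
# node B takes the lock and writes legitimate data at the same offset
node_b_write_and_fsync(0, b"B-valid")
# a later retry of A's cached write silently overwrites B's data
retry_failed_syncs()
assert backend[0] == b"A-stale"        # B's legitimate data was clobbered
```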

Yeah. I agree. Co-ordinated applications on different mounts will have issues if 
they are working on the assumption that, after an fsync failure, no older writes 
will subsequently hit the backend. Given that there seems to be a fair bit of 
confusion about the status of retries of older writes after an fsync failure, we 
can expect some regressions. So, to summarize:

1. Keep the behaviour in patchset 11 of [1] as the default. Both fsync and flush 
act as barriers: after a failed fsync/flush, they make sure that older writes 
are either
   a. synced to the backend, or
   b. failed and never retried.

2. Modify the behaviour of patchset 11 of [1] to retain writes whose sync failed 
after a failed fsync and retry them until a flush. After a flush, no retries of 
failed syncs will be attempted. This behaviour will be introduced as an option.

3. Transient and non-transient errors will be treated alike; failed syncs will 
be retried in both cases.
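The behaviours in points 1 and 2 can be contrasted with a small model 
(illustrative only; the class, flag, and method names are invented and do not 
correspond to the actual translator code or option names):

```python
# Minimal model contrasting the two proposed behaviours; not gluster code.
class WriteBehind:
    def __init__(self, retain_after_failed_fsync=False):
        self.retain = retain_after_failed_fsync  # point 2's opt-in behaviour
        self.pending = []
        self.backend = {}

    def write(self, off, data):
        self.pending.append((off, data))

    def _sync(self, fail):
        if fail:
            if not self.retain:
                self.pending = []      # point 1: writes fail, never retried
            return -1
        for off, data in self.pending:
            self.backend[off] = data
        self.pending = []
        return 0

    def fsync(self, fail=False):
        return self._sync(fail)

    def flush(self, fail=False):
        rc = self._sync(fail)
        self.pending = []              # after a flush, no retries in any mode
        return rc

# Default (point 1): a failed fsync drops the cached write for good.
wb = WriteBehind()
wb.write(0, b"x"); wb.fsync(fail=True); wb.fsync()
assert wb.backend == {}

# Option (point 2): the write is retained and retried until a flush.
wb = WriteBehind(retain_after_failed_fsync=True)
wb.write(0, b"x"); wb.fsync(fail=True); wb.fsync()
assert wb.backend[0] == b"x"
```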

Does everyone agree on the above points? If yes, I'll modify [1] accordingly.

[1] http://review.gluster.org/#/c/12594/

