Re: [Gluster-devel] HA translator questions
From: Martin Fick
Subject: Re: [Gluster-devel] HA translator questions
Date: Thu, 1 Jan 2009 11:17:41 -0800 (PST)
--- On Thu, 1/1/09, Krishna Srinivas <address@hidden> wrote:
> > Hmm, I don't see this looping on failure in the code, but my
> > understanding of the translator design is fairly minimal. I will
> > have to look harder. I was hoping to modify the subvolume looping
> > to loop back upon itself indefinitely if all the subvolumes failed.
> > If this could be done, it seems like it would be an easy way to
> > achieve NFS-style blocking when the server is down (see my other
> > thread on this), by simply using the HA translator with only one
> > subvolume.
>
> Just curious, why do you want the application to hang till
> the server comes back up? The indefinite hang is not desirable to most
> users.
Because very few applications are written to recover from intermittent errors.
Once they see an error, they give up. Picture a bunch of clients relying on
the FS on the server: if the server crashes, they will likely all be hosed,
but since the client machines did not crash, they will likely never recover
until someone reboots them. Simply hanging and recovering when the server
comes back up is an essential feature for most networked filesystem clients.
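
Coming back to the looping idea quoted above, here is roughly what I was
hoping to do. This is purely a hypothetical sketch of the change, not the
actual HA translator code; NCHILDREN, try_subvolume(), and the retry delay
are all made up for illustration:

/*
 * Hypothetical sketch only -- not the real HA translator code.
 * Illustrates sweeping the subvolumes forever instead of erroring
 * out once every subvolume has failed.
 */
#include <stdbool.h>
#include <unistd.h>

#define NCHILDREN        2   /* number of HA subvolumes (assumed) */
#define RETRY_DELAY_SECS 1   /* back off between full sweeps      */

/* Stands in for sending the fop to child i and waiting for the
 * reply; returns true on success, false on failure. */
extern bool try_subvolume (int i);

static void
send_with_blocking_retry (void)
{
        for (;;) {
                for (int i = 0; i < NCHILDREN; i++) {
                        if (try_subvolume (i))
                                return;  /* fop succeeded */
                }
                /*
                 * All subvolumes failed.  Instead of returning an
                 * error to the application, sleep and sweep again
                 * indefinitely -- with a single subvolume this gives
                 * the NFS-hard-mount-style blocking I'm after.
                 */
                sleep (RETRY_DELAY_SECS);
        }
}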
> In the case of NFS, if the NFS server is down, won't the client
> error out saying that the server is down?
No, it will hang indefinitely until the server comes up (this is the classic
NFS "hard" mount behavior, as opposed to a "soft" mount, which does error
out). The clients therefore do not fail; they simply continue about their
business as usual when the server returns, with only a delay -- no errors,
no application restarts or reboots required.
> > Also, how about failures due to replies that do not return because
> > the link is down? Are requests saved after they are sent, until the
> > reply arrives, so that a request can be resent on the other link if
> > the original link successfully sends it but goes down afterwards
> > and cannot receive the reply?
>
> Yes, requests are saved so that they can be retried on the other
> subvol if the current subvol goes down during the operation.
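
Good. So, if I understand correctly, the client side keeps each in-flight
request around until its reply lands, along these lines (again just a
hypothetical sketch of my mental model, with made-up names, not the real
code):

/*
 * Hypothetical sketch of client-side replay on failover.  The saved
 * request is kept until a reply arrives and is resent on the next
 * subvolume if the link dies first.
 */
#include <stdbool.h>

#define NCHILDREN 2

struct saved_req {
        const void *payload;      /* the fop, kept until a reply lands */
        int         active_child; /* subvolume it was first sent to    */
};

/* Stands in for the network send plus wait-for-reply; returns false
 * if the link goes down before a reply arrives. */
extern bool send_to_child (int child, const void *payload);

/* Returns the child that completed the fop, or -1 if all failed. */
int
replay_on_failover (struct saved_req *req)
{
        for (int i = 0; i < NCHILDREN; i++) {
                int child = (req->active_child + i) % NCHILDREN;

                if (send_to_child (child, req->payload))
                        return child;
                /* No reply: fall through and resend the saved
                 * request on the next subvolume. */
        }
        return -1;
}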
Cool, this brings up one last extreme corner case that concerns me.
Assume subvolumes 1 & 2 share the same backend:

1. Client A sends a write request for file foo through HA to
   subvolume 1.
2. Subvolume 1 services the request, but the link goes down before it
   can successfully reply that the write completed. As you confirmed
   above, client A will now retry on subvolume 2.
3. Since the subvolumes share a backend, the write to foo has already
   taken place at this point. Client B reads file foo and writes
   something new to it, something that depends on client A's write.
4. Client A's resend finally makes it to subvolume 2, which reapplies
   client A's original write to foo, overwriting client B's write.
Is the scenario above possible? Or would both subvolumes 1 & 2 somehow know
not to process client B's write request until they know that client A has
received an ACK for its original write request and is therefore not going to
resend it?
I know that this is somewhat of a far-fetched corner case, but if it is
possible, I believe this would unfortunately be non-POSIX-compliant behavior.
This is the same concern I had with case #3 in my proposed fixes on my NFS
blocking thread. Does that make sense?
I wonder how NFS deals with a similar potential problem? It seems like this
(case #3, not the HA case) might be possible with NFS too, unless the server
keeps track of all writes for which it knows the client has not yet received
an ACK, and does not allow other writes to the same place until then.
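
From what I can tell, NFS servers handle retransmissions of non-idempotent
requests like writes with a "duplicate request cache" (reply cache): replies
are cached keyed by the client and the request's transaction id (XID), so a
retransmitted request gets the cached reply back instead of being executed a
second time. A rough sketch of the idea -- all names here are hypothetical,
and a real cache also worries about eviction, request checksums, etc.:

/*
 * Hypothetical sketch of a duplicate-request (reply) cache.  A
 * retransmitted request with a cached XID gets the saved reply
 * back instead of being executed a second time.
 */
#include <stdint.h>

#define CACHE_SLOTS 128

struct drc_entry {
        uint32_t client_id;    /* which client sent the request */
        uint32_t xid;          /* per-request transaction id    */
        int      saved_status; /* reply we already sent once    */
        int      valid;
};

static struct drc_entry cache[CACHE_SLOTS];

/* Stands in for actually applying the write to the backend. */
extern int execute_write (const void *req);

int
handle_write (uint32_t client_id, uint32_t xid, const void *req)
{
        struct drc_entry *e = &cache[xid % CACHE_SLOTS];

        /* Retransmission?  Replay the old reply rather than
         * re-executing, so a resent write cannot clobber a newer
         * write to the same place. */
        if (e->valid && e->client_id == client_id && e->xid == xid)
                return e->saved_status;

        e->saved_status = execute_write (req);
        e->client_id    = client_id;
        e->xid          = xid;
        e->valid        = 1;

        return e->saved_status;
}

If the HA translator kept something similar, shared between subvolumes
1 & 2, it seems like that would close the window above.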
Thanks again,
-Martin