Re: [Gluster-devel] solutions for split brain situation

gluster-devel

From:	Mark Mielke
Subject:	Re: [Gluster-devel] solutions for split brain situation
Date:	Thu, 17 Sep 2009 20:27:13 -0400
User-agent:	Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.1.1) Gecko/20090814 Fedora/3.0-2.6.b3.fc11 Thunderbird/3.0b3

On 09/17/2009 06:47 PM, Stephan von Krawczynski wrote:

Way above in this discussion I told that we only talk about the first/primary
subvolume/backend for simplicity. It makes no sense to check a journal if I
can stat the real file which I have to do anyway if an open/create arrives -
and we are talking exactly about that. So please explain where is your assumed
race? Really only a braindead implementation can race on an open. You can
delay a flush on close (like writebehind), but you can obviously not delay an
open neither r,rw nor create because you have to know if the file is a)
existing and b) can be created if not. As long as you don't touch the backend
you will not find out if a create may fail for disk-full or the like. It may
as well fail because of access-privileges. Whatever it is, you will not find a
trusted answer without asking the backend, no journal will save you.

Like most backend storage, the backend includes the data pages, the metadata, AND the journal. "Without asking the backend" and "no journal will save you" miss that the backend *includes* the journal.

A scenario which should make this clear: Let's say the file a.c is removed from a 2-node replication cluster. Something like the following should occur: Step 1 is to lock the resource. Step 2 is to record the intent to remove on each node. Step 3 is to remove on each node. Step 4 is to clear the intent from each node. Step 5 is to unlock the resource. Now, let's say that one node is not accessible during this process and it comes back up later. After it comes back up, suppose a process happens to see that the file does not exist on node 1, but does exist on node 2. Should the file exist or not? I don't know if GlusterFS even does this correctly - but if it does, the file should NOT exist. There should be sufficient information, probably in the journal, to show that the file was *removed*, even if one node still has it. The self-heal operation should remove the file from the node that was down as soon as the discrepancy is detected.
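
The five steps and the self-heal decision can be sketched roughly as follows. This is a minimal Python illustration, not GlusterFS code, and it says nothing about how GlusterFS actually behaves. `Node`, `replicated_remove`, and `self_heal` are hypothetical names, the nodes are in-memory stand-ins, and keeping the intent on the surviving node until every replica has taken part is an assumption about what "sufficient information in the journal" could look like.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical in-memory stand-in for one replica's backend."""
    name: str
    files: set = field(default_factory=set)
    journal: dict = field(default_factory=dict)  # path -> pending intent
    online: bool = True


def replicated_remove(nodes, path, locks):
    """Remove `path` from every reachable node using the five steps."""
    if path in locks:
        raise RuntimeError(f"{path} is already locked")
    locks.add(path)                          # step 1: lock the resource
    try:
        live = [n for n in nodes if n.online]
        for n in live:
            n.journal[path] = "remove"       # step 2: record the intent
        for n in live:
            n.files.discard(path)            # step 3: remove the file
        # Step 4: clear the intent, but only if every replica took part.
        # Assumption: otherwise the intent stays behind as evidence.
        if len(live) == len(nodes):
            for n in live:
                del n.journal[path]
    finally:
        locks.discard(path)                  # step 5: unlock the resource


def self_heal(nodes, path):
    """Decide whether `path` should exist, consulting the journals first."""
    live = [n for n in nodes if n.online]
    if any(n.journal.get(path) == "remove" for n in live):
        # A recorded removal wins over a stale copy on another node.
        for n in live:
            n.files.discard(path)
        if len(live) == len(nodes):
            for n in nodes:
                n.journal.pop(path, None)
        return False
    # No pending intent: fall back to whether any live node has the file.
    return any(path in n.files for n in live)
```

If node 2 is offline while a.c is removed and then returns, `self_heal` finds the "remove" intent still recorded on node 1, deletes the stale copy from node 2, and reports that the file should not exist. Without the journal lookup, the same check would see the file on node 2 and could not tell a missed removal from a missed create.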

The point here is that the journal SHOULD be consulted. If you think otherwise, I think you are not looking for a reliable replication cluster that implements POSIX guarantees.

I think GlusterFS doesn't provide all of these guarantees as well as it should, but I have not done the full testing to expose how correct or incorrect it is in various cases. As it is, I just received a problem where a Java program trying to use file locking failed in a GlusterFS mount point, but succeeded in /var/tmp, so although I still think GlusterFS has potential, I'm slowly backing down from what production data I am willing to store in it. It's unfortunate that this solution space seems so immature. I'm still switching back and forth between wondering if I should push / help GlusterFS into solving all of the problems, or just write my own solution.

My favourite solution is a mostly asynchronous master-master approach, where each node can fall out of date from the other as long as they touch different data, but changes that do touch the same data become serialized. Unfortunately, this also requires the most clever implementation strategy, and clever can take time or exceptional talent.

Read again: I said "and not going over glusterfs for some unknown reason."
"unknown reason" means that I can think of some for myself but tend to believe
there may be lots of others. My personal reason nr 1 is the soft migration
situation.

See my comment about writing a program to set up the xattr metadata for you

How about using the code that is there - inside glusterfsd.
It must be there, else you would not be able to mount an already populated
backend for the first time. Did you try? I did.

This could mean that GlusterFS is too lax with regard to consistency guarantees. If files can appear in the background, and magically be shown - this indicates that GlusterFS is not enforcing use through the mount point, which introduces the potential for inconsistent or faulty results. You are asking for it to guess what you want, without seeing that what you are asking for is incompatible with provisions for any guarantee of a consistent view. That "it works" is actually more concerning to me than it is a justification of your position. To me it says it's one more potential problem that I might hit in the future. A file that should be removed magically re-appears - how is this a good thing?


Cheers,
mark

--
Mark Mielke <address@hidden>
