From: Gordan Bobic
Subject: Re: [Gluster-devel] Architecture advice
Date: Mon, 12 Jan 2009 18:30:49 +0000
User-agent: Thunderbird 2.0.0.19 (X11/20090107)

Martin Fick wrote:
Not on the client, anyway. But if you're AFR-ing on server side, then your client always talks to one server anyway. The traditional way to handle server failure in that case is to set up Heartbeat or RHCS to fail over the IP address resource to the surviving server. The TCP connection will reset when the fail-over occurs - I'm not sure how gracefully/transparently GlusterFS reconnects.

....

1.4 supports a new HA translator that is meant for clients to contact servers that AFR each other. Like this:

            Client
              |
              HA
             /  \
            /    \
           /      \
     Server A    Server B
        |            |
       AFR          AFR
        | \        / |
        |  \      /  |
        |   \    /   |
        |     X      |
        |   /   \    |
        |  /     \   |
      Vol A        Vol B

I wasn't aware of there being a HA translator built into GlusterFS, but unless you have proper fencing in place, failing over IP addresses won't work. Without proper cluster fencing in place you can easily find yourself in a split-brain situation where both servers think they have the same IP address and neither can talk to any of the clients.

....

No need for fencing simply because you now use HA translator. The assumption in this case is that the servers can still talk to each other but that one server's connection to the clients may have died.

That means that 50% of the scope for failure will still wipe you out because you'll start split-braining. Not the way forward at all. A fencing setup will at least preserve the data integrity. The correct way to handle comms channel failure between client and server is to have bonded interfaces going via different physical paths. _ONLY_ dealing with the situation where both servers are alive and connected to each other, but we can only reach one of them due to an obscure failure somewhere in the network (e.g. a failed switch port or a failed NIC in the server), is a pretty half-arsed edge case.
Why re-invent the wheel when the tools to deal with these failure modes already exist?
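
To make the split-brain argument concrete, here is a toy model in Python. It is only a sketch under simplifying assumptions: it is not how Heartbeat or RHCS are implemented, and the names `Node`, `fail_over` and `owners` are made up for illustration. Each live node that cannot see its peer claims the shared IP; with fencing enabled, it powers the peer off first.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    powered: bool = True
    holds_ip: bool = False


def fail_over(nodes, link_up, fencing):
    """Return the names of the live nodes that hold the shared IP.

    Toy model: a node that cannot see its peer takes over the IP.
    With fencing, it first powers the peer off (STONITH-style), so
    the first node to act wins and the other can no longer claim it.
    """
    a, b = nodes
    for me, peer in ((a, b), (b, a)):
        if not me.powered:
            continue  # a fenced or dead node does nothing
        peer_visible = link_up and peer.powered
        if peer_visible:
            continue  # peer looks healthy, nothing to take over
        if fencing:
            peer.powered = False
            peer.holds_ip = False
        me.holds_ip = True
    return [n.name for n in nodes if n.powered and n.holds_ip]


def owners(link_up, fencing):
    """Start with A holding the IP, then run one fail-over round."""
    nodes = (Node("A", holds_ip=True), Node("B"))
    return fail_over(nodes, link_up, fencing)


if __name__ == "__main__":
    # Inter-server link is down but both servers are alive.
    print("no fencing:", owners(link_up=False, fencing=False))
    print("fencing:   ", owners(link_up=False, fencing=True))
```

Without fencing, both nodes end up claiming the address when the inter-server link drops; with fencing, only one survives to hold it. Which node wins the fencing race is arbitrary in this sketch (here it is simply the first one in the loop).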
Any failures on the server side may still warrant a fencing setup, but AFR is not yet set up to work cooperatively with a fencing setup.
It doesn't have to be. If one server in AFR dies, nothing spectacular happens. Things time out and carry on. I don't see what cooperation there would need to be. RHCS does its own heart-beating and fencing. Mix and match as required.
Gordan