gluster-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Gluster-devel] Re; Load balancing ...


From: Gareth Bult
Subject: Re: [Gluster-devel] Re; Load balancing ...
Date: Mon, 28 Apr 2008 10:37:41 +0100 (BST)

Hi,

>Gordon is right here. Selfhealing on the fly is very much dependant on
>lookup(). So it is inevitable to do lookup() on all the subvols. Also
>we use the results of lookup() call for subsequent operations on that
>file/directory. But it is not a bad idea to compromise consistency for
>speed (with read-subvolume option) as some users might prefer that. We
>can provide this as an option and let admins handle the
>inconsistancies that would arise of this compromise. We shall keep
>this in the TODO list.

Sounds good to me ... :)

If it were a 10% speed difference, I wouldn't even mention it.
But when it's potentially 30x it's a serious issue.

Regards,
Gareth.

On Sat, Apr 26, 2008 at 3:51 AM, Gareth Bult <address@hidden> wrote:
> >You're expecting a bit much here - for any shared/clustered FS. DRBD
>  >might come close if your extents are big enough, but that's a whole
>  >different ball game...
>
>  I was quoting a real-world / live data scenario, DRBD handles it just fine.
>  .. but it is a different mechanism to gluster.
>
>
>  >Sounds like a reasonably sane solution to me.
>
>  It is. It also makes Gluster useless in this scenario.
>
>
>  >Why would the cluster effectively be down? Other nodes would still be
>  >able to server that file.
>
>  Nope, it won't replicate the file while another node has it locked .. which 
> means you effectively need to close all files in order to kick off the 
> replication process, and the OPEN call will not complete until the file has 
> replicated .. so effectively (a) you need to restart all your processes to 
> make then close and re-open their files (or HUP them.. or whatever), then 
> those processes will all freeze until the files they are trying to open have 
> replicated.
>
>
>  >Or are you talking about the client-side AFR?
>
>  Mmm, it's been a while, I'm not entirely sure I've tested the issue on 
> client side and server side.
>  Are you telling me that server-side will work quite happily and it's only 
> client-side that has all these issues?
>
>
>  >I have to say, a one-client/multiple-servers scenario sounds odd.
>  >If you don't care about downtime (you have just one client node so that's
>  >the only conclusion that can be reached), then what's the problem with a 
> bit more downtime?
>
>  My live scenario was 4 (2x2) AFR servers with ~ 12 clients.
>
>  Obviously this setup is no longer available to me as it proved to be useless 
> in practice.
>
>  I'm currently revisiting Gluster with another "new" requirement (as per my 
> last email) .. currently I'm testing a 2 x server + 1 x client setup with 
> regards to load balancing and use over a slow line. Obviously (!) both 
> servers can also act as clients so I guess to be pedantic you'd call it 2 
> servers + 3 clients. My point was I have 1 machine with no server.
>
>
>  Gareth.
>
>  --
>  Managing Director, Encryptec Limited
>  Tel: 0845 5082719, Mob: 0785 3305393
>  Email: address@hidden
>  Statements made are at all times subject to Encryptec's Terms and Conditions 
> of Business, which are available upon request.
>
>
> ----- Original Message -----
>  From: "Gordan Bobic" <address@hidden>
>  To: address@hidden
>
> Sent: Friday, April 25, 2008 9:40:00 PM GMT +00:00 GMT Britain, Ireland, 
> Portugal
>  Subject: Re: [Gluster-devel] Re; Load balancing ...
>
>  Gareth Bult wrote:
>
>
> >> If you have two nodes and the 20 GB file
>  >> only got written to node A while node B was down and
>  >> node B comes up the whole 20 GB is resynced to node B;
>  >> is that more network usage than if the 20 GB file were
>  >> written immediately to node A & node B.
>  >
>  > Ah. Let's say you have both nodes running with a 20Gb file synced.
>  > Then you have to restart one glusterfs on one of the nodes.
>  > While it's down, let's say the other node appends 1 byte to the file.
>  > When it comes back up and looks a the file, the other node will see it's 
> out of date and re-copy the entire 20Gb.
>
>  You're expecting a bit much here - for any shared/clustered FS. DRBD
>  might come close if your extents are big enough, but that's a whole
>  different ball game...
>
>  >> Perhaps the issue is really that the cost comes at an
>  >> unexpected time, on node startup instead of when the
>  >> file was originally written?  Would a startup
>  >> throttling mechanism help here on resyncs?
>  >
>  > Yes, unfortunately you can't open a file while it's syncing .. so when you 
> reboot your gluster server, downtime is the length of time it takes to 
> restart glusterfs (or the machine, either way..) PLUS the amount of time it 
> takes to recopy every file that was written to while one node was down ...
>
>  Sounds like a reasonably sane solution to me.
>
>  > Take a Xen server for example serving disk images off a gluster partition.
>  > 10 Images at 10G each gives you a 100G copy to do.
>
>  If they are static images why would they have changed? What you are
>  describing would really be much better accomplished with a SAN+GFS or
>  Coda which is specifically designed to handle disconnected operation at
>  the expense of other things.
>
>  > Wait, it get's better .. it will only re-sync the file on opening, so you 
> actually have to close all the files, then try to re-open them , then wait 
> while it re-syncs the data (during this time your cluster is effectively 
> down), then the file open completes and you are back up again.
>
>  Why would the cluster effectively be down? Other nodes would still be
>  able to server that file. Or are you talking about the client-side AFR?
>  I have to say, a one-client/multiple-servers scenario sounds odd. If you
>  don't care about downtime (you have just one client node so that's the
>  only conclusion that can be reached), then what's the problem with a bit
>  more downtime?
>
>  > Yet there is a claim in the FAQ that there is no single point of failure 
> .. yet to upgrade gluster for example you effectively need to shut down the 
> entire cluster in order to get all files to re-sync ...
>
>  Wire protocol incompatibilities are, indeed unfortunate. But on one hand
>  you speak of manual failover and SPOF clients and on the other you speak
>  of unwanted downtime. If this bothers you, have enough nodes that you
>  could shut down half (leaving half running), upgrade the downed ones,
>  bring them up and migrade the IPs (heartbeat, RHCS, etc) to the upgraded
>  ones and upgrade the remaining nodes. The downtime should be seconds at
>  most.
>
>  > Effectively storing anything like a large file on AFR is pretty unworkable 
> and makes split-brian issues pale into insignificance ... or at least that's 
> my experience of trying to use it...
>
>  I can't help but think that you're trying to use the wrong tool for the
>  job here. A SAN/GFS solution sounds like it would fit your use case better.
>
>
>
> Gordan
>
>
>  _______________________________________________
>  Gluster-devel mailing list
>  address@hidden
>  http://lists.nongnu.org/mailman/listinfo/gluster-devel
>
>
>  _______________________________________________
>  Gluster-devel mailing list
>  address@hidden
>  http://lists.nongnu.org/mailman/listinfo/gluster-devel
>




reply via email to

[Prev in Thread] Current Thread [Next in Thread]