
Re: [Gluster-devel] self-heal behavior


From: Steffen Grunewald
Subject: Re: [Gluster-devel] self-heal behavior
Date: Wed, 4 Jul 2007 16:16:50 +0200
User-agent: Mutt/1.5.13 (2006-08-11)

On Wed, Jul 04, 2007 at 07:33:14PM +0530, Anand Avati wrote:
> Gerry,
> your question is appropriate, but the answer to 'when to resync' is not
> very simple. when a brick which was brought down is brought up later, it may
> be a completely new (empty) brick. In that case starting to sync every file
> would most likely be the wrong decision. (we should rather sync the file
> which the user needs than some unused file). Even if we chose to sync files
> without user accessing them it would be very sluggish too since it would be
> intervening in other operations.

Isn't this situation comparable to a RAID in which a spare disk (hot or cold)
replaces a failed one? Data integrity *demands* that missing data be restored
as fast as possible - the next failure could kill the last valid copy.
(That's why RAID-6 has become so popular: the risk of losing two disks
within a short time span cannot be neglected.)

> The current approach is to sync files on the next open() on it. This is
> usually a good balance since, during open() if we were to sync a file, even
> if it was a GB it would take 10-15 secs, and for normal files (on the order
> of a few MB) it is almost not noticeable. But if this were to happen together
> for all files whether the user accessed them or not there would be a lot of
> traffic and be very sluggish.
> 
> This approach of syncing on open() is what even other filesystems which
> support redundancy do.

Sounds like the ZFS approach: data is repaired when corruption is detected.
This happens on access (when the metadata layer detects that the block doesn't
match its checksum) *but* there's the opportunity to have a background scrubber.
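
To make the repair-on-access idea concrete, here is a minimal sketch of
healing a file at open() time. It is not AFR's actual code: the replicas
are assumed to be plain local directories, and "newest mtime wins" is an
assumption standing in for whatever versioning the real translator uses.
The names heal_on_open and open_replicated are hypothetical.

```python
import os
import shutil


def heal_on_open(replicas, relpath):
    """Copy the newest copy of relpath onto missing or stale replicas.

    Assumption: the copy with the latest mtime is the valid one.
    Returns the replica directories that were repaired.
    """
    paths = [os.path.join(r, relpath) for r in replicas]
    existing = [p for p in paths if os.path.exists(p)]
    if not existing:
        raise FileNotFoundError(relpath)
    source = max(existing, key=lambda p: os.stat(p).st_mtime_ns)
    src = os.stat(source)
    healed = []
    for replica, path in zip(replicas, paths):
        if path == source:
            continue
        stale = True
        if os.path.exists(path):
            st = os.stat(path)
            stale = (st.st_mtime_ns, st.st_size) != (src.st_mtime_ns, src.st_size)
        if stale:
            os.makedirs(os.path.dirname(path), exist_ok=True)
            shutil.copy2(source, path)  # copy2 also copies the timestamps
            healed.append(replica)
    return healed


def open_replicated(replicas, relpath, mode="rb"):
    """Heal first, then open the file from the first replica."""
    heal_on_open(replicas, relpath)
    return open(os.path.join(replicas[0], relpath), mode)
```

Only files that someone actually opens get repaired here, which is exactly
the gap a background scrubber would close.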

It is probably worth having a (possibly low-priority) background thread that
compares the actual local filesystem to the namespace structure and starts
the necessary repair actions.
This would certainly not be client-based, though.
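
As a rough sketch of such a scrubber (assumptions: a "reference" directory
stands in for the namespace structure, "brick" is the local directory to be
checked, SHA-256 is just one possible checksum, and a sleep between files
is a crude substitute for real I/O prioritization; the function names are
hypothetical):

```python
import hashlib
import os
import shutil
import time


def checksum(path, chunk=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as fh:
        for block in iter(lambda: fh.read(chunk), b""):
            h.update(block)
    return h.hexdigest()


def scrub(reference, brick, pause=0.0):
    """Walk reference and repair files that are missing or differ in brick.

    pause is the number of seconds to sleep after each file, so that the
    scrubber yields to foreground traffic. Returns the sorted relative
    paths that were repaired.
    """
    repaired = []
    for dirpath, _dirs, files in os.walk(reference):
        for name in files:
            src = os.path.join(dirpath, name)
            rel = os.path.relpath(src, reference)
            dst = os.path.join(brick, rel)
            if not os.path.exists(dst) or checksum(dst) != checksum(src):
                os.makedirs(os.path.dirname(dst), exist_ok=True)
                shutil.copy2(src, dst)
                repaired.append(rel)
            time.sleep(pause)
    return sorted(repaired)
```

It could be run in the background with something like
threading.Thread(target=scrub, args=(reference, brick, 0.1), daemon=True).start(),
so that files nobody opens still get a second valid copy eventually.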

> Detecting 'idle time' and beginning sync-up and pausing the sync-up when
> user begins activity is a very tricky job, but that is definitely what we
> aim at finally. It is not enough if AFR detects the client is free, because
> the servers may be busy serving files to another client, and that may not
> be the most appropriate time to sync. The following versions of AFR will
> have more options to tune 'when' to sync. Currently it is only at open(). We
> plan to add options to make it sync on lookup() (happens on ls). Later
> versions would have pro-active syncing (detecting that both server and
> clients are idle etc).

Sounds reasonable...

Steffen




