Re: [Gluster-devel] bit rot support for glusterfs
From: shishir gowda
Subject: Re: [Gluster-devel] bit rot support for glusterfs
Date: Thu, 9 Jan 2014 16:12:49 +0530
On 2 January 2014 19:30, Jeffrey Darcy <address@hidden> wrote:
>> I will be starting to work on bit rot detection for glusterfs.
>
> That's magnificent, and not only because it's a valuable feature. It's great
> to see you still exhibiting such initiative. :)
Leaving Gluster development was never my intention. Yes, due to my
other commitments, my development/response time might be slower. But
in the end, being able to develop in the community does give me
freedom of choice :)
>
>> 1. Depend on change-log to recompute checksum. This eliminates
>> periodic crawl of brick/volume to update the checksum.
>
> Absolutely. This was always my biggest objection to Doug's design. Crawling
> simply doesn't scale well. Even for local replication we're moving to a more
> log-based approach. The design you cite also mentions AFR-specific artifacts
> like outcast, which need to be avoided both as a matter of general practice
> and because those artifacts might no longer be relevant a year from now.
>
Agreed. Bit rot detection will only take into account checksums computed
on a single brick/child.
>> 2. Policy to determine when the checksum is to be recomputed. If a file is
>> undergoing active I/O, then compute the checksum only after a delay
>
> This might play into what we are (or at least I am) planning with respect to
> tiering, HSM, data classification, or whatever you want to call it. In fact,
> it might be something that you don't need to worry about at all. If a
> volume/pool is divided into a "live" part geared toward high performance and
> an "archival" part geared toward longevity and storage efficiency - slower
> drives, erasure codes and/or bit rot detection instead of AFR/NSR - with
> transparent migration between them, then you might just be able to say that
> anything placed under your purview will have bit rot detection applied. To
> put it another way, the policy would be applied elsewhere in the system.
>
Makes sense to allow the policy to be decided elsewhere, as in most use
cases bit rot detection would be enabled only for archival stores. So the
last fd close on a file should trigger computation of a checksum on the
bricks. Does the change-log xlator have the capability to identify the
last fd close on a file?
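
To make the "checksum on last fd close" idea concrete, here is a minimal
sketch in Python. It is only an illustration, not GlusterFS or change-log
xlator code: the class LastCloseTracker, its on_open/on_close hooks, and the
choice of SHA-256 are all assumptions made for the example. It keeps a count
of open fds per file and computes the checksum only when that count drops to
zero.

```python
import hashlib


def file_sha256(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in fixed-size chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


class LastCloseTracker:
    """Hypothetical helper: checksum a file when its last open fd is closed."""

    def __init__(self):
        self.open_fds = {}    # path -> number of currently open fds
        self.checksums = {}   # path -> last computed hex digest

    def on_open(self, path):
        self.open_fds[path] = self.open_fds.get(path, 0) + 1

    def on_close(self, path):
        """Return the new checksum on the last close, otherwise None."""
        count = self.open_fds.get(path, 0)
        if count <= 0:
            raise ValueError("close without matching open: " + path)
        if count > 1:
            # Other fds are still open, so the file may still change.
            self.open_fds[path] = count - 1
            return None
        # Last close: the file is quiescent, so compute the checksum now.
        del self.open_fds[path]
        self.checksums[path] = file_sha256(path)
        return self.checksums[path]
```

With two opens on the same file, the first on_close returns None and only
the second one yields a checksum, which is the behaviour asked about above.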
>> 3. Ability to turn off/on bit rot detection in volumes.
>>
>> 4. If bit rot is turned on for a volume, a crawl would be necessary in
>> this case to compute checksum.
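
The one-time crawl in point 4 could look roughly like the sketch below. It
is an illustration under stated assumptions, not a proposed implementation:
the function name crawl_checksums is hypothetical, SHA-256 is an arbitrary
choice of hash, and the results are returned in a plain dict, whereas a real
implementation would need to persist them somewhere on the brick.

```python
import hashlib
import os


def crawl_checksums(brick_root, chunk_size=1 << 20):
    """Walk a brick directory once and return {relative path: sha256 hex}."""
    result = {}
    for dirpath, _dirnames, filenames in os.walk(brick_root):
        for name in filenames:
            full_path = os.path.join(dirpath, name)
            digest = hashlib.sha256()
            with open(full_path, "rb") as f:
                # Read in chunks so large files are not loaded into memory.
                for chunk in iter(lambda: f.read(chunk_size), b""):
                    digest.update(chunk)
            rel_path = os.path.relpath(full_path, brick_root)
            result[rel_path] = digest.hexdigest()
    return result
```

This crawl would only run when detection is first enabled on a volume; after
that, updates would be driven by the change-log as in point 1.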