It does seem like it would be fairly easy to add another
metadata attribute to each file/directory that would hold
a checksum for it. This way, AFR itself could be
configured to check/compute the checksum anytime the file
is read/written. Since this would slow AFR down, I would
suggest a configuration option to turn this on. If a
checksum is wrong, AFR could heal the file from the other
brick, provided the other brick's checksum is correct.
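
To make the heal decision concrete, here is a rough Python sketch of my own. It is not AFR code: the brick names, the use of SHA-256, and the function names are all assumptions for illustration. It sorts replicas into those whose data matches the stored checksum and those whose data does not, and only proposes a heal when at least one good copy exists.

```python
import hashlib


def sha256_of(data: bytes) -> str:
    """Hex digest of the data (SHA-256 is an assumed choice of checksum)."""
    return hashlib.sha256(data).hexdigest()


def split_by_checksum(replicas):
    """replicas maps a (hypothetical) brick name to (data, stored_checksum).

    Returns (good, bad): lists of brick names whose data does / does not
    match the checksum stored alongside it.
    """
    good, bad = [], []
    for brick, (data, stored) in replicas.items():
        if sha256_of(data) == stored:
            good.append(brick)
        else:
            bad.append(brick)
    return good, bad


def heal_plan(replicas):
    """Return (source_brick, bricks_to_heal), or None if nothing can be done."""
    good, bad = split_by_checksum(replicas)
    if not good or not bad:
        # Either everything is fine, or no trustworthy copy is left.
        return None
    return good[0], bad
```

With two bricks where only brick1 holds corrupted data, heal_plan would return brick2 as the source and brick1 as the target. If both copies are bad it returns None, which is the case discussed further down.
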
Another alternative would be to create an offline
checksummer that updates such an attribute if it does not
exist, and verifies the checksum if it does exist. If
verification fails, it would simply delete the file and
its attributes (and potentially the directory attributes
up the tree) so that AFR will then heal it.
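
A single pass of such an offline checksummer over one file might look roughly like the sketch below. It is only an illustration under assumptions: the attribute name "user.checksum" is made up, and the file's extended attributes are modeled as a plain dict so the example stays portable (on Linux, os.getxattr and os.setxattr would be the real counterparts).

```python
import hashlib
import os

CHECKSUM_ATTR = "user.checksum"  # hypothetical attribute name


def file_checksum(path):
    """SHA-256 of the file contents, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def scrub(path, attrs):
    """Check one file; attrs is a dict standing in for its extended attributes.

    Returns "created" if the checksum attribute was missing and is now set,
    "ok" if the stored checksum matches, or "deleted" if it did not match
    and the file and its attributes were removed so that AFR can heal it.
    """
    actual = file_checksum(path)
    stored = attrs.get(CHECKSUM_ATTR)
    if stored is None:
        attrs[CHECKSUM_ATTR] = actual
        return "created"
    if stored == actual:
        return "ok"
    os.remove(path)
    attrs.clear()
    return "deleted"
```

Note that this version treats any mismatch as corruption, so it depends on something else removing the attribute when the file is legitimately updated.
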
The only modification needed by AFR to support this
would be to delete the checksum attribute anytime the
file/directory is updated so that the offline checksummer
will recreate it instead of thinking it is corrupt.
In fact, even this could be eliminated so that the
offline checksummer is completely "self-powered":
anytime it calculates a checksum, it could copy the
glusterfs version and timestamp attributes to two new
"checksummer" attributes. If these become out of date, the
checksummer will know to recompute the checksum instead of
assuming that the file has been corrupted.
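
The staleness test could be sketched like this. All attribute names below are placeholders I made up for the example (they are not the real glusterfs attribute names), and attributes are again modeled as a dict rather than real extended attributes.

```python
# Placeholder attribute names, assumed for illustration only.
VERSION = "fs.version"        # stands in for the glusterfs version attribute
TIMESTAMP = "fs.timestamp"    # stands in for the glusterfs timestamp attribute
CHECKSUM = "ck.checksum"      # checksum stored by the checksummer
CK_VERSION = "ck.version"     # version copied at checksum time
CK_TIMESTAMP = "ck.timestamp" # timestamp copied at checksum time


def stamp(attrs, checksum):
    """Record a fresh checksum together with the version/timestamp it covers."""
    attrs[CHECKSUM] = checksum
    attrs[CK_VERSION] = attrs.get(VERSION)
    attrs[CK_TIMESTAMP] = attrs.get(TIMESTAMP)


def classify(attrs, actual_checksum):
    """Return "recompute", "ok", or "corrupt" for one file."""
    if CHECKSUM not in attrs:
        return "recompute"
    stamped = (attrs.get(CK_VERSION), attrs.get(CK_TIMESTAMP))
    current = (attrs.get(VERSION), attrs.get(TIMESTAMP))
    if stamped != current:
        # The file changed legitimately since the checksum was taken.
        return "recompute"
    return "ok" if attrs[CHECKSUM] == actual_checksum else "corrupt"
```

Only a mismatch with unchanged version and timestamp counts as corruption. Any legitimate update makes the stamped copies stale and simply triggers a recompute, so AFR itself needs no changes.
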
The one risk with this is that if a file gets corrupted
on both nodes, it will get deleted on both nodes so you
will not have a corrupted file to at least look at.
This too could be overcome by saving any deleted files
in a separate "trash can" and cleaning the trash can
once the files in it have been healed, sort of a
self-cleaning lost+found directory.
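
As a rough illustration of the trash can idea, the sketch below moves a suspect file into a trash directory instead of deleting it, and later purges trash entries whose originals have reappeared. The directory layout and function names are my own assumptions, and "healed" is approximated here as "a file exists again at the original path", which a real tool would want to verify more carefully.

```python
import os
import shutil


def move_to_trash(path, root, trash):
    """Move a file under root into trash, keeping its relative path."""
    rel = os.path.relpath(path, root)
    dest = os.path.join(trash, rel)
    os.makedirs(os.path.dirname(dest), exist_ok=True)
    shutil.move(path, dest)
    return dest


def purge_healed(root, trash):
    """Delete trashed files whose original path exists again under root.

    Returns the relative paths that were purged.
    """
    removed = []
    for dirpath, _dirs, files in os.walk(trash):
        for name in files:
            trashed = os.path.join(dirpath, name)
            rel = os.path.relpath(trashed, trash)
            if os.path.exists(os.path.join(root, rel)):
                os.remove(trashed)
                removed.append(rel)
    return removed
```

Files that were corrupted on every node never reappear under root, so they stay in the trash where you can still look at them.
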
I know these may not be the answers that you were
looking for, but I hope it helps clarify things
a little.