From: Gareth Bult
Subject: Re: [Gluster-devel] Re: New IDEA: The Checksumming xlator ( AFR Translator have problem )
Date: Thu, 17 Jan 2008 12:42:19 +0000 (GMT)
Mmm, my opinion is that it's relatively easy, but needs to be in AFR ...
----- Original Message -----
From: "Angel" <address@hidden>
To: address@hidden
Sent: 17 January 2008 12:17:23 (GMT) Europe/London
Subject: [Gluster-devel] Re: New IDEA: The Checksumming xlator ( AFR Translator
have problem )
Hi,

Managing log files seems pretty hard to me at this moment; are you confident it is feasible?

On the other hand, checksumming also seems very interesting to me as a usable userspace feature (offloading checksums from client apps to the server).

Checksumming is definitely on my TODO list; I'm very busy now and still have my QUOTA xlator pet project in progress.

Anyway, it is hard to start on a logfile AFR without disturbing the current AFR developers. I'm sure they have their own ideas about what to do on this subject.
Regards,
Life's hard but root password helps!
On Thursday, 17 January 2008 11:11, Gareth Bult wrote:
> Erm, I said;
>
> >to write the change to a logfile on the remaining volumes
>
> By which I meant that the log file would be written on the remaining
> available server volumes ... (!)
>
> Regards,
> Gareth.
>
> ----- Original Message -----
> From: "Angel" <address@hidden>
> To: address@hidden
> Sent: 17 January 2008 10:07:12 (GMT) Europe/London
> Subject: Re: [Gluster-devel] Re: New IDEA: The Checksumming xlator ( AFR
> Translator have problem )
>
> The problem is:
>
> If you place AFR on the client, how do the servers get the log file during
> recovery operations?
>
> Regards, Angel
>
>
> On Thursday, 17 January 2008 10:44, Gareth Bult wrote:
> > Hi,
> >
> > Yes, I would agree these changes would improve the current implementation.
> >
> > However, a "better" way would be for the client, on failing to write to ONE
> > of the AFR volumes, to write the change to a logfile on the remaining
> > volumes .. then for the recovering server to playback the logfile when it
> > comes back up, or to recopy the file if there are insufficient logs or if
> > the file has been erased.
> >
> > This would "seem" to be a very simple implementation ..
> >
> > Client;
> >
> > Write to AFR
> > If Fail then
> > if log file does not exist create log file
> > Record file, version, offset, size, and data in logfile
> >
> > On server;
> >
> > When recovering;
> >
> > for each entry in logfile
> > if file age > most recent transaction
> > re-copy whole file
> > else
> > replay transaction
> >
> > if all volumes "UP", remove logfile
> >
> > ?????
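The client/server steps sketched above could look roughly like this in Python. This is a minimal sketch only: the log format, on-disk layout, and every function name here are invented for illustration, and this is not AFR's actual implementation.

```python
import json
import os
import time

def log_failed_write(logfile, path, version, offset, data):
    """Client side: a write to one AFR volume failed, so record the
    transaction on the volumes that are still up (hypothetical format)."""
    entry = {"path": path, "version": version, "offset": offset,
             "size": len(data), "data": data.hex(), "ts": time.time()}
    with open(logfile, "a") as f:
        f.write(json.dumps(entry) + "\n")

def replay_log(logfile, volume_root, recopy):
    """Server side, on recovery: replay logged writes per file, or fall
    back to recopy(path) when the local file is missing or newer than
    the log (i.e. the log is insufficient to bring it up to date)."""
    with open(logfile) as f:
        entries = [json.loads(line) for line in f]
    by_path = {}
    for e in entries:
        by_path.setdefault(e["path"], []).append(e)
    for path, es in by_path.items():
        target = os.path.join(volume_root, path.lstrip("/"))
        newest = max(e["ts"] for e in es)
        if not os.path.exists(target) or os.path.getmtime(target) > newest:
            recopy(path)  # file erased or ahead of the log: full re-copy
            continue
        for e in sorted(es, key=lambda e: e["ts"]):
            with open(target, "r+b") as f:  # replay one transaction
                f.seek(e["offset"])
                f.write(bytes.fromhex(e["data"]))
```

The real win, as noted below, is that the file stays available while the log is replayed, instead of blocking during a whole-file copy.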
> >
> > One of the REAL benefits of this is that the file is still available DURING
> > a heal operation.
> > At the moment a HEAL only takes place when a file is being opened, and
> > while the copy is taking place the file blocks ...
> >
> > Gareth.
> >
> > ----- Original Message -----
> > From: "Angel" <address@hidden>
> > To: "Gareth Bult" <address@hidden>
> > Cc: address@hidden
> > Sent: 17 January 2008 08:47:06 (GMT) Europe/London
> > Subject: New IDEA: The Checksumming xlator ( AFR Translator have problem )
> >
> > Hi Gareth
> >
> > You said it!! Gluster is revolutionary!!
> >
> > AFR does a good job, we only have to help AFR be a better guy!!
> >
> > What we need is a checksumming translator!!
> >
> > Suppose you have your posix volumes A and B on different servers.
> >
> > So you are using AFR(A,B) on the client.
> >
> > One of your AFRed nodes ( A ) fails, and some time later it comes back to
> > life, but its backend filesystem got trashed and fsck'ed, and now there may
> > be subtle differences in the files inside.
> >
> > Your beloved 100GB XEN files now don't match between your "faulty" A node
> > and your fresh B node!!
> >
> > AFR would notice this (I think) by means of xattrs on both files, i.e.
> > VERSION(FILE on node A) != VERSION(FILE on node B), or something like that.
> >
> > But the real problem, as you pointed out, is that AFR only knows the files
> > don't match, so it has to copy every byte of your 100GB image from B to A
> > (automatically, on self-heal or on file access).
> >
> > That's many GB's (maybe TB's) going back and forth over the net. THIS IS
> > VERY EXPENSIVE; we all know that.
> >
> > Enter the checksumming xlator (SHA1 or MD5, maybe MD4, since rsync seems to
> > use that without any problem).
> >
> > The checksumming xlator sits atop your posix modules on every node. Whenever
> > you request the xattr SHA1[block_number] on a file, the checksumming xlator
> > intercepts the call, reads block "block_number" from the file, calculates
> > its SHA1, and returns it as an xattr key:value pair.
> >
> > Now AFR can request SHA1s blockwise on both servers and update only those
> > blocks whose SHA1s don't match.
> >
> > With a decent block size we can save a lot of traffic on every transaction.
> >
> > -- In case your faulty node lost its contents, you have to copy the whole
> > 100GB XEN files again.
> > -- In case of SHA1 mismatches, AFR can update only the differences, saving
> > a lot of resources, like rsync does.
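To make the comparison concrete, here is a minimal Python sketch of the blockwise heal. The block size, the function names, and the in-memory stand-in for the SHA1[block_number] xattr are all invented for illustration; a real xlator would of course do this against files on disk.

```python
import hashlib

BLOCK = 1 << 16  # 64 KiB -- an arbitrary "decent block size" for the sketch

def block_sums(data, block=BLOCK):
    """Stand-in for the hypothetical SHA1[block_number] xattr: one SHA1
    digest per block of the file's contents."""
    return [hashlib.sha1(data[i:i + block]).hexdigest()
            for i in range(0, len(data), block)]

def heal(fresh, stale, block=BLOCK):
    """AFR-style partial heal: compare blockwise SHA1s of the fresh (B)
    and stale (A) copies, and transfer only the blocks that differ.
    Returns the healed copy and the number of blocks transferred."""
    good = block_sums(fresh, block)
    bad = block_sums(stale, block)
    out = bytearray(stale[:len(fresh)])
    out.extend(b"\0" * (len(fresh) - len(out)))  # stale copy may be shorter
    copied = 0
    for i, digest in enumerate(good):
        if i >= len(bad) or bad[i] != digest:
            out[i * block:(i + 1) * block] = fresh[i * block:(i + 1) * block]
            copied += 1
    return bytes(out), copied
```

With one flipped byte in a 100GB image, this transfers a single block instead of the whole file, which is exactly the rsync-style saving described above.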
> >
> > One more advanced feature would be to incorporate xdelta library functions,
> > making it possible to generate binary patches against files...
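As a toy illustration of that last idea: the sketch below builds a hypothetical block-level "patch" (just a list of changed blocks plus the new length) and applies it. This is not the real xdelta/VCDIFF format; it only shows what shipping a binary patch instead of the whole file buys.

```python
def make_patch(old, new, block=4096):
    """Hypothetical binary patch: (block_no, new_bytes) for every block
    that changed, plus the target length. Invented format, not xdelta."""
    ops = []
    nblocks = (max(len(old), len(new)) + block - 1) // block
    for i in range(nblocks):
        o = old[i * block:(i + 1) * block]
        n = new[i * block:(i + 1) * block]
        if o != n:
            ops.append((i, n))
    return {"length": len(new), "block": block, "ops": ops}

def apply_patch(old, patch):
    """Rebuild the new file from the old copy plus the patch."""
    out = bytearray(old[:patch["length"]])
    out.extend(b"\0" * (patch["length"] - len(out)))
    b = patch["block"]
    for i, data in patch["ops"]:
        out[i * b:i * b + len(data)] = data
    return bytes(out[:patch["length"]])
```

Only the changed blocks ever cross the network; the rest is reconstructed from the copy the recovering node already has.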
> >
> > Now we only need someone to implement this xlator :-)
> >
> > Regards
> >
> > On Thursday, 17 January 2008 01:49, the following was written:
> > > Mmm...
> > >
> > > There are a couple of real issues with self heal at the moment that make
> > > it a minefield for the inexperienced.
> > >
> > > Firstly there's the mount bug .. if you have two servers and two clients,
> > > and one AFR, there's a temptation to mount each client against a
> > > different server. Which initially works fine .. right up until one of the
> > > glusterfsd's ends .. when it still works fine. However, when you restart
> > > the failed glusterfsd, one client will erroneously connect to it (or this
> > > is my interpretation of the net effect), regardless of the fact that
> > > self-heal has not taken place .. and because it's out of sync, doing a
> > > "head -c1" on a file you know has changed gets you nowhere. So
> > > essentially you need to remount clients against non-crashed servers
> > > before starting a crashed server .. which is not nice. (this is a filed
> > > bug)
> > >
> > > Then we have us poor XEN users who store 100Gb's worth of XEN images on a
> > > gluster mount .. which means we can live migrate XEN instances between
> > > servers .. which is fantastic. However, after a server config change or a
> > > server crash, it means we need to copy 100Gb between the servers .. which
> > > wouldn't be so bad if we didn't have to stop and start each XEN instance
> > > in order for self heal to register the file as changed .. and while
> > > self-heal is re-copying the images, they can't be used, so you're looking
> > > at 3-4 mins of downtime per instance.
> > >
> > > Apart from that (!) I think gluster is a revolutionary filesystem and
> > > will go a long way .. especially if the bug list shrinks .. ;-)
> > >
> > > Keep up the good work :)
> > >
> > > [incidentally, I now have 3 separate XEN/gluster server stacks, all
> > > running live-migrate - it works!]
> > >
> > > Regards,
> > > Gareth.
> > >
> >
>
--
Don't be shive by the tone of my voice. Just got my new weapon, weapon of
choice...
->>--------------------------------------------------
Angel J. Alvarez Miguel, Sección de Sistemas
Area de Explotación y Seguridad Informática
Servicios Informaticos, Universidad de Alcalá (UAH)
Alcalá de Henares 28871, Madrid ** ESPAÑA **
Tfno: +34 91 885 46 32 Fax: 91 885 51 12
------------------------------------[www.uah.es]-<<--
"No more bets, gentlemen..."
_______________________________________________
Gluster-devel mailing list
address@hidden
http://lists.nongnu.org/mailman/listinfo/gluster-devel