rdiff-backup-users
Re: [rdiff-backup-users] Tar replacement - format proposal


From: John Goerzen
Subject: Re: [rdiff-backup-users] Tar replacement - format proposal
Date: Fri, 26 Sep 2003 08:19:34 -0500
User-agent: Mutt/1.4i

On Fri, Sep 26, 2003 at 10:33:01AM +0100, Kevin Spicer wrote:
> > to sketch out a possible format.  Again comments and suggestions are
> > greatly appreciated.  
> 
> Interesting ideas.  You seem very focused on backup to disk; have you
> considered what happens if someone wants to back up to tape?  Since the
> index is at the end of the file they would need to read the entire file
> to get the index, then rewind and read (possibly) the entire file again
> to extract what they want.  I appreciate your reasons for not putting the

I don't see why this is true.  Presumably the per-file data embedded within
the archive would contain all the information that a tape restore program
would need, so it could just read the archive sequentially like tar.

However, even for tape, the central directory at the end of the file could
be great.  Most tape drives can wind to a specific block far faster than
they can read through the entirety of a file.  Even given the time lost for
reading the central directory and the seeks necessary to do that, it would,
in many cases, turn out far faster.
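
To make the two restore paths concrete, here is a minimal sketch.  The
record layout (a one-byte kind tag, length-prefixed name and data, a JSON
index record and an 8-byte trailer holding the index offset) is purely an
assumption for illustration, not the proposed format.  It shows that
self-describing records allow a tar-style sequential read, while the
trailing index allows a direct seek.

```python
import io
import json
import struct

# Hypothetical layout: [file record]* [index record] [8-byte index offset]
# Record header: kind tag (b"F" file, b"I" index), name length, data length.
REC_HDR = struct.Struct(">cIQ")
TRAILER = struct.Struct(">Q")


def write_archive(out, files):
    """Write file records, then the index record, then the trailer."""
    index = {}
    for name, data in files.items():
        raw_name = name.encode("utf-8")
        index[name] = out.tell()  # offset where this record starts
        out.write(REC_HDR.pack(b"F", len(raw_name), len(data)))
        out.write(raw_name)
        out.write(data)
    index_offset = out.tell()
    raw_index = json.dumps(index).encode("utf-8")
    out.write(REC_HDR.pack(b"I", 0, len(raw_index)))
    out.write(raw_index)
    out.write(TRAILER.pack(index_offset))


def read_record(stream):
    """Read one record at the current position: (kind, name, data)."""
    kind, name_len, data_len = REC_HDR.unpack(stream.read(REC_HDR.size))
    name = stream.read(name_len).decode("utf-8")
    return kind, name, stream.read(data_len)


def restore_sequential(stream, wanted):
    """Tar-style restore: walk the records from the start, no index needed."""
    stream.seek(0)
    while True:
        kind, name, data = read_record(stream)
        if kind == b"I":  # reached the index, so the file is not present
            return None
        if name == wanted:
            return data


def restore_via_index(stream, wanted):
    """Seek-based restore: read the trailer, then the index, then one record."""
    stream.seek(-TRAILER.size, io.SEEK_END)
    (index_offset,) = TRAILER.unpack(stream.read(TRAILER.size))
    stream.seek(index_offset)
    _, _, raw_index = read_record(stream)
    offset = json.loads(raw_index).get(wanted)
    if offset is None:
        return None
    stream.seek(offset)
    return read_record(stream)[2]
```

On a device that can position quickly, restore_via_index touches only the
trailer, the index and one record; restore_sequential reads everything that
precedes the wanted record.
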

> index at the start, however this could be a serious issue for some.  One
> possible solution (in the implementation) is to allow the user to
> specify that the archive index should be in a second file, so that you
> can seek past the first (file) archive to read the index then rewind and
> read the file.  Thinking on this, the first file should contain the
> index at the end as normal (so that every duplicity archive is self
> contained - so you don't end up in a position where you have the archive
> but not the index), then the option allows a copy of the index to be
> stored as well.  You could even allow the option of making copies of the
> index to alternative locations.  For example this would allow the bulky
> archives to be stored in offline storage and the smaller indexes to be
> kept on disk - which would allow files to be located before retrieving
> the appropriate tape from storage.
> 
> There would be some benefit in including in the header information to
> indicate whether a full archive or index and some unique identifier so
> that indexes and archives can be related.
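
A small sketch of such a header, assuming a made-up magic string, a
one-byte kind field and a 16-byte UUID (none of these come from the
proposal); it only illustrates how a detached index copy could be matched
to its archive.

```python
import struct
import uuid

MAGIC = b"ARCH"                  # hypothetical magic bytes
KIND_FULL, KIND_INDEX = 1, 2     # hypothetical kind codes
HEADER = struct.Struct(">4sB16s")


def make_header(kind, archive_id):
    """Pack the magic, the kind code and the archive's unique identifier."""
    return HEADER.pack(MAGIC, kind, archive_id.bytes)


def parse_header(raw):
    """Return (kind, archive_id) from the first HEADER.size bytes."""
    magic, kind, raw_id = HEADER.unpack(raw[:HEADER.size])
    if magic != MAGIC:
        raise ValueError("not an archive header")
    return kind, uuid.UUID(bytes=raw_id)


def belongs_together(archive_header, index_header):
    """True if the second header is an index copy of the first archive."""
    a_kind, a_id = parse_header(archive_header)
    i_kind, i_id = parse_header(index_header)
    return a_kind == KIND_FULL and i_kind == KIND_INDEX and a_id == i_id
```
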
> 
> You talk about whether header entries can be a fixed size or whether you
> should use xml. I think it would be a good idea to use xml whenever you
> can, to permit extensibility.  You're finding limitations of tar now
> because of decisions just like that.
> 
> Some other, random unstructured thoughts... [disclaimer - I've not
> actually used duplicity, although I have read the docs I may have
> missed/ misunderstood some of its existing features]
> Presumably you'll be compressing prior to encrypting; IIRC you get
> better compression ratios that way?  Will you be supporting other
> compression schemes (like bzip2) and alternative encryption algorithms?
> What about signed but unencrypted archives (for those who are only
> worried about making sure the backup has not been changed/corrupted)?
> Will individual blocks be signed, or just the full archive?  This could
> be important since it may permit undamaged portions of a damaged archive
> to be restored; on the other hand this would add to size.
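
The per-block trade-off can be sketched as follows.  This is only an
illustration: it uses an HMAC-SHA256 tag in place of a real public-key
signature, leaves encryption out entirely, and the framing (4-byte length,
compressed body, 32-byte tag) is an assumption.  Each block costs 36 extra
bytes, but a damaged block can be detected and skipped on its own.

```python
import hashlib
import hmac
import struct
import zlib

LEN = struct.Struct(">I")                 # hypothetical 4-byte length prefix
MAC_SIZE = hashlib.sha256().digest_size   # 32 bytes per block


def seal_block(key, plaintext):
    """Compress first, then authenticate the compressed body."""
    body = zlib.compress(plaintext)
    tag = hmac.new(key, body, hashlib.sha256).digest()
    return LEN.pack(len(body)) + body + tag


def open_blocks(key, archive):
    """Yield each block's plaintext, or None if its tag does not verify.

    Damage to a length prefix would break the framing of later blocks;
    this sketch does not try to resynchronise.
    """
    pos = 0
    while pos + LEN.size <= len(archive):
        (body_len,) = LEN.unpack_from(archive, pos)
        start = pos + LEN.size
        end = start + body_len
        body = archive[start:end]
        tag = archive[end:end + MAC_SIZE]
        pos = end + MAC_SIZE
        expected = hmac.new(key, body, hashlib.sha256).digest()
        if len(tag) == MAC_SIZE and hmac.compare_digest(tag, expected):
            yield zlib.decompress(body)
        else:
            yield None
```

Signing only the whole archive would save the per-block tags, but a single
flipped bit would then leave no way of telling which blocks are still good.
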
> 
> Still thinking on my feet, it's not clear from your page (not to me
> anyway) whether the metadata is stored at the block level or in the
> index at the end.  I would suggest that the block level is better.  In
> fact I think the index should contain no information that is absolutely
> necessary to restore the whole file (although obviously it's useful in
> selecting individual files) because...
>   * Should a file become truncated (maybe out of space on device or
> whatever) undamaged blocks could still be recovered.
>   * It would then be possible to read/write an archive from/to a stream
> (like tar, gzip, bzip2 do).
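
Both points can be illustrated with a short sketch, assuming a hypothetical
record header that carries the name and data lengths.  Because each record
describes itself, a reader can consume a non-seekable stream and simply
stop at the first incomplete record, keeping everything before it.

```python
import io
import struct

REC_HDR = struct.Struct(">IQ")  # hypothetical: name length, data length


def pack_record(name, data):
    """Serialise one self-describing record."""
    raw_name = name.encode("utf-8")
    return REC_HDR.pack(len(raw_name), len(data)) + raw_name + data


def read_exact(stream, n):
    """Read exactly n bytes, or return None if the stream ends early."""
    chunks = []
    remaining = n
    while remaining:
        chunk = stream.read(remaining)
        if not chunk:
            return None
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def salvage(stream):
    """Yield complete (name, data) records until the stream ends or is cut."""
    while True:
        header = read_exact(stream, REC_HDR.size)
        if header is None:
            return
        name_len, data_len = REC_HDR.unpack(header)
        name = read_exact(stream, name_len)
        data = read_exact(stream, data_len)
        if name is None or data is None:
            return  # truncated record: drop it, keep what came before
        yield name.decode("utf-8"), data
```

salvage only ever calls read(), so it works equally on a pipe, a tape
device or a file that was cut short.
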
> 
> [I've just reread the page and now I think this is what you are
> proposing, but I'm not 100% sure]
> 
> Depending on which stage you implement compression at you may like to
> think about having a customisable block size.  Different settings of gzip
> and bzip2 use different block sizes (bzip2's is much bigger IIRC), so if
> your intention is to... 1) build block 2) compress block 3)
> encrypt/sign block ... then you might think about matching your inner
> block size to that of the compression algorithm in use, to optimise the
> compression you get.  I don't know about the impact of block size on
> encryption, anyone care to enlighten me?  If output files may be written
> to tape there's also an implication there for the block size of output
> (i.e. not the inner block size).
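
A rough way to experiment with this, as a sketch only: split the input into
inner blocks of a chosen size and compress each one independently.  bzip2
works on blocks of roughly compresslevel x 100,000 bytes, whereas deflate
(gzip/zlib) uses a sliding window of at most 32 KiB, so the inner block
size that suits one does not necessarily suit the other.  The helper names
are made up for illustration.

```python
import bz2
import zlib

BZ2_LEVEL = 9
BZ2_BLOCK = BZ2_LEVEL * 100_000   # approximate bzip2 block size at level 9
DEFLATE_WINDOW = 32 * 1024        # maximum deflate window


def split_blocks(data, block_size):
    """Cut data into consecutive inner blocks of at most block_size bytes."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def compress_blocks(data, block_size, compress):
    """Compress every inner block on its own, as step 2 above would."""
    return [compress(block) for block in split_blocks(data, block_size)]


def bz2_compress(block):
    return bz2.compress(block, compresslevel=BZ2_LEVEL)


def total_size(blocks):
    return sum(len(block) for block in blocks)
```

Comparing total_size(compress_blocks(data, DEFLATE_WINDOW, bz2_compress))
with total_size(compress_blocks(data, BZ2_BLOCK, bz2_compress)) on real
backup data would show how much an undersized inner block costs; the result
depends heavily on the data.
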
> 
> Final, off the wall thought: recent Windows filesystems (just NTFS?)
> have the capability of having multiple streams associated with a single
> filename (although this isn't being used by anybody very much yet AFAIK).
> I'm not sure how you would go about handling these, but if you didn't
> already know about them they are there.  Just thought I'd mention it (in
> case it's something that needs addressing in the file format, rather than
> just the implementation).
> 



