
[Fwd: Re: [rdiff-backup-users] Tar replacement - format proposal]


From: Kevin Spicer
Subject: [Fwd: Re: [rdiff-backup-users] Tar replacement - format proposal]
Date: 28 Sep 2003 11:12:22 +0100

On Sun, 2003-09-28 at 04:23, John Goerzen wrote:
> Simple XML can be generated very easily.  Parsing is not quite so simple,
> because already you have to consider all sorts of different quoting
> situations, etc.
This is very true. I recently had to write a parser in Perl for a very
simple XML file (I couldn't use Perl's XML module because I needed it to
work on Netware, where that module isn't available).  The parser ended
up being a significant portion of my program because of the flexible
structure of XML (I had to allow for attributes appearing in different
orders and so on).  Admittedly you can get away without a lot of this if
you also control the generation.
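To illustrate what I mean (a throwaway Python sketch of my own, nothing
to do with the Perl code in question): the same element can legally be
written with the attributes in a different order and with different
quoting, which defeats naive string or regex matching but looks
identical to a real XML parser.

    import xml.etree.ElementTree as ET

    a = '<file name="etc/passwd" size="1024"/>'
    b = "<file size='1024' name='etc/passwd' />"

    # A check tied to one attribute order and quoting style misses b...
    naive = 'name="etc/passwd" size="1024"'
    print(naive in a, naive in b)                              # True False

    # ...whereas a parser sees the same attribute dictionary for both.
    print(ET.fromstring(a).attrib == ET.fromstring(b).attrib)  # True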

> 
> I personally use Amanda, so there's a little trick in there involving
> skipping the Amanda header, but that just requires a simple dd.
> 
I'm glad you mentioned Amanda. I also use Amanda, and I have run into
what I consider its main limitation (bear with me, this is relevant).
Amanda uses tar for backups (it also uses dump, but dump can't cope with
the situation I am about to describe).  Where a backup would span more
than one tape it is necessary to manually exclude portions of the source
filesystem, creating two or more tar archives, each of which is small
enough to fit on a tape.  There are a number of reasons why Amanda
requires this (even though GNU tar supports multi-volume archives).  I
suspect a major one is that compression of tar archives happens after
the archive is produced, so Amanda has no control over the eventual
archive size (another issue is the writing of the Amanda header).
This led me to thinking that it is important that the archive format
can cope with multi-volume archives.  With the compression on the
inside, the size of each archive could be controlled exactly, which
would be a significant improvement over tar.  There are all sorts of
implementation details that would need to be ironed out, but at this
stage I'll raise a few issues relating to the design of the format.
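To make the "compression on the inside" point concrete, here's a rough
Python sketch (the file naming, the volume size and indeed the whole
approach are just assumptions of mine, not a proposal): because each
file's data is compressed before it is written, the writer always knows
exactly how many bytes the current volume holds and can roll over to a
new volume before the medium is full, which Amanda can't do when
compression happens only after tar has produced the archive.

    import zlib

    VOLUME_LIMIT = 650 * 1024 * 1024        # e.g. a CD-R; figure assumed

    def write_volumes(paths, volume_limit=VOLUME_LIMIT):
        vol_no, written = 1, 0
        out = open(f"backup.vol{vol_no}", "wb")    # hypothetical naming
        for path in paths:
            with open(path, "rb") as f:
                blob = zlib.compress(f.read())     # compress *before* writing
            # Compressed size is known up front, so we can start a new
            # volume rather than overrun the medium.
            if written and written + len(blob) > volume_limit:
                out.close()
                vol_no, written = vol_no + 1, 0
                out = open(f"backup.vol{vol_no}", "wb")
            out.write(blob)
            written += len(blob)
        out.close()
        # (A single blob bigger than the limit would still overrun; that
        # is the file-splitting question raised below.)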

Would each volume of a multi-volume archive be self-contained (with its
own header and index), or would it just be a portion of the full archive
snipped at the appropriate place?  I'd suggest that each portion needs
its own header (with an indication that it is part of a multi-volume
set, for the second and subsequent volumes at least).  I'd also suggest
that an index for each volume would be better, purely because it makes
each volume self-contained (an issue if the file is to be mountable).
On the other hand, a full index at the end of the last volume would
enable one to identify which volume a file is in (the volume number
should form part of the index reference).  In that case it should also
be noted that the final volume may sometimes contain only the index,
which means the format should specify zero or more data blocks rather
than one or more.
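As a purely hypothetical illustration of the sort of thing an index
entry might carry if the volume number forms part of the reference
(the field names are mine, not a proposal):

    from dataclasses import dataclass

    @dataclass
    class IndexEntry:
        path: str        # path of the file within the archive
        volume: int      # which volume of the set holds the data
        offset: int      # byte offset of the (compressed) data in the volume
        length: int      # compressed length in bytes
        part: int = 0    # >0 if this entry is one piece of a split file

A per-volume index of such entries keeps each volume self-contained and
mountable on its own, while a cumulative copy written at the end of the
last volume would let a restore locate any file without reading every
volume.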

To support multi-volume sets it will also be necessary to support
splitting large files across volumes (for example, a 4GB file is never
going to fit on a CD-R), and it may also be necessary to indicate in the
index that a file has been split.
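A minimal sketch of the splitting side (again my own illustration, with
arbitrary names): read the file in pieces no larger than one volume and
record each piece as a numbered part, which is where a part number in an
index entry would come from.

    def split_for_volumes(path, volume_capacity):
        # Yield (part_no, data) pieces, each small enough for one volume.
        part_no = 1
        with open(path, "rb") as f:
            while True:
                data = f.read(volume_capacity)
                if not data:
                    break
                yield part_no, data    # in practice: compress, write, index
                part_no += 1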

BMRB International 
http://www.bmrb.co.uk
+44 (0)20 8566 5000