rdiff-backup-users

Re: [rdiff-backup-users] Tar replacement - format proposal


From: John Goerzen
Subject: Re: [rdiff-backup-users] Tar replacement - format proposal
Date: Fri, 26 Sep 2003 09:14:01 -0500
User-agent: Mutt/1.5.4i

On Fri, Sep 26, 2003 at 02:50:00PM +0100, Kevin Spicer wrote:
> > However, even for tape, the central directory at the end of the file could
> > be great.  Most tape drives can wind to a specific block far faster than
> > they can read through the entirety of a file.  Even given the time lost for
> > reading the central directory and the seeks necessary to do that, it would,
> > in many cases, turn out far faster.
> 
> That's a good point, how would the drive know where to find the index
> though?  I'm guessing that you can skip to an EOF mark then seek back x
> blocks from there, but how to know how many blocks the index uses...

Well, Ben's proposal uses the same mechanism as PKZip -- the very last n
bytes in the file (where n is defined by the spec and never changes) contain
a pointer to the offset where the central directory starts.  So, your
algorithm for tape would be:

1. Skip to the EOF mark
2. Wind back one block and read that block.

   (If you know in advance how many blocks the file takes, you could wind
    directly to this block)

3. Look at the last n bytes and calculate the block in which the central
   directory starts.  Wind to that block.
4. Read the central directory sequentially.  Determine the block in which
   each requested file starts and sort the files in ascending offset order.

For each file:

1. Wind to the block in which it starts if you are not already there
2. Read the file sequentially
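
The steps above can be sketched in a few lines of Python against an ordinary
seekable file.  This is only an illustration, not Ben's actual format: the
8-byte big-endian trailer, the JSON-encoded central directory, the 512-byte
block size, and the names build_archive, read_directory, extract and
start_block are all assumptions made up for the example.

```python
import io
import json
import struct

BLOCK_SIZE = 512                 # assumed tape block size (hypothetical)
TRAILER = struct.Struct(">Q")    # assumed fixed trailer: 8-byte directory offset


def build_archive(files):
    """Write file data first, then the central directory, then the trailer."""
    buf = io.BytesIO()
    entries = []
    for name, data in files.items():
        entries.append({"name": name, "offset": buf.tell(), "length": len(data)})
        buf.write(data)
    dir_offset = buf.tell()
    buf.write(json.dumps(entries).encode("utf-8"))
    buf.write(TRAILER.pack(dir_offset))
    return buf.getvalue()


def read_directory(f):
    """Read the last n bytes, follow the pointer, read the directory."""
    f.seek(-TRAILER.size, io.SEEK_END)          # "skip to EOF, wind back"
    trailer_pos = f.tell()
    (dir_offset,) = TRAILER.unpack(f.read(TRAILER.size))
    f.seek(dir_offset)                          # "wind to the directory"
    return json.loads(f.read(trailer_pos - dir_offset).decode("utf-8"))


def start_block(offset):
    """Block in which a given byte offset falls."""
    return offset // BLOCK_SIZE


def extract(f, wanted):
    """Read the requested files in ascending offset order."""
    entries = [e for e in read_directory(f) if e["name"] in wanted]
    entries.sort(key=lambda e: e["offset"])
    out = {}
    for e in entries:
        if f.tell() != e["offset"]:             # wind only if not already there
            f.seek(e["offset"])
        out[e["name"]] = f.read(e["length"])    # read the file sequentially
    return out


archive = build_archive({"a.txt": b"hello", "b.txt": b"x" * 1000, "c.txt": b"bye"})
result = extract(io.BytesIO(archive), {"c.txt", "a.txt"})
```

Sorting by offset means the reader only ever moves forward after it has the
directory, which is the point of the exercise on a tape drive.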

Now, steps 1-4 are cumbersome, as it often takes tape drives 5-30 seconds to
switch from winding to reading.  However, even if it takes, say, 2 minutes
to read the central directory, plus 4 minutes to wind to it and another 4
minutes to wind to the start of the file, that's only 10 minutes -- versus 2
or 3 hours to read through the entire archive.

(These are real-world numbers from my own tape drive)
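
To make the comparison concrete, here is the same arithmetic as a small
Python sketch.  The variable names are mine, and I am assuming the 5-30
second wind/read switches are already folded into the minute figures above.

```python
# Figures quoted above, in minutes.
read_directory_min = 2
wind_to_directory_min = 4
wind_to_file_min = 4

indexed_total_min = read_directory_min + wind_to_directory_min + wind_to_file_min

# Reading through the entire archive takes 2 or 3 hours.
full_read_min = (2 * 60, 3 * 60)

# How many times faster the indexed approach is, at each end of the range.
speedup = tuple(t / indexed_total_min for t in full_read_min)

print(indexed_total_min, speedup)
```

So even with the slow directory lookup, the indexed restore comes out 12 to
18 times faster for a single file.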

> for the existence of file from an index on disk, rather than having to
> load tapes until you find what you are looking for.

That's an excellent idea as well.

-- John



