
Re: [rdiff-backup-users] Proposal: Storing excess file information


From: Dave Steinberg
Subject: Re: [rdiff-backup-users] Proposal: Storing excess file information
Date: Sat, 30 Nov 2002 20:40:34 -0500
User-agent: Microsoft-Entourage/10.0.0.1331

Hey Ben, my reply may be a little bit of the 'from the peanut gallery'
variety, so I hope it at least sparks some discussion.  :)

First, I agree with your proposed system.  The KISS principle says it'll
work great!

Second, although it sounds like a great application for XML, XML is sometimes
applied too liberally.  If you do decide to go the XML route, one bit of
style that may save you more than a few stat()'s and open()'s would be to
have a single metadata file with *all* the metadata in it (see the sketch
below).  This has a few immediate advantages/disadvantages:

Adv:
- eliminates stat/open/close of a billion files
- powerful addressing language (XPath) to pull out data from anywhere in the
tree

Dis:
- uses gobs more memory with DOM (SAX would be more efficient, but more
complex)
- adds complexity and relies on external code, which makes security a much
larger concern.

The fact that you plan to gzip/rdiff the files takes away the bloat issue of
XML.
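
To make the single-file idea concrete, here's a minimal sketch using Python's
xml.etree.ElementTree module (a module from later Python releases that only
supports a small subset of XPath, so treat it as purely illustrative); the
element and attribute names are invented:

    import xml.etree.ElementTree as ET

    # One document holds the metadata for every file in the backup.
    doc = """<metadata>
      <file path="bin/view"  type="sym" symdata="view"/>
      <file path="dev/ttyS1" type="dev" devinfo="c 4 65" uname="root" gname="root"/>
    </metadata>"""

    root = ET.fromstring(doc)

    # One parse, then any record can be addressed directly -- no per-file
    # stat()/open()/close() of a billion little files.
    entry = root.find(".//file[@path='dev/ttyS1']")
    if entry is not None:
        print(entry.get("type"), entry.get("devinfo"))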

However, if it doesn't make sense to store everything in one file (for
reasons unbeknownst to me), then XML doesn't make sense.  It would add a lot
of overhead to get what could just as easily be gotten from a plain text
file - only with tons of added complexity.

Third, and in another direction, you might consider Dan Bernstein's Constant
Database (cdb).  I don't really know if it makes sense here - I just know
that it's very fast and works well - just like the rest of his software.  It's
originally written in C, but just about every language has an interface for
it.  More info:

http://cr.yp.to/cdb.html

That page also has a link to the Python interface.
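
For a feel of how it might look from Python, here is a rough sketch assuming
the classic python-cdb binding (cdbmake/init); check the binding's own docs,
since the exact interface may differ:

    import cdb

    # Build: add every record, then finish() atomically replaces the old file.
    maker = cdb.cdbmake("metadata.cdb", "metadata.cdb.tmp")
    maker.add("bin/view", "Type sym\0SymData view\0")
    maker.add("dev/ttyS1", "Type dev\0DevInfo c 4 65\0Uname root\0Gname root\0")
    maker.finish()

    # Read: constant-time lookup of a single file's record.
    db = cdb.init("metadata.cdb")
    print(db.get("dev/ttyS1"))

The record values here just reuse your proposed key/value text, so the cdb
stays a derived artifact rather than a second format to maintain.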

As for storing gzipped rdiffs of a cdb, I am unsure of how to proceed.  I
do have a pointer for hints, though.  I believe Dan Bernstein's tinydns
package keeps its database in cdb format.  User tools modify the central text
file to add/delete/modify DNS information, and then it's all processed and
converted to cdb for quick access - along the lines of the sketch below.
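
A sketch of that pattern, again assuming the python-cdb binding above and an
invented one-record-per-line master file:

    import cdb

    def rebuild(master_txt, cdb_path):
        # The text master stays the source of truth; the cdb is regenerated
        # from it after every change, just as tinydns rebuilds its data.cdb.
        maker = cdb.cdbmake(cdb_path, cdb_path + ".tmp")
        for line in open(master_txt):
            path, meta = line.rstrip("\n").split("\t", 1)
            maker.add(path, meta)
        maker.finish()

    rebuild("metadata.txt", "metadata.cdb")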

Again, I'm not sure if CDB is a great fit, I just thought I'd bring it up
and let you judge for yourself.

Hope my input helps a little,

-- 
Dave Steinberg

On 11/29/02 7:01 PM, "Ben Escoto" <address@hidden> wrote:

> 
> Hi all, let me run by you a scheme for storing file (meta-)data which
> won't fit natively on the destination file system.  Suggestions
> welcome.
> 
> 
> Problem: Some file information cannot be stored on the destination
> file system because of limitations of the file system, configuration
> differences between it and the source system, security issues, or
> other reasons.
> 
>   For instance, as pointed out by Ilya Konstantinov, it is a big
> limitation that unless rdiff-backup runs as root on the destination
> system, ownership information is lost (because rdiff-backup lacks the
> permissions to change file ownership).  With root access, ownership
> information is preserved, but it can still be set incorrectly/confusingly
> on the remote system if user and group ids don't match.  This will
> be a bigger problem if/when rdiff-backup supports ACLs.  Finally,
> sockets, symlinks, fifos, and device files cannot be backed up to many
> file system types.
> 
> 
> Proposed solution:  Every session, rdiff-backup can write extra file
> information to a data file in the rdiff-backup-data directory.  It
> would be a text file, looking like this:
> 
> File bin/view
>   Type sym
>   SymData view
> File bin/zcat
>   NumberLinks 4
>   Inode 2834484
>   Device 771
>   Uname root
>   Gname root
> File dev/ttyS1
>   Type dev
>   DevInfo c 4 65
>   Uname root
>   Gname root
> ...
> 
> each "line" would probably be terminated with a null, in case the
> filenames included newlines.  Most files would not have an entry at
> all, just the ones with data that couldn't fit on the destination
> system.
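
A minimal sketch of how such null-terminated records might be read back (the
metadata file name and the dictionary layout are invented for illustration):

    def read_records(path):
        # Null-terminated "lines"; each record begins with "File <name>".
        text = open(path, "rb").read().decode("utf-8", "surrogateescape")
        records, current = {}, None
        for line in text.split("\0"):
            if line.startswith("File "):
                current = {}
                records[line[5:]] = current
            elif line.strip() and current is not None:
                key, _, value = line.strip().partition(" ")
                current[key] = value          # e.g. "DevInfo" -> "c 4 65"
        return records

    meta = read_records("rdiff-backup-data/file_metadata")   # invented name
    print(meta.get("dev/ttyS1", {}).get("DevInfo"))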
> 
>   BTW, I thought about doing this in XML, but after spending a few
> hours trying to learn XML (and even going to a bookstore and skimming
> the Python & XML book) I concluded that either Python has bad XML
> support, or, more likely, the dominant XML interfaces like SAX are
> very bad for this kind of thing.  But if you know XML and disagree,
> let me know.
> 
>   Anyway, the file would be gzipped and stored with reverse diffs,
> so it wouldn't take up much space.  If you have lots of hard links
> (like I do) the new system will probably save you space, as currently
> the hard link data is stored in its entirety for each session.
> Although a text file, it shouldn't be that slow, since it is always
> processed in order.  One bad point is that restoring a single file
> could be slow, since the whole file might have to be decompressed and
> scanned.
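
To illustrate that worst case, restoring a single file's metadata might look
roughly like this (file name and format details are again invented):

    import gzip

    def lookup(wanted, metadata_gz="file_metadata.gz"):
        # The whole file may have to be decompressed and scanned before the
        # wanted record turns up (or is found to be absent).
        text = gzip.open(metadata_gz, "rb").read().decode("utf-8", "surrogateescape")
        record = None
        for line in text.split("\0"):
            if line.startswith("File "):
                if record is not None:
                    break                     # reached the next record; done
                if line[5:] == wanted:
                    record = []
            elif record is not None:
                record.append(line.strip())
        return record

    print(lookup("dev/ttyS1"))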
> 
>   And another side note, there was some discussion of this on the
> rsync list, under the "virtual file system" rubric.  I asked recently,
> but didn't think there was enough enthusiasm to try to use the same
> file format, or anything like that.
> 
>   Last point:  At first it seemed that this could help backing up to
> a case-insensitive file system, but now I don't see how, since file
> name collisions could still happen, no matter what extra information
> you had.  So it seems this can't replace the current quoting system.
> 
>   So, any suggestions?  (Or offers to implement immediately? :))
> 




