
[rdiff-backup-users] Re: How are moved/renamed files treated?


From: Jens Benecke
Subject: [rdiff-backup-users] Re: How are moved/renamed files treated?
Date: Fri, 08 Aug 2003 16:25:58 +0200
User-agent: KNode/0.7.6

Ben Escoto wrote:

>>>>>> "JB" == Jens Benecke <address@hidden>
>>>>>> wrote the following on Wed, 06 Aug 2003 15:39:23 +0200
> 
>   JB> I am doing this because in the home directories and also in
>   JB> /var/log, there are a lot of files that get renamed daily (like
>   JB> file.dat from yesterday becomes file.1.dat, file.1.dat becomes
>   JB> file.2.dat, etc) but the contents stay the same. Thus, I (expect
>   JB> to) benefit from rsync's ability to detect identical parts in
>   JB> files, even if not at the same place.
> 
>   JB> How does rdiff-backup treat such files?
> 
> Sorry, rdiff-backup is too dumb to know that a file has moved.  To
> rdiff-backup, moving file A to B is equivalent to creating a new file
> B and deleting A, so increments will contain duplicate information.
> There have been some proposals about tracking renames through inode
> numbers or similar file names, but they are pretty complicated and I
> don't intend to implement them.

I would suggest tracking changed files via MD5 sums. Inode numbers only
help with moved files (not copied ones), and only where the file system
supports inodes.
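
To make the MD5 idea concrete, here is a rough Python sketch of what I have in
mind. It is only an illustration, not anything rdiff-backup does today: the
function names (md5_of_file, snapshot, find_moved_content) are made up for this
example, and it assumes a file counts as "moved or copied" when its content
hash matches a file from the previous run under a different path.

```python
import hashlib
import os


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading in chunks to bound memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def snapshot(root):
    """Map every file's path (relative to root) to its MD5 digest."""
    result = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            result[os.path.relpath(full, root)] = md5_of_file(full)
    return result


def find_moved_content(old, new):
    """Return (old_path, new_path) pairs whose content is already known.

    `old` and `new` are path -> digest maps from two backup runs. A path in
    `new` is reported when its digest differs from what was at that path
    before, but the same digest existed somewhere in `old`.
    """
    by_digest = {}
    for path, digest in sorted(old.items()):
        by_digest.setdefault(digest, []).append(path)

    matches = []
    for path, digest in sorted(new.items()):
        if old.get(path) == digest:
            continue  # unchanged in place, nothing to report
        sources = by_digest.get(digest)
        if sources:
            matches.append((sources[0], path))
    return matches
```

With the daily rotation described above (file.dat becomes file.1.dat,
file.1.dat becomes file.2.dat), comparing yesterday's snapshot with today's
would report both renames, and only the genuinely new file.dat would need to
be stored in full.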

> That being said, log files compress well, so you may want to try it
> anyway in case the duplication is tolerable.

It is not... that's why I asked. The files are 100M-1000M in size and
compress to maybe 10M-50M, but the change within them is only maybe
10k-100k. And there are hundreds of those files.


I am currently tar'ing the files together and using rsync on the single .tar
file. I keep three to four generations. Additionally I do (separate) daily
incremental backups; the tar files are only the full monthly ones.

Can I use rdiff-backup for this _one_ tar file to avoid having lots of
duplicate data on the local backup server? Currently it looks like this
(excerpt):

total 26835416
-rw-------    1 jens     4060907520 2003-04-09 03:49 RESCUE_2003-04-10.tar
-rw-------    1 jens     4300564480 2003-05-12 03:56 RESCUE_2003-05-12.tar
-rw-------    1 jens     5541826560 2003-06-25 03:58 RESCUE_2003-06-25.tar
-rw-------    2 jens     6777825280 2003-08-04 04:11 RESCUE_2003-08-04.tar
-rw-------    2 jens     6777825280 2003-08-04 04:11 RESCUE.tar

I'd say 95% of the data in the .tar files is identical.

Can I do rdiff-backup ssh://myserver/home/.RESCUE.tar
/home/backups/RESCUE.tar, so that the difference between the (newer) remote
RESCUE.tar and the local RESCUE.tar gets saved to a local RESCUE.rdiff (or
something), and I can (locally) recreate the newest RESCUE.tar when I need
it?
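
To show what I mean by "saving the difference", here is a small Python sketch
of the signature/delta/patch idea behind rsync-style tools. It is a toy under
stated assumptions, not rdiff-backup's or librsync's actual code: the names
(signature, delta, patch) and the op format are invented for this example, the
block size is tiny so the behaviour is easy to see, and it hashes every window
with MD5 instead of using a rolling checksum, so it is far too slow for
multi-gigabyte tar files.

```python
import hashlib

BLOCK = 4  # deliberately tiny block size, for illustration only


def signature(old, block=BLOCK):
    """Map the MD5 of each aligned block of `old` to that block's offset."""
    sig = {}
    for offset in range(0, len(old) - block + 1, block):
        key = hashlib.md5(old[offset:offset + block]).digest()
        sig.setdefault(key, offset)
    return sig


def delta(sig, new, block=BLOCK):
    """Describe `new` as copies of known old blocks plus literal bytes."""
    ops = []
    literal = bytearray()
    pos = 0
    while pos < len(new):
        window = new[pos:pos + block]
        offset = None
        if len(window) == block:
            offset = sig.get(hashlib.md5(window).digest())
        if offset is not None:
            if literal:
                ops.append(("data", bytes(literal)))
                literal = bytearray()
            ops.append(("copy", offset))  # reuse a block the old file has
            pos += block
        else:
            literal.append(new[pos])      # no match here: keep the raw byte
            pos += 1
    if literal:
        ops.append(("data", bytes(literal)))
    return ops


def patch(old, ops, block=BLOCK):
    """Rebuild the new file from the old file and the delta ops."""
    out = bytearray()
    for kind, value in ops:
        if kind == "copy":
            out += old[value:value + block]
        else:
            out += value
    return bytes(out)
```

The point is that only the "data" ops carry new bytes; everything else is a
reference into the copy that is already on the backup server. Because matches
are searched at every position of the new file, a block is still found when
inserted data has shifted it.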

Thank you!

-- 
Jens Benecke (address@hidden)




