Re: [rdiff-backup-users] rdiff-backup performance

From: Marcel (Felix) Giannelia
Subject: Re: [rdiff-backup-users] rdiff-backup performance
Date: Sun, 07 Jun 2009 00:12:53 -0700
User-agent: Thunderbird (X11/20080726)

A remark on the original post: rdiff-backup is optimized with space as its primary concern -- it sacrifices a great deal of speed in order to attain the smallest backup possible. I wish it were a little friendlier to hard drives (i.e. made them seek less), but all in all I like that goal.

On 05/06/09 02:19, David wrote:
> On Fri, Jun 5, 2009 at 10:23 AM, Jakob Unterwurzacher <address@hidden> wrote:
>> It really depends upon how many files you have. For a server (lots of
>> files in maildirs etc., won't fit in the dir cache), rdiff-backup is
>> disk-bound - reading all the files' timestamps just takes lots of time
>> that cannot be optimized.

> Could rdiff-backup at least get a --progress option, similar to rsync,
> that gives an indication of how far the processing is? At the moment
> the main options seem to be:

I think rdiff-backup is meant to be run from a cron job, i.e. in the background in the middle of the night when there's no one there to see it. In practice, the only time a human watches rdiff-backup is to debug something that has gone wrong, and for that we have the verbose modes (which show exactly what it's doing every step of the way).

> 1) Run in regular non-verbose mode, and don't see much of anything.
> For all we know, it has frozen.

You can watch the destination directory and see the files as they're created. This is most obvious on a first run, but you can watch rdiff-backup-data/increments as well (or tail the file_statistics file if you've got that enabled; personally I don't). You can also use lsof | grep rdiff-backup to see exactly which file it's reading or writing right now. top can show whether it's disk-bound (CPU usage will be low and I/O wait time will be really high).
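As a rough illustration of the "watch the destination directory" approach, here is a minimal Python sketch. The function name `newest_file` is hypothetical and nothing here is part of rdiff-backup; it simply walks a directory tree and reports how many files exist and which one was modified most recently.

```python
import os

def newest_file(root):
    """Return (file_count, newest_path, newest_mtime) for all files under root."""
    count = 0
    newest_path, newest_mtime = None, 0.0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # lstat so that symlinks are not followed
                mtime = os.lstat(path).st_mtime
            except OSError:
                continue  # file vanished between listing and stat
            count += 1
            if newest_path is None or mtime > newest_mtime:
                newest_path, newest_mtime = path, mtime
    return count, newest_path, newest_mtime
```

Calling this every minute or so on the destination (or on rdiff-backup-data/increments) and comparing results gives a crude liveness check: if neither the count nor the newest timestamp changes for a long time, the job may be stalled. Note that walking a large destination causes disk seeks of its own, so it should not be polled too often.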

> 2) Run with more verbosity. The messages aren't very intelligible or
> useful for end-users, a lot of the time.
>
> It would be great to get a display, something like this, when the user
> runs rdiff-backup with --progress:
>
> - Listing files (1000)... <- This number goes up as the files are listed
> - Processing files (500/1000, 50%, eta: 10:13:12) <- Progress updates
>   as files are processed.

ETA would probably be very difficult to do accurately, since so much of the time lost is due to hard drive seeking, not reading. It could easily take an hour to read 100 MB of data if it's scattered all over the platters in tiny files (e.g. a maildir), and then suddenly process 500 MB in less than a minute (e.g. a MySQL database or a big log file). There's no way to predict the layout of the data the job will encounter in the future: there might be nothing but little files and fragmentation ahead, or it might be big easy files -- no way to know without reading all the directories in advance. In short, 10% of the data might take 90% of the time.
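To make the ETA problem concrete, here is a minimal Python sketch of the proposed progress line with a naive estimate. Both function names are hypothetical and this is not how rdiff-backup works; the estimate simply assumes the remaining work proceeds at the average rate seen so far, which is exactly the assumption that breaks down.

```python
def naive_eta_seconds(done, total, elapsed):
    """Estimate remaining seconds, assuming the rest runs at the average rate so far."""
    if done <= 0:
        return None  # no rate to extrapolate from yet
    return elapsed * (total - done) / done

def format_progress(done, total, elapsed):
    """Format a progress line in the style proposed in the quoted message."""
    eta = naive_eta_seconds(done, total, elapsed)
    pct = 100 * done // total if total else 100
    if eta is None:
        return "Processing files (%d/%d, %d%%, eta: unknown)" % (done, total, pct)
    hours, rem = divmod(int(eta), 3600)
    minutes, seconds = divmod(rem, 60)
    return "Processing files (%d/%d, %d%%, eta: %d:%02d:%02d)" % (
        done, total, pct, hours, minutes, seconds)
```

With the figures above, measured in megabytes: after 100 MB of 600 MB has taken 3600 seconds, `naive_eta_seconds(100, 600, 3600)` gives 18000 seconds (five hours), even though the remaining 500 MB might finish in under a minute. Counting files instead of bytes does not help, because the cost per file varies just as widely.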

A wacky idea for the developers -- would it be possible to have rdiff-backup learn from the locate database? On systems that have it, slocate/mlocate/locate already takes the time (usually daily) to enumerate all the files and directories; if rdiff-backup could just read straight from that, it might save some time. I don't know whether locate stores modification timestamps, though.
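A minimal Python sketch of what reading a file list from locate could look like. The function names are hypothetical, rdiff-backup has no such feature, and the sketch assumes a locate implementation that supports the -0 (NUL-separated output) option. It only obtains path names; the list can be as old as the last database update, and each file would still need a stat call to get its timestamp.

```python
import os
import subprocess

def parse_locate_output(raw, root):
    """Split NUL-separated locate output and keep only paths under root."""
    prefix = os.fsencode(root.rstrip("/") + "/")
    return [os.fsdecode(p) for p in raw.split(b"\0") if p.startswith(prefix)]

def list_from_locate(root):
    """Ask locate for indexed paths containing root, then filter to those under it."""
    # Assumption: `locate -0 PATTERN` prints matches separated by NUL bytes.
    # A non-zero exit (e.g. no matches) just yields an empty list here.
    result = subprocess.run(["locate", "-0", root], stdout=subprocess.PIPE)
    return parse_locate_output(result.stdout, root)
```

The filtering step is needed because locate matches the pattern anywhere in the path, so a query for /home would also return something like /var/home/x.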

> Could something like the above be added to rdiff-backup? That way we
> can get an idea of how fast it's running, when it will complete, or if
> it has frozen.


rdiff-backup-users mailing list at address@hidden
Wiki URL: http://rdiff-backup.solutionsfirst.com.au/index.php/RdiffBackupWiki
