coreutils

[RFC/PATCH] cp: Add option to pre-allocate space for files


From: Mark
Subject: [RFC/PATCH] cp: Add option to pre-allocate space for files
Date: Fri, 11 May 2012 16:03:01 +0100
User-agent: SquirrelMail/1.4.21

Hi,

Here's a patch for cp which adds a new --preallocate option. When
specified, cp allocates disk space for the destination file before writing
data. It uses fallocate() with FALLOC_FL_KEEP_SIZE on Linux, falling back
to posix_fallocate() if that fails.
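
For readers without the attachment, the call sequence described above can be
sketched roughly as follows. This is not the code from the attached patch:
the helper name preallocate_fd and its return convention are made up for
illustration, and it assumes glibc, where _GNU_SOURCE exposes fallocate().

```c
#define _GNU_SOURCE           /* exposes fallocate() and FALLOC_FL_KEEP_SIZE in glibc */
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>

/* Hypothetical helper, not taken from the attached patch.
   Reserve LEN bytes for FD, starting at offset 0.
   Return 0 on success, or an errno value on failure.  */
static int
preallocate_fd (int fd, off_t len)
{
  if (len <= 0)
    return 0;                 /* nothing to reserve */

#ifdef FALLOC_FL_KEEP_SIZE
  /* Linux: allocate blocks but leave the reported file size alone.  */
  if (fallocate (fd, FALLOC_FL_KEEP_SIZE, 0, len) == 0)
    return 0;
#endif

  /* Fallback: portable, but extends the file to LEN bytes.
     posix_fallocate() returns the error number directly and
     does not set errno.  */
  return posix_fallocate (fd, 0, len);
}
```

A caller would invoke this once on the destination descriptor, right after
opening it and before the first write.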

Benefits of preallocation:
 - Disk fragmentation can be greatly reduced. That means faster file
   access and less filesystem overhead (fewer extents).
 - Recovering data after filesystem corruption is more likely to succeed,
   since files are more likely to be contiguous.
 - If you're copying, say, a virtual machine disk image file, the
   destination should be (almost) contiguous, so running a disk
   optimiser/defragmenter in the guest OS works as it should (i.e. it
   actually improves performance).

This is a very preliminary patch for testing. Hopefully someone will find
it useful. And hopefully someone who (a) has a clue when it comes to C
programming, and (b) is familiar with the coreutils source (I'm neither)
can work from this to produce something which could be included in a
future release.

Note that posix_fallocate() sets the destination file size. If your system
doesn't support fallocate() with FALLOC_FL_KEEP_SIZE, you can't, for
example, run "ls -l destfilename" to monitor the progress of a large file
copy; the length shown will always be the final length.
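
The difference in reported size can be checked with a small Linux-only
sketch like the one below. The helper name size_after_prealloc is
hypothetical and exists only for this illustration; it preallocates with one
call or the other and returns the size that stat (and therefore "ls -l")
would then report.

```c
#define _GNU_SOURCE           /* exposes fallocate() and FALLOC_FL_KEEP_SIZE in glibc */
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical helper: preallocate LEN bytes on FD, then return the file
   size that stat (and hence "ls -l") would report, or -1 on failure.
   KEEP_SIZE selects fallocate() with FALLOC_FL_KEEP_SIZE; otherwise
   posix_fallocate() is used.  */
static off_t
size_after_prealloc (int fd, off_t len, int keep_size)
{
  struct stat st;
  int err;

  if (keep_size)
    err = fallocate (fd, FALLOC_FL_KEEP_SIZE, 0, len) == 0 ? 0 : -1;
  else
    err = posix_fallocate (fd, 0, len);   /* returns an errno value */

  if (err != 0 || fstat (fd, &st) != 0)
    return -1;
  return st.st_size;
}
```

On a new, empty file the KEEP_SIZE variant should leave the size at 0 (when
the filesystem supports it), while the posix_fallocate() variant reports LEN
straight away.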

Pre-allocating space can defeat the object of --sparse=always (or the
default sparse-checking heuristic). If you're copying files with large
holes, you probably won't want to use --preallocate. If you do, regions in
the destination corresponding to holes in the source will be allocated but
unwritten. You'll lose the disk-space-saving benefit, but keep the
fast-reading-of-holes benefit. On the other hand, having those regions
reserved in advance could occasionally be useful in its own right.

In the general case of copying non-sparse files, it should be beneficial
to use --preallocate. However, on some systems, when the destination
filesystem does not support pre-allocation (e.g. FAT32), the
implementation of posix_fallocate() might try to fill the region to be
pre-allocated with zeros. Every byte would then be written twice, roughly
doubling the copy time for no benefit.

To-do list:
 - Add --preallocate option to mv as well
 - Should the option name be changed to --pre-allocate?
 - Maybe have an option to tell cp to pre-allocate space for all
   destination files in one go, rather than pre-allocating space for each
   individual file before copying?
 - Check the error code that fallocate() returns. If it says the
   filesystem does not support fallocate(), don't call it again for every
   other file being copied.
 - Better handling of sparse files, e.g. don't call fallocate() if the
   source file is sparse and --sparse=always is given.
 - If pre-allocation fails due to insufficient disk space, cp prints a
   message and continues, so typically it fills up the disk and then
   aborts with an out-of-disk-space error. It would be nice to be able to
   tell cp to abort as soon as a pre-allocation fails, so it can exit
   without wasting time.
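
The two error-handling items in the list above could look something like
this. It is a sketch under assumptions, not part of the patch: the enum, the
struct and try_preallocate are hypothetical names, it is Linux/glibc only,
and a single "unsupported" flag is a simplification, since a real cp can
write to several filesystems in one run and would need one flag per
destination device.

```c
#define _GNU_SOURCE           /* exposes fallocate() and FALLOC_FL_KEEP_SIZE in glibc */
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <sys/types.h>

/* Hypothetical result codes for the caller to act on.  */
enum prealloc_result
{
  PREALLOC_OK,        /* space reserved */
  PREALLOC_SKIPPED,   /* not attempted, or known to be unsupported */
  PREALLOC_NO_SPACE,  /* ENOSPC: caller may want to abort the copy now */
  PREALLOC_FAILED     /* any other error: caller may warn and continue */
};

/* Hypothetical state carried across files.  Simplification: one flag
   for the whole run rather than one per destination filesystem.  */
struct prealloc_state
{
  bool unsupported;
};

static enum prealloc_result
try_preallocate (struct prealloc_state *state, int fd, off_t len)
{
  if (state->unsupported || len <= 0)
    return PREALLOC_SKIPPED;

  if (fallocate (fd, FALLOC_FL_KEEP_SIZE, 0, len) == 0)
    return PREALLOC_OK;

  switch (errno)
    {
    case EOPNOTSUPP:              /* filesystem cannot do this */
    case ENOSYS:                  /* kernel lacks fallocate() */
      state->unsupported = true;  /* don't retry for later files */
      return PREALLOC_SKIPPED;
    case ENOSPC:
      return PREALLOC_NO_SPACE;
    default:
      return PREALLOC_FAILED;
    }
}
```

A caller that wants the "abort early" behaviour would stop on
PREALLOC_NO_SPACE instead of starting a copy that cannot finish.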

The attached patch is based on coreutils 8.17.


-- Mark

Attachment: preallocate_patch.txt
Description: Text document

