[RFC/PATCH] cp: Add option to pre-allocate space for files
From: Mark
Subject: [RFC/PATCH] cp: Add option to pre-allocate space for files
Date: Fri, 11 May 2012 16:03:01 +0100
User-agent: SquirrelMail/1.4.21
Hi,
Here's a patch for cp which adds a new --preallocate option. When
specified, cp allocates disk space for the destination file before writing
data. It uses fallocate() with FALLOC_FL_KEEP_SIZE on Linux, falling back
to posix_fallocate() if that fails.
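To make the fallback order concrete, here is a minimal sketch. It is not the code from the attached patch: the helper name preallocate_fd() is made up for illustration, and it assumes a glibc-style <fcntl.h> that exposes fallocate() and FALLOC_FL_KEEP_SIZE when _GNU_SOURCE is defined.

```c
#define _GNU_SOURCE   /* exposes fallocate() and FALLOC_FL_KEEP_SIZE on glibc */
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>

/* Hypothetical helper, not taken from the patch.
   Try to reserve LEN bytes for FD, starting at offset 0.
   Returns 0 on success, otherwise an errno value.  */
int
preallocate_fd (int fd, off_t len)
{
  if (len <= 0)
    return 0;                   /* nothing to reserve */

#if defined __linux__ && defined FALLOC_FL_KEEP_SIZE
  /* Preferred path: reserve blocks but leave st_size untouched.  */
  if (fallocate (fd, FALLOC_FL_KEEP_SIZE, 0, len) == 0)
    return 0;
#endif

  /* Fallback: may extend st_size to LEN.  Unlike fallocate(), this
     returns the error number directly instead of setting errno.  */
  return posix_fallocate (fd, 0, len);
}
```

A caller would run this once on the freshly opened destination descriptor, passing the source file's st_size, and treat a nonzero return as a warning rather than a fatal error.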
Benefits of preallocation:
- Disk fragmentation can be greatly reduced. That means faster file
access and less filesystem overhead (fewer extents).
- Recovering data after filesystem corruption should be more successful,
since files are more likely to be contiguous.
- If you're e.g. copying a virtual machine disk image file, the
destination should be (almost) contiguous, meaning that running a disk
optimiser/defragmenter in the guest OS would work as it should (i.e.
improve performance).
This is a very preliminary patch for testing. Hopefully someone will find
it useful. And hopefully someone who (a) has a clue when it comes to C
programming, and (b) is familiar with the coreutils source (I'm neither)
can work from this to produce something which could be included in a
future release.
Note that posix_fallocate() sets the destination file size. If your system
doesn't support fallocate() with FALLOC_FL_KEEP_SIZE, you can't monitor the
progress of a large file copy with e.g. "ls -l destfilename"; the length
shown will always be the final length.
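The difference in reported size can be checked with a small experiment. This is only a sketch: size_after_prealloc() is a hypothetical helper, it assumes /tmp is writable, and the KEEP_SIZE branch assumes Linux with a glibc-style <fcntl.h>.

```c
#define _GNU_SOURCE   /* exposes fallocate() and FALLOC_FL_KEEP_SIZE on glibc */
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical demo helper.  Create a scratch file, reserve LEN bytes,
   and store the resulting st_size in *SIZE.  If KEEP_SIZE is nonzero,
   use fallocate() with FALLOC_FL_KEEP_SIZE; otherwise use
   posix_fallocate().  Returns 0 on success, else an errno value.  */
int
size_after_prealloc (int keep_size, off_t len, off_t *size)
{
  char path[] = "/tmp/prealloc-demo-XXXXXX";  /* assumes /tmp is writable */
  int fd = mkstemp (path);
  if (fd < 0)
    return errno;
  unlink (path);                /* file disappears once FD is closed */

  int err = 0;
  if (keep_size)
    {
#if defined __linux__ && defined FALLOC_FL_KEEP_SIZE
      if (fallocate (fd, FALLOC_FL_KEEP_SIZE, 0, len) != 0)
        err = errno;
#else
      err = ENOTSUP;            /* KEEP_SIZE mode unavailable here */
#endif
    }
  else
    err = posix_fallocate (fd, 0, len);

  struct stat st;
  if (err == 0 && fstat (fd, &st) != 0)
    err = errno;
  if (err == 0)
    *size = st.st_size;

  close (fd);
  return err;
}
```

Where both calls succeed, the posix_fallocate() case should report LEN and the KEEP_SIZE case should report 0, which is why "ls -l" only tracks copy progress in the latter.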
Pre-allocating space can defeat the purpose of --sparse=always (or the
default sparse-checking heuristic). If you are copying files with large
holes, you probably won't want to use --preallocate. If you do, regions in
the destination corresponding to holes in the source will be allocated but
unwritten. You'll lose the disk-space-saving benefit, but keep the
fast-reading-of-holes benefit. That trade-off could still be useful
sometimes.
In the general case of copying non-sparse files, it should be beneficial
to use --preallocate. However, on some systems, when the destination
filesystem does not support pre-allocation (e.g. FAT32), the
implementation of posix_fallocate() might try to fill the region to be
pre-allocated with zeros. That would roughly double the copy time for no
benefit.
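One way to avoid the zero-filling is to skip posix_fallocate() on such filesystems and rely only on the raw fallocate() call, which on Linux fails with EOPNOTSUPP instead of emulating. A sketch under that assumption follows; preallocate_native_only() is a hypothetical name and this is not what the attached patch does.

```c
#define _GNU_SOURCE   /* exposes fallocate() and FALLOC_FL_KEEP_SIZE on glibc */
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>

/* Hypothetical helper.  Reserve LEN bytes for FD only if the filesystem
   can do so natively.  Returns 0 on success, else an errno value;
   EOPNOTSUPP typically means the filesystem cannot pre-allocate.  */
int
preallocate_native_only (int fd, off_t len)
{
  if (len <= 0)
    return 0;                   /* nothing to reserve */

#if defined __linux__ && defined FALLOC_FL_KEEP_SIZE
  /* The raw call reports failure rather than writing zeros itself.  */
  return fallocate (fd, FALLOC_FL_KEEP_SIZE, 0, len) == 0 ? 0 : errno;
#else
  return ENOTSUP;               /* no native call known; caller just copies */
#endif
}
```

The cost of this choice is that systems without fallocate() get no pre-allocation at all, which seems preferable to writing every block twice.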
To-do list:
- Add --preallocate option to mv as well
- Should the option name be changed to --pre-allocate?
- Maybe have an option to tell cp to pre-allocate space for all
destination files in one go, rather than pre-allocating space for each
individual file before copying?
- Check the error code that fallocate() returns. If it says the
filesystem does not support fallocate(), don't call it again for every
other file being copied.
- Better handling of sparse files, e.g. don't call fallocate() if source
file is sparse and --sparse=always is given.
- If pre-allocation fails due to insufficient disk space, cp prints a
message and continues. So typically it will fill up the disk then abort
with an out-of-disk-space error. It would be nice to be able to tell cp
to abort when a pre-allocation fails, so it can exit without wasting
time.
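Three of the to-do items (remembering an unsupported filesystem, skipping sparse sources, and optionally aborting when space runs out) boil down to small decisions that could be sketched as below. All names are hypothetical, the abort_on_enospc flag stands in for an option that does not exist yet, and the sparse check assumes st_blocks counts 512-byte units as it does on Linux.

```c
#include <errno.h>
#include <stdbool.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical outcome of a pre-allocation attempt.  */
enum prealloc_action
{
  PREALLOC_CONTINUE,            /* carry on with the copy */
  PREALLOC_DISABLE,             /* stop trying on this filesystem */
  PREALLOC_ABORT                /* give up before copying any data */
};

/* Map the error ERR from a pre-allocation call to an action.  */
enum prealloc_action
classify_prealloc_error (int err, bool abort_on_enospc)
{
  switch (err)
    {
    case EOPNOTSUPP:
#if defined ENOTSUP && ENOTSUP != EOPNOTSUPP
    case ENOTSUP:               /* distinct value on some systems */
#endif
    case ENOSYS:
      return PREALLOC_DISABLE;
    case ENOSPC:
      return abort_on_enospc ? PREALLOC_ABORT : PREALLOC_CONTINUE;
    default:
      return PREALLOC_CONTINUE; /* success, or an error worth only a warning */
    }
}

/* Heuristic: the file occupies fewer bytes on disk than its length.
   Assumes st_blocks is in 512-byte units (true on Linux).  */
bool
looks_sparse (const struct stat *st)
{
  return (off_t) st->st_blocks * 512 < st->st_size;
}
```

A copy loop would check looks_sparse() on the source before pre-allocating when --sparse=always is given. Since destinations can live on different filesystems, a PREALLOC_DISABLE result is probably best remembered per destination device (st_dev) rather than globally.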
The attached patch is based on coreutils 8.17.
-- Mark
preallocate_patch.txt
Description: Text document