Re: sparse file processing improvements
Tue, 07 Oct 2014 03:50:39 +0200
On 10/06/2014 04:40 PM, Pádraig Brady wrote:
> I've attached 3 patches to improve various sparse file handling.
> 1. supports detecting holes < internal buffer size (currently 128KiB)
> 2. employs the punch hole scheme demonstrated above
> 3. reads sparse files efficiently even if not writing to a regular file
cool work, thanks!
At a first glance, the changes in copy.c seem to be correct.
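For context, the hole punching of patch 2 can be sketched from the shell
with util-linux fallocate(1), which wraps fallocate(2) with
FALLOC_FL_PUNCH_HOLE; this is just an illustration, not the patch's code,
and whether blocks are actually deallocated depends on the file system:

```shell
# Create a 64 KiB file of non-zero data, then punch a 16 KiB hole
# at offset 16 KiB.  The apparent file size stays 64 KiB, but reads
# in the punched range return NULs and its blocks may be freed.
dd if=/dev/urandom of=demo.bin bs=4K count=16 status=none
fallocate --punch-hole --offset 16K --length 16K demo.bin
stat -c '%s bytes, %b blocks' demo.bin
```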
Without having had a deeper look yet: the new inner loop that detects
holes in the just-read buffer seems to do the lseek() twice if
a hole spans two adjacent, outer-loop read()s.
The ls(1) output of the result in the new test seems to confirm this:
+ ls -lsh sparse.in sparse.out sparse.out2
512K -rw-r--r-- 1 voelkerb users 512K Oct 6 20:03 sparse.in
260K -rw-r--r-- 1 voelkerb users 512K Oct 6 20:03 sparse.out
256K -rw-r--r-- 1 voelkerb users 512K Oct 6 20:03 sparse.out2
This means the copying could be even more effective in the first copy
already (sparse.out occupies 260K vs. 256K for sparse.out2).
> diff --git a/NEWS b/NEWS
> index 1811ae4..785773f 100644
> --- a/NEWS
> +++ b/NEWS
> @@ -30,6 +30,9 @@ GNU coreutils NEWS -*-
> ** Improvements
> + cp will convert smaller runs of NULs in the input to holes,
> + to reduce allocation in the copy.
Aren't ginstall(1) and mv(1) also affected? If so, the NEWS
entry should mention them, too.
> diff --git a/tests/cp/sparse.sh b/tests/cp/sparse.sh
> index d6cc4c4..1414d35 100755
> --- a/tests/cp/sparse.sh
> +++ b/tests/cp/sparse.sh
> @@ -37,4 +37,32 @@ test $(stat --printf %b copy) -le $(stat --printf %b sparse) || fail=1
> cp --sparse=always --reflink sparse copy && fail=1
> cp --sparse=never --reflink sparse copy && fail=1
> +# Ensure we handle sparse/non-sparse transitions correctly
> +maxn=128 # how many $hole_size chunks per file
> +hole_size=$(stat -c %o copy)
> +dd if=/dev/zero bs=$hole_size count=$maxn of=zeros
> +tr '\0' '\1' < zeros > ones
Failures of the above commands (and those below) should be caught
as test framework failures.
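For example, with a stub standing in for the test suite's
framework_failure_ helper (the real one comes from tests/init.sh),
the setup could look like this:

```shell
# Stub for illustration only; the real helper lives in init.sh.
framework_failure_() { echo 'setup failed' >&2; exit 99; }

hole_size=4096
maxn=8
dd if=/dev/zero bs=$hole_size count=$maxn of=zeros status=none \
  || framework_failure_
tr '\0' '\1' < zeros > ones || framework_failure_
```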
> +for n in 1 2 3 4 32 $maxn; do
> + parts=$(expr $maxn / $n)
> + rm -f sparse.in
> + # Generate sparse file for copying with alternating
> + # hole/data patterns of size n * $hole_size
> + for i in $(yes zeros | sed 1~2s/zeros/ones/ | head -n$parts); do
> + dd iflag=fullblock if=$i of=sparse.in conv=notrunc oflag=append \
> + bs=$hole_size count=$n status=none || framework_failure_
> + done
Looking at the ls(1) output above, the generated sparse.in files are
not sparse, are they? I think it doesn't matter, and in fact the file
to copy first does not have to be sparse. This is what the second
copying below is for. So just s/sparse/sample/ in the comment maybe.
Instead of appending to the sample file with "of=sparse.in conv=notrunc
oflag=append", you could just redirect the for-loop's output to that file.
I'm not sure whether the sed address "1~2" is available everywhere:
it is a GNU sed extension (TBH, this is the first time I've seen it myself).
What about avoiding the sed(1) call completely by moving the
alternating logic into the yes(1) call?
s="$(printf "%s\n%s" zeros ones)"
for i in $(yes "$s" | head -n$parts); do
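A quick check shows that this alternation behaves as intended, since
yes(1) repeats its whole (here two-line) argument:

```shell
# Emits "zeros" and "ones" alternating, one word per line.
s="$(printf '%s\n%s' zeros ones)"
yes "$s" | head -n 6
```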
> + cp --sparse=always sparse.in sparse.out || fail=1 # non sparse input
> + cp --sparse=always sparse.out sparse.out2 || fail=1 # sparse input
> + cmp sparse.in sparse.out || fail=1
> + cmp sparse.in sparse.out2 || fail=1
> + ls -lsh sparse.*
Given that the sparse detection and hole punching are optimal, both
output files should always have the same size, shouldn't they?
Therefore, the test should check those numbers.
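A possible check along those lines would compare stat's %b (the number
of allocated 512-byte blocks) of the two copies. Self-contained sketch
with made-up file names, using two files that are all hole and therefore
must have equal allocation:

```shell
# Two pure-hole files; their allocated block counts (%b) should match,
# just as sparse.out and sparse.out2 should match in the real test.
truncate -s 1M f1
truncate -s 1M f2
test "$(stat -c %b f1)" -eq "$(stat -c %b f2)" || fail=1
echo "fail=${fail-0}"
```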
> Exit $fail
Although the test data seems to be quite good, I somehow
have the feeling that it should be even more random.
In the above, the blocks of NULs and 1s are always a multiple
of $hole_size ... which in turn makes it all somehow related
to the buffer size and read size in sparse_copy().
I think we should use varying run sizes around the block size
for the NUL and 1s blocks, to be closer to real-world data.
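A sketch of such a generator (the deltas and the file name are made up
here; a real test would probably randomize the run lengths):

```shell
# Build a file whose NUL and '\1' run lengths vary around hole_size
# instead of being exact multiples of it.
hole_size=4096
rm -f sample.in
for delta in -17 3 -250 99 0 -1; do
  n=$((hole_size + delta))
  dd if=/dev/zero bs=1 count=$n status=none >> sample.in
  dd if=/dev/zero bs=1 count=$n status=none | tr '\0' '\1' >> sample.in
done
wc -c < sample.in
```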
Thanks & have a nice day,