bug#73784: [PATCH] cp: new option --nocache-source
From: Pádraig Brady
Subject: bug#73784: [PATCH] cp: new option --nocache-source
Date: Sun, 13 Oct 2024 15:59:27 +0100
User-agent: Mozilla Thunderbird Beta
On 13/10/2024 05:56, Masatake YAMATO wrote:
When copying files, the system data cache is consumed: it is used for
both the source and destination files. In scenarios such as backing up
old, unused files, it is clear to users that these files will not be
needed in the near future. In such cases, retaining the data for these
files in the cache wastes system resources, especially when applications
that require significant memory are running in the foreground.
With the new option, users can request that the system data cache be
discarded, thereby avoiding the unwanted swapping out of data belonging
to foreground processes.
I evaluated cache consumption using a script called run.bash. Initially,
run.bash creates many small files, each 8 KB in size. It then copies
these files using the given cp command, with or without the new option.
Finally, it reports the difference in the page cache size (Cached: in
/proc/meminfo) before and after the copying process.
run.bash:
#!/bin/bash
# Measure page-cache growth while copying many small files with a
# given cp binary (plus optional extra cp arguments).

CP=$1
shift
[[ -e "$CP" ]] || {
    echo "no file found: $CP" 1>&2
    exit 1
}

N=8
S=drop-src
D=${HOME}/drop-dst

mkdir -p $S
mkdir -p $D

start=
end=

# Current page-cache size, from /proc/meminfo.
print_cached()
{
    grep ^Cached: /proc/meminfo
}

start()
{
    start=$(print_cached | awk '{print $2}')
}

end()
{
    end=$(print_cached | awk '{print $2}')
}

report()
{
    echo -n "delta[$N:$1/$2]: "
    expr "$end" - "$start"
}

cleanup()
{
    local i
    local j
    for ((i = 0; i < 10; i++)); do
        for ((j = 0; j < 10; j++)); do
            rm -f $S/F-${i}${j}*
            rm -f $D/F-${i}${j}*
        done
    done
    rm -f $S/F-*
    rm -f $D/F-*
}

# Create 1024 * N files of 8 KB each.
prep()
{
    local i
    for ((i = 0; i < 1024 * $N; i++ )); do
        if ! dd if=/dev/zero of=$S/F-$i bs=4096 count=2 \
             status=none; then
            echo "failed in dd of=$S/F-$i" 1>&2
            exit 1
        fi
    done
    sync
}

# Copy every file with the given cp and report the cache delta.
run_cp()
{
    start
    local i
    time for ((i = 0; i < 1024 * $N; i++ )); do
        if ! "${CP}" "$@" "$S/F-$i" "$D/F-$i"; then
            echo "failed in cp" "$@" "$S/F-$i" "$D/F-$i" 1>&2
            exit 1
        fi
    done
    end
    report "$1" "$2"
}

cleanup
sync
prep
run_cp "$@"
running:
~/coreutils/nocache$ ./run.bash ../src/cp
real 0m16.051s
user 0m4.249s
sys 0m12.437s
delta[8:/]: 65548
~/coreutils/nocache$ ./run.bash ../src/cp --nocache-source
real 0m17.109s
user 0m4.492s
sys 0m13.317s
delta[8:--nocache-source/]: 620
The --nocache-source option massively reduces cache consumption.
Thanks for the patch.
I have some reservations/notes though...
There is nothing particularly special about cp that means it needs this option.
I.e. it would be nice to be able to wrap any program so that it streamed
data through the cache rather than aggressively caching it. I'm not sure how
to do that, but I'd also be reluctant to start adding such options to
individual commands.
Perhaps Linux's open() may gain an O_STREAM flag in future, which could then
be applied more generally with a wrapper or something.
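In the absence of such a flag, a rough approximation for the write side today
is to stream a program's output through dd, which already supports
cache-dropping flags. A sketch only (the archive name and paths are just
illustrative):

# Write a tar archive while advising the kernel not to retain the
# written data in the page cache; conv=fdatasync flushes the data so
# the (advisory) nocache request can actually take effect.
tar -cf - old-project/ |
  dd of=/backup/old-project.tar bs=1M oflag=nocache conv=fdatasync status=none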
For single (large) files, one already has this functionality in dd.
On the write side, you'd also have to worry about syncing, to make the
drop cache advisory effective, and this could impact performance.
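For example, a single large-file copy that asks the kernel to drop both the
source and the destination from the cache could look like this (a sketch
only; the file names are illustrative):

# iflag=nocache drops the source pages after reading; oflag=nocache
# drops the destination pages, but only once the data has reached the
# device, hence conv=fdatasync.
dd if=big-backup.img of=/mnt/backup/big-backup.img bs=1M \
   iflag=nocache oflag=nocache conv=fdatasync status=progress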
Might this drop caches for already-cached files that cp just happens to be
copying, thus potentially impacting performance for other programs?
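For instance, one could first check whether the source files are already
resident, e.g. with util-linux's fincore (a sketch; availability varies):

# Non-zero RES means the file's pages are already in the page cache,
# and an unconditional drop-cache advisory would evict them.
fincore drop-src/F-0 drop-src/F-1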
If reflinking, we probably would not want to do this operation,
since we're not reading the source.
thanks,
Pádraig