[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
bug#29921: O(n^2) performance of rm -r
From: |
Niklas Hambüchen |
Subject: |
bug#29921: O(n^2) performance of rm -r |
Date: |
Sun, 31 Dec 2017 22:07:31 +0100 |
User-agent: |
Mozilla/5.0 (X11; Linux x86_64; rv:58.0) Gecko/20100101 Thunderbird/58.0 |
Hello,
in commit
rm -r: avoid O(n^2) performance for a directory with very many entries
http://git.savannah.gnu.org/cgit/coreutils.git/commit/?id=24412edeaf556a
it says that `rm -r`
"now displays linear performance, even when operating on million-entry
directories on ext3 and ext4"
I found this not to be the case on my systems running ext4 on Linux 4.9
on a WD 4TB spinning disk, coreutils rm 8.28.
As reported on https://bugs.python.org/issue32453#msg309303, I got:
nfiles real user sys
100000 0.51s 0.07s 0.43s
200000 2.46s 0.15s 0.89s
400000 10.78s 0.26s 2.21s
800000 44.72s 0.58s 6.03s
1600000 180.37s 1.06s 10.70s
Each 2x increase of number of files results in 4x increased deletion
time, making for a clean O(n^2) quadratic curve.
I'm testing this with:
set -e
rm -rf dirtest/
echo 100000 && (mkdir dirtest && cd dirtest && seq 1 100000 | xargs
touch) && time rm -r dirtest/
echo 200000 && (mkdir dirtest && cd dirtest && seq 1 200000 | xargs
touch) && time rm -r dirtest/
echo 400000 && (mkdir dirtest && cd dirtest && seq 1 400000 | xargs
touch) && time rm -r dirtest/
echo 800000 && (mkdir dirtest && cd dirtest && seq 1 800000 | xargs
touch) && time rm -r dirtest/
echo 1600000 && (mkdir dirtest && cd dirtest && seq 1 1600000 | xargs
touch) && time rm -r dirtest/
On another system, Ubuntu 16.04, coretuils rm 8.25, with Linux 4.10 on
ext4, I get:
nfiles real user sys
100000 0.94s 0.06s 0.516s
200000 2.94s 0.10s 1.060s
400000 10.88s 0.30s 2.508s
800000 34.60s 0.48s 4.912s
1600000 203.87s 0.99s 11.708s
Also quadratic.
Same machine on XFS:
nfiles real user sys
100000 3.37s 0.04s 0.98s
200000 7.20s 0.06s 2.03s
400000 22.52s 0.16s 5.11s
800000 53.31s 0.37s 11.46s
1600000 200.83s 0.76s 22.41s
Quadratic.
What is the complexity of `rm -r` supposed to be?
Was it really linear in the past as the patch describes, and this is a
regression?
Can we make it work linearly again?
Thanks!
Niklas
[Prev in Thread] |
Current Thread |
[Next in Thread] |
- bug#29921: O(n^2) performance of rm -r,
Niklas Hambüchen <=