[Top][All Lists]
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: rm -rf: avoid ext3/4 O(N^2) performance hit, further librarify remov
From: |
Eric Blake |
Subject: |
Re: rm -rf: avoid ext3/4 O(N^2) performance hit, further librarify remove.c |
Date: |
Thu, 25 Sep 2008 23:25:46 +0000 (UTC) |
User-agent: |
Loom/3.14 (http://gmane.org/) |
Jim Meyering <jim <at> meyering.net> writes:
> i.e., this change makes rm process dir entry names in sorted
> inode order, when that's sensible.
What about 'ls' - should that also be changed to stat in inode order when -f is
not enabled? (An interesting thought, since it means ls would then be sorting
twice per directory in order to run faster than with a single sort!)
> ** New features
>
> + chgrp, chmod, chown, chcon, du, rm: now all display linear performance,
Is this really true, given that sorting is at best log-linear (and with qsort,
possibly quadratic)? True, log-linear is noticeably better than quadratic, but
have you proven that the linear time spent stat'ing the sorted list outweighs
the log-linear cost of sorting enough to discard the log-linear term? I guess
that explains the FIXME comment about detecting tmpfs to avoid the sort.
> + INODE_SORT_DIR_ENTRIES_THRESHOLD = 10000
And you might want to play with lowering this threshold once you convert to a
different sort algorithm, in case the time of qsort was impacting the timing
impacts you used when selecting this threshold.
> +
> +/* Return an approximation of the maximum number of dirent entries
> + in a directory with stat data *ST. */
> +static size_t
> +dirent_count (struct stat const *st)
> +{
> + return st->st_size / 16;
Hmm. On cygwin, directories show st_size of 0 (it's too time-consuming to
compute a real value). Is this approximation going to be valid on every file
system where sorting benefits? Or should we add a special case for a st_size
of zero to request sorting, on the worst-case assumption that the directory is
big even though the OS won't tell us how big?
> +static void
> +preprocess_dir (DIR **dirp, struct rm_options const *x)
> +{
> +#if HAVE_STRUCT_DIRENT_D_TYPE
Phooey - cygwin currently lacks d_type. So I guess the bulk of this patch
won't benefit cygwin anyway, whether or not NTFS is also a file system where
sorting would help (although I don't know if cygwin's emulation of inodes atop
NTFS will still result in something where sorted order will help or hinder disk
seek times).
And shouldn't this condition also check for the presence d_ino, since otherwise
the D_INO below makes every inode 0 and the qsort unstable?
> + /* - the directory is smaller than some threshold.
> + Estimate the number of inodes with a heuristic.
> + There's no significant benefit to sorting if there are
> + too few inodes. This */
Dangling comment.
> + /* Read all entries, storing <d_ino, d_name> for each non-dir one.
> + Maintain a parallel list of pointers into the primary buffer. */
> + while (1)
> + {
> + struct dirent const *dp;
> + dp = readdir_ignoring_dot_and_dotdot (*dirp);
> + /* no need to distinguish EOF from failure */
> + if (dp == NULL)
> + break;
Would it be worth using scandir to do the argument collection and sort, rather
than writing your own loop? There's already the proposal of porting scandir to
gnulib.
> + /* Iterate through those pointers, unlinking each name. */
> + size_t i;
> + for (i = 0; i < n; i++)
Since you've already gone for C99, should you compress these two lines?
for (size_t i = 0;...
> + /* Ensure each of these is NULL, in case init/allocation
> + fails and we end up calling ds_free on all three while only
> + one or two has been initialized. */
> + for (i = 0; i < 3; i++)
> + o[i]->chunk = NULL;
Does the obstack API allow you to safely reference this field without going
through an accessor macro/function?
--
Eric Blake
- rm -rf: avoid ext3/4 O(N^2) performance hit, further librarify remove.c, Jim Meyering, 2008/09/25
- Re: rm -rf: avoid ext3/4 O(N^2) performance hit, further librarify remove.c,
Eric Blake <=
- Re: rm -rf: avoid ext3/4 O(N^2) performance hit, further librarify remove.c, Ralf Wildenhues, 2008/09/26
- Re: rm -rf: avoid ext3/4 O(N^2) performance hit, further librarify remove.c, Andreas Schwab, 2008/09/26
- Re: rm -rf: avoid ext3/4 O(N^2) performance hit, further librarify remove.c, Jim Meyering, 2008/09/26