bug#10281: change in behavior of du with multiple arguments (commit

From: Jim Meyering
Subject: bug#10281: change in behavior of du with multiple arguments (commit
Date: Sat, 17 Dec 2011 10:20:09 +0100

Alan Curry wrote:
> By comparison to a proper tool which doesn't do any unnecessary traversals of
> extra directories, your use of du is slow and brittle (if the user forgets
> an alternate directory containing a link, the result is wrong) and has only
> the slight advantage of already being implemented.
> Here's a working outline of the single-traversal method. I wouldn't suggest
> that du should contain equivalent code. A single-purpose perl script, even
> without pretty output formatting, feels clean enough to me. Since I've gone
> to the trouble (not much) of writing it, I'll keep it as ~/bin/predict_rm_rf
> for future use.
> #!/usr/bin/perl -W
> use strict;
> use File::Find;
> @ARGV or die "Usage: $0 directory [directory ...]\n";
> my $total = 0;
> my %pending = ();
> File::Find::find({wanted => sub {
>   my ($dev,$ino,$nlink,$blocks) = (lstat($_))[0,1,3,12];
>   if(-d _ || $nlink==1) {
>     $total += $blocks;
>     return;
>   }
>   if($nlink == ++$pending{"$dev.$ino"}) {
>     delete $pending{"$dev.$ino"};
>     $total += $blocks;
>   }
> }}, @ARGV);
> print "$total blocks would be freed by rm -rf @ARGV\n";

That seems useful.
However, the number it prints is too large whenever it processes
a file or directory more than $nlink times, e.g., when invoked as

    predict_rm_rf F F

it prints double the correct number.

To account for that, the script must record every dev/ino pair
it processes, say via:

    File::Find::find({wanted => sub {
      my ($dev,$ino,$nlink,$blocks) = (lstat($_))[0,1,3,12];
      # Skip any dev/ino pair that has already been counted.
      defined $pending{"$dev.$ino"} && $pending{"$dev.$ino"} < 0
        and return;

      if(-d _ || $nlink==1 || $nlink == ++$pending{"$dev.$ino"}) {
        $total += $blocks;
        $pending{"$dev.$ino"} = -1;   # mark as counted
      }
    }}, @ARGV);
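
For anyone who wants to experiment with the bookkeeping separately from the traversal, here is a minimal sketch of the same idea in Python. It is not part of the Perl script: the names blocks_freed and walk_entries are hypothetical, and the logic is only my reading of the snippet above (count an entry once all of its links have been seen, then mark its dev/ino pair so it is never counted again).

```python
import os
import stat


def blocks_freed(entries):
    """Sum blocks for entries whose every link has been seen.

    entries: iterable of (dev, ino, nlink, blocks, is_dir) tuples.
    """
    total = 0
    pending = {}  # (dev, ino) -> links seen so far, or -1 once counted
    for dev, ino, nlink, blocks, is_dir in entries:
        key = (dev, ino)
        if pending.get(key, 0) < 0:
            continue  # already counted; never count it again
        pending[key] = pending.get(key, 0) + 1
        if is_dir or nlink == 1 or nlink == pending[key]:
            total += blocks
            pending[key] = -1  # mark as counted
    return total


def walk_entries(roots):
    """Yield one lstat-derived tuple per name under each root (hypothetical helper)."""
    def record(path):
        st = os.lstat(path)  # st_blocks is available on Unix-like systems
        return (st.st_dev, st.st_ino, st.st_nlink, st.st_blocks,
                stat.S_ISDIR(st.st_mode))

    for root in roots:
        rec = record(root)
        yield rec
        if not rec[4]:
            continue  # not a real directory (file or symlink): nothing to descend into
        # os.walk does not follow symlinks to directories by default.
        for dirpath, dirnames, filenames in os.walk(root):
            for name in dirnames + filenames:
                yield record(os.path.join(dirpath, name))
```

With this, blocks_freed(walk_entries(["F", "F"])) gives the same total as blocks_freed(walk_entries(["F"])) for a file F with a single link. Like the Perl version, the sketch keys on dev/ino alone, so a multiply-linked file that is reached twice through the same name is still treated as two distinct links.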

Note that for a large tree, the perl code will be far less efficient
than C code like du because:

  - the perl script must call lstat for every single entry (du can
    use dirent.d_ino on some file systems).  When I checked about a year
    ago, Perl still had no good way to get something like dirent.d_ino.
  - du uses a compact representation for a device/inode pair, so
    may use a lot less memory.
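
To make those two points concrete, here is a rough sketch in Python rather than Perl, since Python's os.scandir does expose the inode number stored in each directory entry. Everything in it is illustrative: inodes_in_dir and DevInoSet are hypothetical names, and DevInoSet is only one possible compact layout (integers grouped by device instead of one "$dev.$ino" string per file), not a description of how du stores its pairs.

```python
import os


def inodes_in_dir(path):
    """Return (name, inode) pairs taken from the directory entries themselves."""
    with os.scandir(path) as it:
        # On Unix, DirEntry.inode() comes from the directory entry (d_ino),
        # so no per-file stat call is needed here.
        return [(entry.name, entry.inode()) for entry in it]


class DevInoSet:
    """Hypothetical compact dev/ino set: one set of integer inodes per device."""

    def __init__(self):
        self._by_dev = {}

    def add(self, dev, ino):
        """Record the pair; return True if it was not already present."""
        inodes = self._by_dev.setdefault(dev, set())
        if ino in inodes:
            return False
        inodes.add(ino)
        return True
```

Note that a directory entry carries only the inode number, not the device or the link count, so a tool still needs lstat whenever it wants st_nlink or st_blocks for an entry.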
