Re: stat() order performance issues


From: Jim Meyering
Subject: Re: stat() order performance issues
Date: Fri, 26 Jan 2007 12:35:30 +0100

Phillip Susi <address@hidden> wrote:
> I have noticed that performing commands such as ls ( even with -U ) and

Which ls option(s) are you using?
Which file system?  As you probably know, it really matters.
If it's just "ls -U", then ls may not have to perform a single "stat" call.
If it's "ls -l", then the stat per file is inevitable.
But if it's "ls --inode" or "ls --file-type", with the right file system,
ls gets all it needs via readdir, and can skip all stat calls.  But with
some other file system types, it still has to stat every file.
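To illustrate the readdir point, here is a minimal sketch in Python rather than ls's own C code. It assumes a POSIX system where the file system fills in the entry type returned by readdir; the helper name count_by_type is made up for this example. os.scandir exposes that type information, so classifying entries normally needs no per-file stat call. Where the file system does not report a type, Python quietly falls back to a stat per entry, which is the slow case described above.

```python
import os

def count_by_type(directory):
    """Classify directory entries using the type info that readdir supplies.

    Hypothetical helper for illustration only; this is not how ls is written.
    """
    counts = {"dir": 0, "file": 0, "symlink": 0, "other": 0}
    with os.scandir(directory) as it:
        for entry in it:
            # These checks use the cached entry type when the file system
            # provides one, and fall back to a stat call only when it does not.
            if entry.is_symlink():
                counts["symlink"] += 1
            elif entry.is_dir(follow_symlinks=False):
                counts["dir"] += 1
            elif entry.is_file(follow_symlinks=False):
                counts["file"] += 1
            else:
                counts["other"] += 1
    return counts
```

Running a script that calls this under "strace -c" on a large directory should show whether the stat fallback is being taken on a given file system.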

For example, when I run "ls --file-type" on three maildirs containing
over 160K entries, it's nearly instantaneous.  There are only 3 stat calls:

    $ strace -c ls -1 a b c|wc -l
    % time     seconds  usecs/call     calls    errors syscall
    ------ ----------- ----------- --------- --------- ----------------
     88.55    0.025785         600        43           getdents64
     11.40    0.003320         237        14           munmap
      0.04    0.000013           0       233           write
      0.00    0.000000           0        14           read
      0.00    0.000000           0        20           open
      0.00    0.000000           0        26           close
      0.00    0.000000           0         3           stat
      0.00    0.000000           0        21           fstat
      0.00    0.000000           0         5           lseek
      0.00    0.000000           0        40           mmap
      0.00    0.000000           0        10           mprotect
      0.00    0.000000           0        19           brk
      0.00    0.000000           0         2           rt_sigaction
      0.00    0.000000           0         1           rt_sigprocmask
      0.00    0.000000           0         2         2 ioctl
      0.00    0.000000           0        11        11 access
      0.00    0.000000           0         7           mremap
      0.00    0.000000           0         4           socket
      0.00    0.000000           0         4         4 connect
      0.00    0.000000           0         1           execve
      0.00    0.000000           0         1           uname
      0.00    0.000000           0        15           fcntl
      0.00    0.000000           0         1           getrlimit
      0.00    0.000000           0         1           arch_prctl
      0.00    0.000000           0         1           set_tid_address
    ------ ----------- ----------- --------- --------- ----------------
    100.00    0.029118                   499        17 total
    163843

> du in a Maildir with many thousands of small files takes ages to
> complete.  I have investigated and believe this is due to the order in

Yep.  du has to perform the stat calls.

"ages"?  Give us numbers.  Is NFS involved?  A slow disk?
I've just run "du -s" on a directory containing almost 70,000 entries,
and it didn't take *too* long with a cold cache: 21 seconds.
Running the same command again (hot cache) took just 2 seconds.
The disk is local (about 2 years old), but nothing fancy.  The file system
type is reiserfs.

> which the files are stat()ed.  I believe that these utilities are simply
> stat()ing the files in the order that they are returned by readdir(),
> and this causes a lot of random disk reads to fetch the inodes from disk
> out of order.
>
> My initial testing indicates that sorting the files into inode order and
> calling stat() on them in order is around an order of magnitude faster,
> so I would suggest that utilities be modified to behave this way.

Post your patch, so others can try easily.
If sorting entries (when possible, i.e., for du, and some invocations
of ls) before calling stat on them really does result in a 10x speed-up
on "important" systems, then there's a good chance we'll do it.
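For anyone who wants to experiment before a patch is posted, the inode-order idea can be sketched roughly as follows. This is an illustrative Python sketch, not the proposed change to du or ls; the function name stat_in_inode_order is hypothetical, and it assumes a POSIX system where readdir reports each entry's inode number. Whether it helps at all depends on the file system and on whether the cache is cold.

```python
import os

def stat_in_inode_order(directory):
    """Return a list of (name, stat_result), stat'ing in ascending inode order.

    Hypothetical helper for illustration; assumes readdir supplies inode numbers.
    """
    # On POSIX, entry.inode() comes from the readdir data, so collecting
    # the (inode, name) pairs costs no stat calls.
    with os.scandir(directory) as it:
        entries = [(entry.inode(), entry.name) for entry in it]

    # Sort by inode number so the stat calls below touch inodes in order
    # instead of in whatever order readdir happened to return them.
    entries.sort()

    results = []
    for _inode, name in entries:
        # lstat, so that symlinks are not followed.
        results.append((name, os.lstat(os.path.join(directory, name))))
    return results
```

Timing this against a plain readdir-order loop on the same directory, with a cold cache each time, would give the kind of numbers asked for above.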



