Re: stat() order performance issues
From: Phillip Susi
Subject: Re: stat() order performance issues
Date: Fri, 26 Jan 2007 15:42:54 -0500
User-agent: Thunderbird 1.5.0.9 (Windows/20061207)

Jim Meyering wrote:
> That's good, but libc version matters too.
> And the kernel version. Here, I have linux-2.6.18 and
> Debian/unstable's libc-2.3.6.

How does the kernel or libc version matter at all? What matters is the
on-disk filesystem layout, which is not optimized for fetching stat
information on files in what is essentially a random order instead of
inode order. In the case of ext2/3, the inodes are stored on disk in
numerical order; for reiserfs, they tend to be stored in order but
don't have to be. On ext2/3 I believe file names are stored in the
order they were created, and on reiserfs they are stored in order of
their hash. In both cases the ordering of inodes and the ordering of
names returned from readdir are essentially randomly related.

Anyhow, I am running kernel 2.6.15 and libc 2.3.6.
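
One rough way to check how readdir order relates to inode order in a given directory is to count how often the inode number rises from one entry to the next. The sketch below is only an illustration: `ascending_pairs` is a made-up helper name, it assumes GNU `ls` (for `-U`) and file names without embedded newlines.

```shell
# Hypothetical helper (not from the original message): for directory $1,
# read entries in readdir order (ls -U) with their inode numbers (ls -i)
# and count the adjacent pairs whose inode number increases.
ascending_pairs() {
    ls -Ui -- "$1" | awk '
        NR > 1 { pairs++; if ($1 + 0 > prev) asc++ }
        { prev = $1 + 0 }
        END { printf "%d of %d adjacent pairs ascend\n", asc, pairs }'
}
```

Run as `ascending_pairs /path/to/dir`. A count near half of the pairs would be consistent with the two orderings being unrelated, while a count close to all of them would suggest readdir already returns names in roughly inode order.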

> 10-15 minutes is very bad.
> Something needs an upgrade.

Or a bugfix/enhancement, unless there already is a newer version of
coreutils that stats in inode order. My version of coreutils is 5.93.

> I presume you used xargs -- you wouldn't run stat 114K times...

Yes....
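# list the directory unsorted (-U), with inode numbers (-i)
ls -Ui > files
# names in inode order; cut -c 9- drops the first 8 columns (the inode field here)
cat files | sort -g | cut -c 9- > files-sorted
# names in the order ls returned them
cat files | cut -c 9- > files-unsorted
time cat files-unsorted | xargs stat > /dev/null
# < clear cache >
time cat files-sorted | xargs stat > /dev/null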
Sorting by inode number made the stats at least 10 times faster.
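
The same idea can be wrapped up so that it does not depend on the inode column being a fixed width. This is a minimal sketch, not a tested fix for coreutils: the function names `names_by_inode` and `stat_by_inode` are invented for illustration, and it assumes GNU `ls` and GNU `xargs` (for `-d` and `-r`) and file names without embedded newlines.

```shell
# Hypothetical helper: print the entries of directory $1, one per line,
# sorted by inode number. sed strips the padded inode field and the single
# space that follows it, whatever its width.
names_by_inode() {
    ls -Ui -- "$1" | sort -n | sed 's/^ *[0-9][0-9]* //'
}

# Hypothetical helper: stat every entry of directory $1 in inode order.
# xargs -d '\n' splits on newlines only, so names with spaces survive.
stat_by_inode() {
    ( cd -- "$1" && names_by_inode . | xargs -d '\n' -r stat -- )
}
```

For a timing comparison like the one above, something like `time stat_by_inode /path/to/dir > /dev/null` could be run after clearing the cache.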