[Top][All Lists]

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: 'ls' speedup when sorting lots of file names

From: Jim Meyering
Subject: Re: 'ls' speedup when sorting lots of file names
Date: Mon, 29 Jan 2007 12:24:53 +0100

Paul Eggert <address@hidden> wrote:
> This patch causes "ls" to run about 50% faster on my
> platform.  I'm running Debian stable x86, with file info
> cached in main memory, and with an en_US.UTF-8 locale.  I'm
> testing this by using the command "ls bigdir >outfile",
> where "bigdir" is a file containing 60895 files with names
> "0", "1", "2", ..., "60894".

I see the same with the "C" locale -- when all inode data is already
cached, of course.

When I timed old and new versions of ls on a directory with 67K entries
(same names as yours), I got similar numbers on an otherwise idle system.
8 samples:

  old: avg  .34s, stddev .06.
  new: avg  .18s, stddev .08.

With 300,000 files and 35 trials, there's an even larger performance
  old: avg  .59s, stddev .10.
  new: avg 1.58s, stddev .09.

With a directory containing 1,000,000 files, 10 timings, the
difference is even more striking:

  old: avg 6.50s, stddev .90.
  new: avg 1.52s, stddev .055

The old one consistently got ~100k minor page faults,
while the new version got only half as many at ~55k.

> The downside of the patch is that it uses a bit more main
> memory (valgrind reports 5.8% more malloced storage, using
> the above test case), but I think it's a worthwhile
> tradeoff.

The new version seems to exert much less memory pressure than the old
one in the million-file case, so I see no down side.  These timings were
using an AMD 64 3400+ with 2GB of relatively slow RAM.

I'll bet that a lot of the savings is in decreased data movement.
It's far easier to swap pointers than entire "fileinfo" structs.
That struct weighs in at 184 bytes on my system.
I've just reordered its members to shave off 8 bytes.
The new size: 176.

We could probably save even more space by going to bitfields, but
to justify that (and show that using narrower types doesn't hurt
performance), I'd have to spend some time profiling.  It's probably
no longer worthwhile.

2007-01-29  Jim Meyering  <address@hidden>

        Shave 8 bytes off the size of "struct fileinfo".
        * src/ls.c (fileinfo): Put all members of type "bool" together.

diff --git a/src/ls.c b/src/ls.c
index 6e610c4..fa9a2fa 100644
--- a/src/ls.c
+++ b/src/ls.c
@@ -156,22 +156,23 @@ struct fileinfo
     /* The file name.  */
     char *name;

-    struct stat stat;
-    bool stat_ok;
     /* For symbolic link, name of the file linked to, otherwise zero.  */
     char *linkname;

+    struct stat stat;
+    enum filetype filetype;
     /* For symbolic link and long listing, st_mode of file linked to, otherwise
        zero.  */
     mode_t linkmode;

+    bool stat_ok;
     /* For symbolic link and color printing, true if linked-to file
        exists, otherwise false.  */
     bool linkok;

-    enum filetype filetype;
 #if USE_ACL
     /* For long listings, true if the file has an access control list.  */
     bool have_acl;

reply via email to

[Prev in Thread] Current Thread [Next in Thread]