bug-gnulib
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Patch] Add dirent.d_type support to Cygwin 1.7 ?


From: Jim Meyering
Subject: Re: [Patch] Add dirent.d_type support to Cygwin 1.7 ?
Date: Fri, 28 Nov 2008 20:19:07 +0100

"James Youngman" <address@hidden> wrote:
> On Thu, Nov 27, 2008 at 10:53 PM, Eric Blake <address@hidden> wrote:
>> -----BEGIN PGP SIGNED MESSAGE-----
>> Hash: SHA1
>>
>> [dropping cygwin-patches, since posts are closed to non-subscribers, and
>> adding bug-findutils and bug-gnulib.  Christian is working on a patch that
>> lets cygwin do initial support of dirent.d_type]
>>
>> According to Christian Franke on 11/27/2008 2:41 PM:
>>>
>>> PS: find is not as smart as expected: 'find /path -type d' calls lstat()
>>> for each entry, even if d_type != DT_UNKNOWN.
>>> So 'find /path' is 2-3 times faster than 'find /path -type d'.
>>
>> This seems like it might be a bug in gnulib's fts implementation.  How
>> does 'oldfind /path -type d' perform?  oldfind has the advantage of not
>> using fts, so if it performs better, then there is a hole where we need to
>> improve gnulib's fts to make directory-only or non-directory-only
>> traversals use d_type for optimization.
>
> I believe that at least a large part of the problem may be the fact
> that some files are stat()ed by both fts (as part of the traversal
> logic) and find.  However, this hasn't been fully investigated.
>
> If someone would be able to investigate and discover the places where
> duplicate *stat() calls are made, this would allow us to close the
> performance gap between oldfind and find.   That and the performance
> gap between find and oldfind with "-execdir ...+" are IIRC the only
> reasons why we still build oldfind at all.
>
> So if someone would be able to investigate, I'd be very grateful.

There was indeed an opportunity for improvement.
find/fts would unnecessarily stat all non-directories with any
type-only predicates -- and assuming that the file system has
dirent.d_type support.

I've fixed it with two changes.
The first in gnulib's fts.c to expose the d_type information,
and the second, in find, to use that newly-available data.

Now, on a file system type e.g., ext3 or ext4, that provide
usable dirent.d_type, a use like "find . -type d" will stat
only the directories.  Before, it would stat everything.

I'll apply the gnulib/fts change after a little more testing.

Jim

>From 01d0e6ccdb65fb83dd783c1f2e1c30ba3a21f639 Mon Sep 17 00:00:00 2001
From: Jim Meyering <address@hidden>
Date: Fri, 28 Nov 2008 18:40:08 +0100
Subject: [PATCH] fts: make dirent.d_type available, when possible

* lib/fts.c (D_TYPE): Define.
(DT_UNKNOWN, DT_BLK, DT_CHR) [HAVE_STRUCT_DIRENT_D_TYPE]: Define.
(DT_DIR, DT_FIFO, DT_LNK, DT_REG, DT_SOCK): Likewise.
(s_ifmt_shift_bits): New function.
(set_stat_type): New function.
(fts_build): When not calling fts_stat, call set_stat_type
to propagate dirent.d_type info to fts_read caller.
---
 ChangeLog |   11 ++++++++
 lib/fts.c |   80 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 91 insertions(+), 0 deletions(-)

diff --git a/ChangeLog b/ChangeLog
index daa90b2..63ab2e6 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,14 @@
+2008-11-28  Jim Meyering  <address@hidden>
+
+       fts: make dirent.d_type available, when possible
+       * lib/fts.c (D_TYPE): Define.
+       (DT_UNKNOWN, DT_BLK, DT_CHR) [HAVE_STRUCT_DIRENT_D_TYPE]: Define.
+       (DT_DIR, DT_FIFO, DT_LNK, DT_REG, DT_SOCK): Likewise.
+       (s_ifmt_shift_bits): New function.
+       (set_stat_type): New function.
+       (fts_build): When not calling fts_stat, call set_stat_type
+       to propagate dirent.d_type info to fts_read caller.
+
 2008-11-20  Bruno Haible  <address@hidden>

        Attempt to work around an AIX 5.3, 6.1 compiler bug with include_next.
diff --git a/lib/fts.c b/lib/fts.c
index 2054864..35def60 100644
--- a/lib/fts.c
+++ b/lib/fts.c
@@ -84,9 +84,31 @@ static char sccsid[] = "@(#)fts.c    8.6 (Berkeley) 8/14/94";
 # define DT_IS_KNOWN(d) ((d)->d_type != DT_UNKNOWN)
 /* True if the type of the directory entry D must be T.  */
 # define DT_MUST_BE(d, t) ((d)->d_type == (t))
+# define D_TYPE(d) ((d)->d_type)
 #else
 # define DT_IS_KNOWN(d) false
 # define DT_MUST_BE(d, t) false
+# define D_TYPE(d) DT_UNKNOWN
+
+# undef DT_UNKNOWN
+# define DT_UNKNOWN 0
+
+/* Any nonzero values will do here, so long as they're distinct.
+   Undef any existing macros out of the way.  */
+# undef DT_BLK
+# undef DT_CHR
+# undef DT_DIR
+# undef DT_FIFO
+# undef DT_LNK
+# undef DT_REG
+# undef DT_SOCK
+# define DT_BLK 1
+# define DT_CHR 2
+# define DT_DIR 3
+# define DT_FIFO 4
+# define DT_LNK 5
+# define DT_REG 6
+# define DT_SOCK 7
 #endif

 enum
@@ -989,6 +1011,61 @@ fts_compare_ino (struct _ftsent const **a, struct _ftsent 
const **b)
          : b[0]->fts_statp->st_ino < a[0]->fts_statp->st_ino ? 1 : 0);
 }

+/* Return the number of bits by which a d_type value must be shifted
+   left in order to put it into the S_IFMT bits of stat.st_mode.  */
+static int
+s_ifmt_shift_bits (void)
+{
+  unsigned int v = S_IFMT; /* usually, 0170000 */
+  static const int MultiplyDeBruijnBitPosition[32] =
+    {
+      0, 1, 28, 2, 29, 14, 24, 3, 30, 22, 20, 15, 25, 17, 4, 8,
+      31, 27, 13, 23, 21, 19, 16, 7, 26, 12, 18, 6, 11, 5, 10, 9
+    };
+
+  /* Find the smallest power of two, P (e.g., 0010000) such that P & V == P. */
+  unsigned int p = v ^ (v & (v - 1));
+
+  /* Compute and return r = log2 (p), using code from
+     http://graphics.stanford.edu/~seander/bithacks.html#IntegerLogDeBruijn */
+  return MultiplyDeBruijnBitPosition[(uint32_t) (p * 0x077CB531UL) >> 27];
+}
+
+/* Map the dirent.d_type value, DTYPE, to the corresponding stat.st_mode
+   S_IF* bit and set ST.st_mode, thus clearing all other bits in that field.  
*/
+static void
+set_stat_type (struct stat *st, unsigned int dtype)
+{
+  mode_t type;
+  switch (dtype)
+    {
+    case DT_BLK:
+      type = S_IFBLK;
+      break;
+    case DT_CHR:
+      type = S_IFCHR;
+      break;
+    case DT_DIR:
+      type = S_IFDIR;
+      break;
+    case DT_FIFO:
+      type = S_IFIFO;
+      break;
+    case DT_LNK:
+      type = S_IFLNK;
+      break;
+    case DT_REG:
+      type = S_IFREG;
+      break;
+    case DT_SOCK:
+      type = S_IFSOCK;
+      break;
+    default:
+      type = 0;
+    }
+  st->st_mode = dtype << s_ifmt_shift_bits ();
+}
+
 /*
  * This is the tricky part -- do not casually change *anything* in here.  The
  * idea is to build the linked list of entries that are used by fts_children
@@ -1218,6 +1295,9 @@ mem1:                             saved_errno = errno;
                                          && DT_IS_KNOWN(dp)
                                          && ! DT_MUST_BE(dp, DT_DIR));
                        p->fts_info = FTS_NSOK;
+                       /* Propagate dirent.d_type information back
+                          to caller, when possible.  */
+                       set_stat_type (p->fts_statp, D_TYPE (dp));
                        fts_set_stat_required(p, !skip_stat);
                        is_dir = (ISSET(FTS_PHYSICAL) && ISSET(FTS_NOSTAT)
                                  && DT_MUST_BE(dp, DT_DIR));
--
1.6.0.4.1101.g642f8



>From d7ccb70f88ce912616b2935c3757a7aae9f64d3b Mon Sep 17 00:00:00 2001
From: Jim Meyering <address@hidden>
Date: Fri, 28 Nov 2008 17:21:55 +0100
Subject: [PATCH] find: avoid redundant stat calls when dirent.d_type is usable

When fts provides usable dirent.d_type information, and the only
stat information required by find's predicates the type information
normally found in stat.st_mode, skip the now-redundant stat calls.
This change is useful only with the very latest version of fts.c
from gnulib.
* find/ftsfind.c (find): Use dirent.d_type info from latest fts.
---
 find/ftsfind.c |   10 +++++-----
 1 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/find/ftsfind.c b/find/ftsfind.c
index 543b80f..bd81c91 100644
--- a/find/ftsfind.c
+++ b/find/ftsfind.c
@@ -472,8 +472,8 @@ consider_visiting(FTS *p, FTSENT *ent)
       || ent->fts_info == FTS_NS /* e.g. symlink loop */)
     {
       assert (!state.have_stat);
-      assert (!state.have_type);
-      state.type = mode = 0;
+      assert (state.type != 0);
+      mode = state.type;
     }
   else
     {
@@ -614,9 +614,9 @@ find(char *arg)
     {
       while ( (ent=fts_read(p)) != NULL )
        {
-         state.have_stat = false;
-         state.have_type = false;
-         state.type = 0;
+         state.have_stat = false; // FIXME: depends on FTS_NOSTAT
+         state.have_type = !!ent->fts_statp->st_mode;
+         state.type = state.have_type ? ent->fts_statp->st_mode : 0;
          consider_visiting(p, ent);
        }
       fts_close(p);
--
1.6.0.4.1101.g642f8




reply via email to

[Prev in Thread] Current Thread [Next in Thread]