bug-findutils

Re: Why is `find -name '*.txt'` much slower than '*.txt' on glusterfs?


From: Bernhard Voelker
Subject: Re: Why is `find -name '*.txt'` much slower than '*.txt' on glusterfs?
Date: Sun, 28 Jan 2018 19:57:28 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.5.2

On 01/27/2018 06:45 PM, Peng Yu wrote:
>> glusterfs doesn't provide D_TYPE information:
>>
>> getdents(4, {{d_ino=10054722685526780333, ..., d_type=DT_UNKNOWN} ...
>>
>> Nevertheless, it is strange that find calls newfstatat() also
>> in the case of "-maxdepth 1" - it shouldn't need to.
>
> Should this be considered as a performance bug of 'find'?

Well, maybe.

I could reproduce this case with sshfs, where getdents also returns DT_UNKNOWN.

  $ mkdir -p ~/tmp/d1 \
      && seq 10000 | xargs env -C ~/tmp/d1 touch

  $ mkdir -p ~/tmp/mnt \
      && sshfs localhost:tmp/d1 ~/tmp/mnt

  $ strace -ve getdents,newfstatat find ~/tmp/mnt -maxdepth 1

  $ strace -ve getdents,newfstatat find -D search ~/tmp/mnt -maxdepth 1 -name doesntmatter

The problem seems to be that gnulib's fts_read() already tries to determine
whether the current item is a directory [1]:

  [...]
  getdents(4, [], 32768)                  = 0
  newfstatat(5, "8793", {st_dev=makedev(0, 46), st_ino=2, st_mode=S_IFREG|0644, ...}, AT_SYMLINK_NOFOLLOW) = 0

before find itself sees the entry [2]:

  consider_visiting (early): ‘/home/berny/tmp/mnt/8793’: fts_info=FTS_F , [...]
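
To illustrate the pattern behind those extra newfstatat() calls, here is a
minimal sketch in C. It is not gnulib's fts code; scan_dir() and struct
walk_stats are hypothetical names made up for this example. It only shows the
general technique a directory walker can use: trust d_type when the file
system fills it in, and fall back to one fstatat() per entry when it is
DT_UNKNOWN.

```c
#define _DEFAULT_SOURCE          /* d_type and DT_* constants on glibc */
#include <assert.h>
#include <dirent.h>
#include <fcntl.h>               /* AT_SYMLINK_NOFOLLOW */
#include <string.h>
#include <sys/stat.h>

/* Hypothetical counters, so the caller can see the cost of DT_UNKNOWN. */
struct walk_stats {
    long entries;     /* entries seen, excluding "." and ".." */
    long dirs;        /* entries identified as directories */
    long stat_calls;  /* fstatat() fallbacks caused by DT_UNKNOWN */
};

/* Hypothetical helper: scan a single directory (no recursion) and
 * classify each entry.  Returns 0 on success, -1 if opendir() fails. */
static int scan_dir(const char *path, struct walk_stats *st)
{
    assert(path != NULL && st != NULL);
    memset(st, 0, sizeof *st);

    DIR *d = opendir(path);
    if (d == NULL)
        return -1;

    struct dirent *e;
    while ((e = readdir(d)) != NULL) {
        if (strcmp(e->d_name, ".") == 0 || strcmp(e->d_name, "..") == 0)
            continue;
        st->entries++;

        int is_dir;
        if (e->d_type != DT_UNKNOWN) {
            /* Cheap path: the file system reported the type. */
            is_dir = (e->d_type == DT_DIR);
        } else {
            /* Expensive path: one extra system call per entry. */
            struct stat sb;
            st->stat_calls++;
            if (fstatat(dirfd(d), e->d_name, &sb, AT_SYMLINK_NOFOLLOW) != 0)
                continue;        /* entry vanished or is inaccessible */
            is_dir = S_ISDIR(sb.st_mode);
        }
        if (is_dir)
            st->dirs++;
    }
    closedir(d);
    return 0;
}
```

Called on the sshfs mount from the commands above, such a helper would be
expected to report stat_calls equal to entries, because every d_type there is
DT_UNKNOWN; on a file system that fills in d_type, stat_calls would stay at 0.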

@James: do you have an idea how to work around this?

[1] https://git.sv.gnu.org/cgit/gnulib.git/tree/lib/fts.c?id=d4f6a210f44a#n1054
[2] https://git.sv.gnu.org/cgit/findutils.git/tree/find/ftsfind.c?id=040f20b91e#n559

Have a nice day,
Berny


