Re: Why is `find -name '*.txt'` much slower than '*.txt' on glusterfs?
From: Peng Yu
Subject: Re: Why is `find -name '*.txt'` much slower than '*.txt' on glusterfs?
Date: Wed, 24 Jan 2018 14:17:54 -0600
Here are the `strace -c` results and the runtimes of `find` with and
without `-maxdepth 1`.
$ time find -maxdepth 1 -name '*.tsv' > /dev/null
real 0m21.118s
user 0m0.446s
sys 0m0.577s
$ time find -name '*.tsv' > /dev/null
real 0m21.277s
user 0m0.454s
sys 0m0.636s
$ time ./main.sh > /dev/null
real 0m2.695s
user 0m0.046s
sys 0m0.057s
$ cat main.sh
#!/usr/bin/env bash
# vim: set noexpandtab tabstop=2:
echo *.tsv
$ strace -c find -maxdepth 1 -name '*.tsv' > /dev/null
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 93.87    0.433259          18     23439           newfstatat
  5.70    0.026288          25      1067           getdents
  0.16    0.000725          45        16        10 open
  0.06    0.000259          22        12           mmap
  0.04    0.000190          32         6           read
  0.03    0.000143           3        54           brk
  0.03    0.000132          22         6           mprotect
  0.03    0.000125          18         7           fstat
  0.02    0.000109         109         1         1 access
  0.02    0.000075           7        11           close
  0.01    0.000064          21         3         3 stat
  0.01    0.000036           4        10           fcntl
  0.01    0.000035          35         1           execve
  0.01    0.000033          33         1           munmap
  0.00    0.000019           6         3         2 ioctl
  0.00    0.000017           1        21           write
  0.00    0.000017          17         1           arch_prctl
  0.00    0.000016          16         1           uname
  0.00    0.000014           7         2           fstatfs
  0.00    0.000000           0         1           fchdir
  0.00    0.000000           0         1           sysinfo
  0.00    0.000000           0         1           openat
------ ----------- ----------- --------- --------- ----------------
100.00    0.461556                 24665        16 total
$ strace -c ./main.sh > /dev/null
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 77.26    0.009637           9      1067           getdents
  7.50    0.000936          49        19         8 open
  3.99    0.000498          26        19        17 execve
  2.38    0.000297           3        94           brk
  2.18    0.000272          13        21           mmap
  1.48    0.000185           4        46        23 stat
  1.15    0.000144          12        12           mprotect
  1.05    0.000131          15         9           read
  0.99    0.000123          15         8         4 access
  0.62    0.000077           8        10           fstat
  0.51    0.000064           6        11           close
  0.24    0.000030          30         1           munmap
  0.24    0.000030           2        14           rt_sigaction
  0.14    0.000017           9         2           arch_prctl
  0.07    0.000009           2         6           rt_sigprocmask
  0.06    0.000008           8         1           sysinfo
  0.02    0.000003           1         4         4 ioctl
  0.02    0.000002           0        70           write
  0.02    0.000002           2         1           uname
  0.02    0.000002           0         5           getuid
  0.02    0.000002           0         5           getgid
  0.02    0.000002           0         5           geteuid
  0.02    0.000002           0         5           getegid
  0.00    0.000000           0         3           lseek
  0.00    0.000000           0         1           dup2
  0.00    0.000000           0         1           getpid
  0.00    0.000000           0         3         1 fcntl
  0.00    0.000000           0         2           getrlimit
  0.00    0.000000           0         1           getppid
  0.00    0.000000           0         1           getpgrp
------ ----------- ----------- --------- --------- ----------------
100.00    0.012473                  1447        57 total
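
A rough back-of-the-envelope reading of these numbers, as a sketch rather
than a measurement. By default `strace -c` sums system CPU time, not
wall-clock time, which is presumably why the `find` table adds up to only
about 0.46 s while `time` reports about 21 s (`strace -c -w` summarises
wall-clock time instead). Assuming the extra wall-clock time of `find` over
the glob is spent waiting on the newfstatat() calls (an assumption, not
something the traces prove), the per-call latency can be estimated like
this. The function name `per_call_ms` is made up for the example.

```shell
# Estimate the average wall-clock cost of one call, in milliseconds.
# Assumption: the whole difference between the two runtimes is spent in
# the counted system call (here newfstatat).
#   $1 = wall time of find (s), $2 = wall time of the glob (s), $3 = call count
per_call_ms() {
  LC_ALL=C awk -v a="$1" -v b="$2" -v n="$3" \
    'BEGIN { printf "%.3f\n", (a - b) / n * 1000 }'
}

# Figures taken from the `time` and `strace -c` output above.
per_call_ms 21.118 2.695 23439   # prints 0.786
```

Under that assumption each newfstatat() costs a bit under 1 ms of waiting,
which would be consistent with a network round trip per entry.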
On Wed, Jan 24, 2018 at 1:39 AM, Bernhard Voelker
<address@hidden> wrote:
> On 01/24/2018 01:44 AM, Peng Yu wrote:
>>
>> The attached files are the strace results for `echo` and `find`. Can
>> anybody check if there is a way to improve the performance of `find`
>> so that it can work as efficiently as `echo` in this test case? Thanks.
>>
>> $ cat main.sh
>> #!/usr/bin/env bash
>> # vim: set noexpandtab tabstop=2:
>>
>> echo *.txt
>> $ strace ./main.sh 2>/tmp/echo_strace.txt
>> $ strace find -name '*.txt' > /dev/null 2>/tmp/find_strace.txt
>
>
> First of all, please refrain from attaching such huge files when
> sending to mailing lists like this; either upload them to a web
> paste bin, or at least compress the files, e.g. the larger file
> could have wasted only <100k instead of 2.3M. Thanks.
>
> Regarding the strace outputs: you followed neither James's tip
> (use "strace -c ...") nor Dale's (use "find -maxdepth 1 ..."),
> but just from the number of system calls one can already guess
> that the time is spent in the newfstatat() calls.
>
> We don't see what the previous getdents() calls return (strace -v),
> but it seems that they don't include D_TYPE information on glusterfs.
> Therefore, as you omitted the '-maxdepth 1' argument, find needs
> to stat each entry to check whether it is a directory
> (which it would then need to recurse into).
>
> BTW: you already got the same answer on your cross-posting [1].
> [1] https://lists.gnu.org/r/coreutils/2018-01/msg00058.html
>
> Have a nice day,
> Berny
>
>
>
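
If the explanation quoted above holds, the cost lies in the per-entry stat
calls rather than in reading the directory, and the `main.sh` trace shows
that the shell's pathname expansion gets by with getdents() alone. As a
minimal sketch of a workaround for the single-directory case only, the
glob can be wrapped so that it prints one name per line like `find` does.
The helper name `list_matching` is hypothetical. Unlike `find`, it does not
recurse, prints names without the leading `./`, returns them sorted, and by
default skips names that start with a dot.

```shell
# list_matching DIR PATTERN
# Hypothetical helper (the name is made up for this sketch): print the
# entries of DIR whose names match PATTERN, one per line, using only the
# shell's pathname expansion.
list_matching() (
  pattern=$2
  cd -- "$1" || exit 1
  IFS=               # disable word splitting so only globbing applies below
  set -- $pattern    # unquoted on purpose: the shell expands the glob here
  # No match: the pattern is left as a literal word (or dropped entirely
  # if the shell has nullglob enabled), so print nothing in that case.
  if [ "$#" -eq 0 ] || { [ "$#" -eq 1 ] && [ ! -e "$1" ]; }; then
    exit 0
  fi
  printf '%s\n' "$@"
)
```

For the test case in this thread the call would be
`list_matching . '*.tsv' > /dev/null`. The body runs in a subshell, so the
`cd` and the `IFS` change do not leak into the caller.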
--
Regards,
Peng