bug#64735: 29.0.92; find invocations are ~15x slower because of ignores
From: sbaugh
Subject: bug#64735: 29.0.92; find invocations are ~15x slower because of ignores
Date: Thu, 20 Jul 2023 12:22:19 +0000 (UTC)
User-agent: Gnus/5.13 (Gnus v5.13)

Eli Zaretskii <eliz@gnu.org> writes:
>> From: Spencer Baugh <sbaugh@janestreet.com>
>> Date: Wed, 19 Jul 2023 17:16:31 -0400
>>
>>
>> Several important commands and functions invoke find; for example rgrep
>> and project-find-regexp.
>>
>> Most of these add some set of ignores to the find command, pulling from
>> grep-find-ignored-files in the former case. So the find command looks
>> like:
>>
>> find -H . \( -path \*/SCCS/\* -o -path \*/RCS/\* [...more ignores...] \)
>> -prune -o -type f -print0
>>
>> Alas, on my system, using GNU find, these ignores slow down find by
>> about 15x on a large directory tree, taking it from around .5 seconds to
>> 7.8 seconds.
>>
>> This is very noticeable overhead; removing the ignores makes rgrep and
>> other find-invoking commands substantially faster for me.
>
> grep-find-ignored-files is a customizable user option, so if this
> slowdown bothers you, just customize it to avoid that.

I think it is bad that the default behavior is very slow.

> And if there are patterns there that are no longer pertinent or rare,
> we could remove them from the default value.

Sure!

So the thing to narrow down would be completion-ignored-extensions,
which is what populates grep-find-ignored-files. Most things in that
list are irrelevant to most users, but all of them are relevant to some
users.

Most of these are language-specific things - e.g. there's a bunch of
Common Lisp compiled object (or something) extensions.

Perhaps we could modularize this, so that individual packages add things
to completion-ignored-extensions at load time. Then
completion-ignored-extensions would only include things which are
relevant to a given user, as determined by what packages they load.

> I'm not sure we should bother more than these two simple measures.

Unfortunately those two simple measures help rgrep but they don't help
project-find-regexp (and other project.el commands using
project--files-in-directory, such as project-find-file), since those
project commands pull their ignores from the version control system
through vc (not grep-find-ignored-files), and then pass them to find.

>> The overhead is linear in the number of ignores - that is, each
>> additional ignore adds a small fixed cost. This suggests that find is
>> linearly scanning the list of ignores and checking each one, rather than
>> optimizing them to a single regexp and checking that regexp.
>
> If it uses fnmatch, it cannot do it any other way, I think
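
To make the two strategies contrasted above concrete, here is a minimal
Python sketch. It is only an illustration, not how find is implemented:
the IGNORES list and the function names are hypothetical, and Python's
fnmatch module merely stands in for the C fnmatch function. The first
function checks each glob in turn, so its cost grows with the number of
ignores; the second folds all globs into one regexp that is compiled
once. It assumes a Python version where fnmatch.translate emits scoped
inline flags (3.7 or later), so the translated patterns can be joined
with "|".

```python
import fnmatch
import re

# Hypothetical ignore globs, in the style of find's -path arguments.
IGNORES = ["*/SCCS/*", "*/RCS/*", "*/.git/*"]


def ignored_linear(path, patterns=IGNORES):
    """Check each glob in turn; work grows with the number of patterns."""
    return any(fnmatch.fnmatchcase(path, p) for p in patterns)


def compile_combined(patterns=IGNORES):
    """Translate every glob to a regexp and join them into one alternation."""
    return re.compile("|".join(fnmatch.translate(p) for p in patterns))


COMBINED = compile_combined()


def ignored_combined(path, regexp=COMBINED):
    """Single match against the precompiled alternation."""
    return regexp.match(path) is not None
```

Both functions give the same answer for a given path. Whether the
combined form is actually faster depends on the regexp engine; a
backtracking engine may still try the alternatives one by one.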