bug-findutils

Re: find and glob


From: James Youngman
Subject: Re: find and glob
Date: Thu, 29 Mar 2012 12:27:56 +0200

On Tue, Mar 27, 2012 at 3:24 PM, Mark Hills <address@hidden> wrote:
> We traverse portions of our filesystem and apply a find action to them;
> currently by allowing the shell to expand the glob; eg.
>
>  find ./*/xx/*/yy
>
> But the expansion can be large and problematic before being passed to
> find.

I'm not sure what you mean by "problematic" here.   It's possible, I
suppose, that the shell runs out of RAM in which to expand the glob, or
that the results exceed ARG_MAX.   I'll assume it's the latter for the
purpose of this reply; please correct me if this is the wrong
interpretation of what you meant.
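
To check whether ARG_MAX really is the limit being hit, the size of the
expansion can be compared against it.   This is a rough sketch only: it
assumes it is run from the directory in which the glob is expanded, and
the variable names are mine.   The kernel also counts the environment
(and, on some systems, per-argument overhead), so treat the figures as
an estimate.

```shell
# Largest combined size of arguments plus environment accepted by execve().
arg_max=$(getconf ARG_MAX)

# printf is a shell builtin, so no execve() takes place and the expansion
# itself cannot fail with "Argument list too long".  Each path is followed
# by a NUL byte, as it would be in argv.
# If nothing matches, the shell passes the pattern through literally.
glob_bytes=$(printf '%s\0' ./*/xx/*/yy | wc -c)

echo "ARG_MAX:        ${arg_max} bytes"
echo "glob expansion: ${glob_bytes} bytes"
```

If the second figure is anywhere near the first, passing the whole
expansion to a single find invocation is going to fail.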


> To do the equivalent in find itself is slow.

How slow?   How much slower?

> The whole hierarchy is traversed (which is slow), and only matching results 
> displayed:
>
>  find . -path './*/xx/*/yy'

You don't state what the structure of your filesystem hierarchy is, so
it is hard to give entirely reliable advice here.   I'm going to guess
a bit about things like the depth of the tree (which I'm going to
guess is large), the total number of files below "." (also large) and
the cardinalities of the expansions of "*" in the glob above (also
large).

If that is your whole command line you are certainly using find in an
inefficient way.   It's hard to say for sure since you don't state
what fraction of the whole filesystem hierarchy you need to visit, or
what the actions are.    However, the predicates -mindepth, -maxdepth,
-prune and -quit can be used to limit or terminate the filesystem
search.
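
As an illustration of limiting the search, here is a minimal sketch on
a scratch tree (the directory names are invented for the example).   A
path matching ./*/xx/*/yy in the shell's sense has exactly four
components below ".", so -mindepth and -maxdepth can stop find from
descending further than that:

```shell
# Scratch tree: two matching directories and one deep, unrelated branch.
tree=$(mktemp -d)
mkdir -p "$tree/a/xx/b/yy/deep" "$tree/c/xx/d/yy" "$tree/e/zz/f/g/h/i"

# Unbounded: every entry below "." is visited and tested.
unbounded=$(cd "$tree" && find . -path './*/xx/*/yy' | sort)

# Bounded: nothing below depth 4 is visited, nothing above it is tested.
# Unlike the shell's "*", the "*" of -path also matches "/", so the depth
# limits additionally keep the result in line with the shell glob.
bounded=$(cd "$tree" && find . -mindepth 4 -maxdepth 4 -path './*/xx/*/yy' | sort)

printf '%s\n' "$bounded"
rm -rf "$tree"
```

On this tree both commands print ./a/xx/b/yy and ./c/xx/d/yy, but the
bounded one never enters yy/deep or the lower levels of e/zz.   It does
still visit everything down to depth 4, so how much it saves depends on
how much of the tree lies below that.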

> Is there a way to have find itself only visit the relevant portions of the
> filesystem?

Certainly.  If I knew quite what you meant by "relevant" I could
provide a more useful response.   Instead I will provide some
examples.

We start with your original command, which you state as problematic:

$  find ./*/xx/*/yy

I'm going to assume you really meant you use

$  find ./*/xx/*/yy -actions

where -actions is some non-empty mixture of find predicates and
actions.  If -actions already includes -mindepth, -maxdepth, -prune or
(most awkwardly) -quit, some of the examples below are going to need
adjustment.

The simplest rearrangement is

$ for start in ./*/xx; do
  find "${start}"/*/yy -actions
done

This will dramatically cut down the number of arguments passed to each
invocation of find, and so may be enough by itself to form a
satisfactory solution to your problem.   If the argument count is
still too large you could also try:

$ for start in ./*/xx; do
  for sub in "${start}"/*/yy; do
    find "${sub}" -actions
  done
done
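
One thing to watch with loops of this kind: when a glob matches nothing,
the shell (with default settings) leaves the pattern in place, so the
loop body runs once with the literal string.   A minimal sketch that
skips such leftovers, using a scratch tree with invented names and
-print standing in for -actions:

```shell
tree=$(mktemp -d)
mkdir -p "$tree/a/xx/b/yy" "$tree/c/xx/empty"   # nothing matches c/xx/*/yy

found=$(
  cd "$tree" || exit 1
  for start in ./*/xx; do
    for sub in "${start}"/*/yy; do
      # An unmatched pattern arrives here unexpanded; skip it.
      [ -e "${sub}" ] || continue
      find "${sub}" -print
    done
  done
)
printf '%s\n' "$found"
rm -rf "$tree"
```

Without the -e check, find would be started on the non-existent path
./c/xx/*/yy and would report an error for it.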

If you still have a problem with this second option, it's likely that
one of the "*"s expands to a sufficiently large list that ARG_MAX is
still exceeded.   You can overcome this by transforming the loop into
find predicates.   I'll do this with only the inner loop for
simplicity:

  for sub in "${start}"/*/yy; do
    find "${sub}" -actions
  done

becomes

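find "${start}" -mindepth 2 \
  \( \! -path "${start}/*/yy" \! -path "${start}/*/yy/*" -prune -o -actions \)

Because of -mindepth, entries above depth 2 are not tested.   GNU find
has no test for "exactly at depth 2" (-depth takes no argument), hence
the two -path tests: together they pick out the entries at depth 2
which are not named yy, since anything deeper is only ever reached
through a yy directory.   Those entries are pruned, and -actions is
applied to everything else.

If -actions contains options like -depth, -mindepth or -maxdepth, or a
top-level -o, then some adjustment will be needed there.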
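
A runnable sketch of this kind of loop-to-predicate transformation,
checked against the loop it replaces.   It assumes GNU find, expresses
"at depth 2 and not named yy" with -path tests, and uses -print in
place of -actions; the scratch tree and variable names are invented.
It also assumes "${start}" contains no glob metacharacters, since
-path would interpret them.

```shell
tree=$(mktemp -d)
mkdir -p "$tree/s/a/yy/sub" "$tree/s/b/zz/yy" "$tree/s/c"
touch "$tree/s/a/yy/file" "$tree/s/b/zz/other" "$tree/s/c/yy"
start="$tree/s"

# Reference result: the shell expands the glob, one find per match.
by_loop=$(
  for sub in "${start}"/*/yy; do
    find "${sub}" -print
  done | sort
)

# The same selection made by find.  Depth-2 entries not named yy are
# pruned, so b/zz/yy is never reached; everything else is printed.
by_find=$(
  find "${start}" -mindepth 2 \
    \( ! -path "${start}/*/yy" ! -path "${start}/*/yy/*" -prune -o -print \) |
    sort
)

[ "$by_loop" = "$by_find" ] && echo "same result"
rm -rf "$tree"
```

The two can still differ in corner cases: the shell's "*" skips names
beginning with "." and follows symbolic links to directories, while
find by default does neither.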

> The manual [1] seems to suggest using locate and xargs. Keeping an index
> is not practical for us,

I assume that is because the tree changes frequently, and that
multiple independent locate indexes would be no help (since all parts
of the tree change frequently).

> so I wrote a simple command around the glob(3)
> function to do the traversal and print to stdout. Am I missing some well
> established method here?

It's difficult to give a definitive answer here since you don't state
what you're actually trying to achieve.   I hope the above was useful
anyway.
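
For what it's worth, the "print the expansion, consume it elsewhere"
approach can probably be approximated without a custom tool.   The
sketch below is one possibility, not an established recipe: it relies
on printf being a shell builtin and on xargs supporting -0 (GNU and BSD
xargs do).   The scratch tree is invented and "-type f -print" stands
in for -actions.

```shell
tree=$(mktemp -d)
mkdir -p "$tree/a/xx/b/yy" "$tree/c/xx/d/yy"
touch "$tree/a/xx/b/yy/f1" "$tree/c/xx/d/yy/f2"

found=$(
  cd "$tree" || exit 1
  # The builtin printf writes the expansion to a pipe without execve(),
  # so ARG_MAX does not apply to it.  xargs -0 cuts the NUL-separated
  # list into batches that fit, and each batch becomes the starting
  # points ("$@") of one find invocation.
  printf '%s\0' ./*/xx/*/yy |
    xargs -0 sh -c 'find "$@" -type f -print' sh | sort
)
printf '%s\n' "$found"
rm -rf "$tree"
```

The shell still has to hold the whole expansion in memory, so this
helps with ARG_MAX but not if the expansion itself is too large for
the shell.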


>
> Please keep me CC'd, as I read from the archives. Many thanks.
>
> [1] 
> http://www.gnu.org/software/findutils/manual/html_mono/find.html#Fast-Full-Name-Search
>
> --
> Mark
>


