Re: Determination of file lists from selected folders without returning directory names

From: Eric Blake
Subject: Re: Determination of file lists from selected folders without returning directory names
Date: Wed, 19 Jul 2017 05:52:22 -0500
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.2.1

On 07/19/2017 03:43 AM, SF Markus Elfring wrote:
>>>> why should a simple `basename` or ${name##*/} hurt the performance?
>>> Such data deletion causes also corresponding processing costs, doesn't it?

Have you heard of premature optimization?

If you have a program that spends 1% of its time generating then
stripping text, and 99% of its time doing much lengthier tasks on the
resulting text (and pretty much any syscall to open() or otherwise
manipulate files in the file system generally is MUCH slower than simple
text manipulation), then optimizing the text manipulation portion CANNOT
give you more than a 1% speedup in your overall program.
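
To put a number on that bound, here is a minimal sketch of the usual
Amdahl's-law arithmetic.  The function name amdahl_speedup and its two
arguments (the fraction of run time spent in the part being optimized,
and how many times faster that part becomes) are made up for this
illustration; they are not part of any tool discussed in this thread.

```shell
# Hypothetical helper: overall speedup when a fraction p of the run time
# is made s times faster (Amdahl's law): 1 / ((1 - p) + p / s).
# LC_ALL=C keeps the decimal point a '.' regardless of the user's locale.
amdahl_speedup() {
    LC_ALL=C awk -v p="$1" -v s="$2" \
        'BEGIN { printf "%.4f\n", 1 / ((1 - p) + p / s) }'
}

# 1% of the time spent on text manipulation, made a million times faster:
amdahl_speedup 0.01 1000000
```

Even with the text manipulation made essentially free, the overall
speedup factor stays at about 1.01, which is the 1% ceiling described
above.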

Instead of wasting our time arguing over whether find should print the
full relative name of a file (a sane default for most uses, since acting
on the file itself from a different directory needs that full name), or
whether stripping the extra information costs too much in the rare cases
where only the basename is wanted (what cases? you haven't even given a
concrete demonstration, with numbers, of how often you actually
encounter situations where having just the basename to begin with still
lets you do the right thing), the RIGHT thing to do is benchmark it
yourself.

At this point, patches speak louder than words.  If you can write a
benchmark around a simple .c program (or an enhancement to the find
program) that can access just the basenames of files from an arbitrary
directory, and show that your program outperforms baseline find for
your given use case, and that the improvement actually makes a
difference to the overall usage pattern (that is, your benchmark also
shows that much more than 1% of the overall time was spent on
producing then stripping the prefix data), then it is worth patching
find to provide that mode of operation (and such proof belongs best on
the findutils list).

> I propose to take another look at the applied data processing style.

I propose that you quit trying to micro-optimize something without a
benchmark case that we can reproduce showing that the amount of time
spent producing then stripping the data even makes a difference.

> How much will data processing for the parameter “-printf” influence
> run time characteristics in undesired ways when the output function
> could be a fixed one like “basename()”?

Benchmark it and see for yourself.  And the answer is probably not
enough to be worth changing things.
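
As a starting point for such a benchmark, something along these lines
might do.  It is only a sketch: it assumes GNU find (for -printf and its
%f directive, which prints the name with leading directories removed),
it assumes file names without embedded newlines, and the function names
list_with_printf and list_with_strip are invented for this example.

```shell
# Variant 1: let GNU find emit only the basename of each regular file.
list_with_printf() {
    find "$1" -type f -printf '%f\n'
}

# Variant 2: let find print the full relative name, then strip the
# directory prefix afterwards with the ${name##*/} parameter expansion.
list_with_strip() {
    find "$1" -type f -print | while IFS= read -r name; do
        printf '%s\n' "${name##*/}"
    done
}

# Possible usage in bash, on a directory tree of your choosing:
#   time list_with_printf /some/tree >/dev/null
#   time list_with_strip  /some/tree >/dev/null
```

Both variants should produce the same list of names; what matters is
how the difference in their timings compares with the time your real
workload then spends acting on those files.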

Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization: |
