
Re: compgen is slow for large numbers of options


From: Richard Neill
Subject: Re: compgen is slow for large numbers of options
Date: Thu, 15 Mar 2012 19:38:49 +0000
User-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.2.27) Gecko/20120216 Lightning/1.0b2 Thunderbird/3.1.19

Dear Bob,

Thanks for your explanation. I do understand what is going on and why. But my point was that compgen has an implicit internal "grep" that is much less efficient than actual grep. Why is the performance of compgen's sorting/filtering algorithm so much worse than grep's?
Both of them start with a large list and filter it down to a small one.

At any rate, might I suggest that this is worthy of a line or two of documentation in the compgen manual? 40,000 possible completions is not necessarily unreasonable. [In this case, urpmi, as used by Mandriva/Mageia, has had it wrong for some time.]
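
For illustration, a completion function can pre-filter the word list with grep before it ever reaches compgen. A minimal sketch, with made-up names (this is not urpmi's actual completion code):

_pkg_complete()
{
    local cur=${COMP_WORDS[COMP_CWORD]}
    # "list_available_packages" is a hypothetical stand-in for whatever
    # produces the 40,000 candidate names.  Filter with grep first so
    # compgen only ever sees the handful of words that can still match.
    COMPREPLY=( $(compgen -W "$(list_available_packages | grep -F -- "$cur")" -- "$cur") )
}
complete -F _pkg_complete urpmi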

Best wishes,

Richard


Message: 4
Date: Wed, 14 Mar 2012 13:40:36 -0600
From: Bob Proulx <bob@proulx.com>
To: bug-bash@gnu.org
Subject: Re: compgen is slow for large numbers of options
Message-ID: <20120314194036.GA12661@hysteria.proulx.com>
Content-Type: text/plain; charset=us-ascii

Richard Neill wrote:
> I don't know for certain if this is a bug per se, but I think
> "compgen -W" is much slower than it "should" be in the case of a
> large (10000+) number of options.

I don't think this is a bug but just simply a misunderstanding of how
much memory must be allocated in order to generate `seq 1 500000`.

> For example (on a fast i7 2700 CPU), I measure:
>
> compgen -W "`seq 1 50000`" 1794           # 3.83 s
> compgen -W "`seq 1 50000 | grep 1794`"    # 0.019 s
>
> In these examples, I'm using `seq` as a trivial way to generate some
> data, and picking 1794 as a totally arbitrary match.

Right.  But in the first case you are generating 288,894 bytes of data
and in the second case 89 bytes of data.  That is a large difference.
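
For reference, those byte counts are easy to reproduce:

seq 1 50000 | wc -c                 # 288894 bytes, newlines included
seq 1 50000 | grep 1794 | wc -c     # 89 bytes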

You could probably speed it up a small amount more by using grep -x -F
and avoid the common substring matches.  And perhaps more depending
upon your environment with LC_ALL=C to avoid charset issues.
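
Concretely, that suggestion might look something like this (a sketch of
the same example, matching whole lines only):

compgen -W "`seq 1 50000 | LC_ALL=C grep -x -F 1794`"   # exact-match filter

For completion purposes a prefix match (for example grep '^1794') is
probably closer to what is wanted, but the -x -F form shows the idea.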

> In the first example, compgen is doing the filtering, whereas in the
> 2nd, I obtain the same result very much faster with grep.

Yes, since grep is reducing the argument size to 89 bytes.  That makes
perfect sense to me.  Anything that processes 89 bytes is going to be
much faster than anything that processes 288,894 bytes.

> If I increase the upper number by a factor of 10, to 500000, these
> times become 436 s (yes, really, 7 minutes!) and 0.20 s

That is an increase in argument size to 3,388,895 bytes, and there is
associated memory overhead on top of that value too.  Three plus
megabytes of memory just for the creation of the argument list, and
then it must be processed.  That isn't going to be fast.  You would do
much better if you filtered that list down to something reasonable.
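
The same kind of pre-filtering keeps the argument small at this size
too; the 0.20 s figure above corresponds to the filtered form:

seq 1 500000 | wc -c                        # 3388895 bytes
compgen -W "`seq 1 500000 | grep 1794`"     # the fast, filtered case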

> respectively.  This suggests that the algorithm used by compgen is
> O(n^2) whereas the algorithm used by grep is O(1).

On the contrary, it suggests that the approach you are using, one of
fully allocating all of the memory up front, is inefficient.  Whereas
using grep as a filter to reduce that data to 89 bytes is of course
more efficient.

I wrote a response on a similar issue previously.  Instead of posting
it again let me post a pointer to the previous message.

   http://lists.gnu.org/archive/html/bug-bash/2011-11/msg00189.html

Bob



