Message: 4
Date: Wed, 14 Mar 2012 13:40:36 -0600
From: Bob Proulx<bob@proulx.com>
To: bug-bash@gnu.org
Subject: Re: compgen is slow for large numbers of options
Message-ID:<20120314194036.GA12661@hysteria.proulx.com>
Content-Type: text/plain; charset=us-ascii
Richard Neill wrote:
I don't know for certain if this is a bug per se, but I think
"compgen -W" is much slower than it "should" be in the case of a
large (10000+) number of options.
I don't think this is a bug but just simply a misunderstanding of how
much memory must be allocated in order to generate `seq 1 500000`.
For example (on a fast i7 2700 CPU), I measure:
compgen -W "`seq 1 50000`" 1794 #3.83 s
compgen -W "`seq 1 50000 | grep 1794`" #0.019 s
In these examples, I'm using `seq` as a trivial way to generate some
data, and picking 1794 as a totally arbitrary match.
Right. But in the first case you are generating 288,894 bytes of data
and in the second case 89 bytes of data. That is a large difference.
You could probably speed it up a small amount more by using grep -x -F
and avoid the common substring matches. And perhaps more depending
upon your environment with LC_ALL=C to avoid charset issues.
In the first example, compgen is doing the filtering, whereas in the
2nd, I obtain the same result very much faster with grep.
Yes, since grep is reducing the argument size to 89 bytes. That makes
perfect sense to me. Anything that processes 89 bytes is going to be
much faster than anything that processes 288,894 bytes.
If I increase the upper number by a factor of 10, to 500000, these
times become, 436 s (yes, really, 7 minutes!) and 0.20 s
That is an increase in argument size to 3,388,895 bytes and there will
be associated memory overhead to all of that increasing the size on
top of that value too. Three plus megabytes of memory just for the
creation of the argument list and then it must be processed. That
isn't going to be fast. You would do much better if you filtered that
list down to something reasonable.
respectively. This suggests that the algorithm used by compgen is
O(n^2) whereas the algorithm used by grep is 0(1).
On the counter it suggests that the algorithm you are using, one of
fully allocating all memory, is inefficient. Whereas using grep as a
filter to reduce that memory to 89 bytes is of course more efficient.
I wrote a response on a similar issue previously. Instead of posting
it again let me post a pointer to the previous message.
http://lists.gnu.org/archive/html/bug-bash/2011-11/msg00189.html
Bob