[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: bug: bash 4.2.20 impossibly slow
From: |
Somchai Smythe |
Subject: |
Re: bug: bash 4.2.20 impossibly slow |
Date: |
Mon, 19 Mar 2012 03:52:48 +0700 |
On 3/16/12, Chet Ramey <chet.ramey@case.edu> wrote:
> On 3/14/12 2:14 PM, Somchai Smythe wrote:
>> Hello,
>>
>> I am reporting a problem with performance, not correctness.
>>
>> While preparing some examples for a course lecture where I code the
>> same algorithm in many languages to compare languages, I ran some code
>> and while it was reasonably quick with ksh, it would just apparently
>> hang at 100% cpu in bash. I finally let it run overnight and it does
>> complete correctly in bash, but what takes ksh less than a minute
>> takes bash 6 1/2 hours to complete (and keeping one core at 100% the
>> entire 6.5 hours) on the same hardware. I suspect there may be some
>> special way to compile bash that I don't know about that maybe works
>> with arrays differently, so I reporting this. I am not subscribed, so
>> please cc: me. I cannot use bashbug since my university blocks
>> outgoing mail. I used exactly the same file unmodified for the tests
>> in ksh and bash. My hope is that bash would be at least 'competitive'
>> and complete it without being more than 10x slower. As it is, I
>> cannot use bash in the lecture (for this) since it is only 3 hours
>> long and the program won't complete in that amount of time.
>>
>> BLS2 $bash --version
>> GNU bash, version 4.2.20(2)-release (x86_64-unknown-linux-gnu)
>> Copyright (C) 2011 Free Software Foundation, Inc.
>> License GPLv3+: GNU GPL version 3 or later
>> <http://gnu.org/licenses/gpl.html>
>
> [...]
>
>> The program run was a simple prime number sieve program using an array
>> with only 500000 elements:
>>
>> #! /bin/bash
>>
>> if ((${#}==1))
>> then
>> n=${1}
>> else
>> n=500000
>> fi
>> for ((i=1;i<=n;i++))
>> do
>> ((a[i]=1))
>> done
>> for ((i=2;i<=(n/2);i++))
>> do
>> for ((j=2;j<=(n/i);j++))
>> do
>> ((k=i*j))
>> ((a[k]=0))
>> done
>> done
>> for ((i=1;i<=n;i++))
>> do
>> if ((a[i]!=0))
>> then
>> printf "%d\n" ${i}
>> fi
>> done
>> exit 0
>>
>> When run like:
>> time bash ./sieve.sh
>>
>> .....
>> 499717
>> 499729
>> 499739
>> 499747
>> 499781
>> 499787
>> 499801
>> 499819
>> 499853
>> 499879
>> 499883
>> 499897
>> 499903
>> 499927
>> 499943
>> 499957
>> 499969
>> 499973
>> 499979
>>
>> real 396m9.884s
>> user 395m43.102s
>> sys 0m8.913s
>>
> [...]
>
>> Computer is a Core2 Duo with 3GB of ram running a pure64bit linux
>> distribution with kernel 3.2.9 with gcc 4.5.3 and glibc 2.14.1.
>>
>> Both programs get the same answer, so this is not a correctness issue,
>> but instead a performance issue.
>>
> [...]
>
>> Experimenting a bit shows me that at 9999 elements bash is still
>> reasonably fast, but at 29999 elements it takes:
>>
>> real 0m39.077s
>> user 0m38.807s
>> sys 0m0.150s
>>
>> For that, ksh takes:
>>
>> real 0m1.631s
>> user 0m1.560s
>> sys 0m0.007s
>>
>> Perhaps that shorter total time still shows the problem dramatically
>> enough that runs that size can be used to track down the problem
>> without having to wait hours for the test runs.
>
> As I said in my earlier reply, the bash array implementation uses sparse
> doubly-linked lists. Inserts that don't append a value to the end of
> the array take O(N) instead of O(1), since you have to search the list
> to find the right spot to insert the new element.
>
> There is a lot that can be done to accommodate sequential insertion
> patterns, which fits the sieve algorithm pretty well. It would be
> pretty simple, since the array code already maintains a pointer to the
> last reference; starting the search for the spot to insert at the last
> reference should reduce the search time considerably.
>
> That makes a pretty big difference. Starting at 9999 elements, but
> removing the echos to minimize output:
>
> (bash)
> real 0m1.833s
> user 0m1.704s
> sys 0m0.118s
>
> (bash-4.2.24)
> real 0m2.728s
> user 0m2.603s
> sys 0m0.114s
>
> With 29999 elements:
> (bash)
> real 0m6.957s
> user 0m6.437s
> sys 0m0.377s
>
> (bash-4.2.24)
> real 0m16.520s
> user 0m16.113s
> sys 0m0.387s
>
> You start seeing real differences at 99999 elements (for what it's worth,
> loading up the array initially takes about 1.2s of that total):
>
> (bash)
> real 0m32.329s
> user 0m30.845s
> sys 0m1.376s
>
> (bash-4.2.24)
> real 2m46.972s
> user 2m45.095s
> sys 0m1.590s
>
> At this point I quit testing bash-4.2.24; I don't have *that* much time.
> The rest of these are just with the modified development sources, and not
> in a really rigorous way, since I was using the machine for other things
> at the time.
>
> 299999 elements:
> real 3m24.318s
> user 3m19.463s
> sys 0m4.768s
>
> 399999 elements:
> real 5m27.585s
> user 5m20.870s
> sys 0m6.511s
>
> 499999 elements:
> real 8m9.997s
> user 8m0.963s
> sys 0m8.553s
>
> It's never going to be O(1) unless I rewrite the whole array module, and
> I'm not saying that the bash builtin malloc's memory allocation patterns
> don't have an effect, but small changes (which I've attached) can make a
> big difference.
>
> Chet
Thank you very much for the patch; it makes a very dramatic difference
for this use case. In my testing, it went from 6.5 hours to just over
20 minutes.
I guess this wasn't the default since it may not be great for random
access, but I hope that it is at least available as a compile-time
option in future official bash releases. For now, I'll apply this
array access speedup patch to our packages.
John Gatewood Ham
Lecturer
Burapha University
Bang Saen, Chonburi
Thailand