Re: parallel sort at fault? [Re: [PATCH] tests: avoid gross inefficiency
From: Pádraig Brady
Subject: Re: parallel sort at fault? [Re: [PATCH] tests: avoid gross inefficiency...
Date: Wed, 16 Mar 2011 13:33:08 +0000
User-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.1.8) Gecko/20100227 Thunderbird/3.0.3

On 16/03/11 12:07, Jim Meyering wrote:
> Pádraig Brady wrote:
>> I've not fully analyzed this yet, and I'm not saying it's wrong,
>> but the above change seems to have a large effect on thread
>> creation when smaller buffers are used (you hinted previously
>> that being less aggressive with the amount of mem used by default
>> might be appropriate, and I agree).
>>
>> Anyway with the above I seem to need a buffer size more
>> than 10M to have any threads created at all.
>>
>> Testing the original 4 lines heuristic with the following, shows:
>> (note I only get > 4 threads after 4M of input, not 7 for 16 lines
>> as indicated in NEWS).
>>
>> $ for i in $(seq 30); do
>>> j=$((2<<$i))
>>> yes | head -n$j > t.sort
>>> strace -c -e clone sort --parallel=16 t.sort -o /dev/null 2>&1 |
>>> join --nocheck-order -a1 -o1.4,1.5 - /dev/null |
>>> sed -n "s/\([0-9]*\) clone/$j\t\1/p"
>>> done
>> 4 1
>> 8 2
>> 16 3
>> 32 4
>> 64 4
>> 128 4
> ...
>> 1048576 4
>> 2097152 4
>> 4194304 8
>> 8388608 16
>>
>> When I restrict the buffer size with '-S 1M', many more threads
>> are created (a max of 16 in parallel with the above command)
>> 4 1
>> 8 2
>> 16 3
>> 32 4
>> 64 4
>> 128 4
>> 256 4
>> 512 4
>> 1024 4
>> 2048 4
>> 4096 4
>> 8192 4
>> 16384 8
>> 32768 12
>> 65536 24
>> 131072 44
>> 262144 84
>> 524288 167
>> 1048576 332
>> 2097152 660
>> 4194304 1316
>> 8388608 2628
>>
>> After increasing the heuristic to 128K, I get _no_ threads until -S > 10M
>> and this seems to be independent of line length.
>
> Thanks for investigating that.
> Could strace -c -e clone be doing something unexpected?
> When I run this (without my patch), it would use 8 threads:
>
> seq 16 > in; strace -ff -o k ./sort --parallel=16 in -o /dev/null
>
> since it created eight k.PID files:
>
> $ ls -1 k.*|wc -l
> 8
>
> Now, for such a small file, it does not call clone at all.
>
Oops, yep, I forgot to add -f to strace.
So NEWS is correct.
# SUBTHREAD_LINES_HEURISTIC = 4
$ for i in $(seq 22); do
j=$((2<<$i))
yes | head -n$j > t.sort
strace -f -c -e clone ./sort --parallel=16 t.sort -o /dev/null 2>&1 |
join --nocheck-order -a1 -o1.4,1.5 - /dev/null |
sed -n "s/\([0-9]*\) clone/$j\t\1/p"
done
4 1
8 3
16 7
32 15
64 15
128 15
256 15
512 15
1024 15
2048 15
4096 15
8192 15
16384 15
32768 15
65536 15
131072 15
262144 15
524288 15
1048576 15
2097152 15
4194304 30
8388608 45
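
The counts above, up to 2097152 lines, fit a simple pattern. As a hedged sketch (an empirical fit to this table only, not taken from the sort source; predicted_clones is a hypothetical helper name), the clone count looks like min(nlines/2, 16) - 1, i.e. the thread count doubles with the input until it reaches the --parallel limit:

```shell
# Hypothetical helper: empirical fit to the table above, NOT sort's real logic.
# $1 = number of input lines, $2 = the --parallel limit
predicted_clones() {
  nlines=$1
  max_threads=$2
  threads=$(( nlines / 2 ))            # assumed: one thread per 2 lines
  if [ "$threads" -gt "$max_threads" ]; then
    threads=$max_threads               # capped by --parallel
  fi
  echo $(( threads - 1 ))              # the main thread is not cloned
}

for n in 4 8 16 32 2097152; do
  printf '%s\t%s\n' "$n" "$(predicted_clones "$n" 16)"
done
```

The last two rows (30 and 45) are multiples of 15, presumably one batch of clones per buffer fill, which this sketch does not model.
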
# As above, but add -S1M option to sort
4 1
8 3
16 7
32 15
64 15
128 15
256 15
512 15
1024 15
2048 15
4096 15
8192 15
16384 30
32768 45
65536 90
131072 165
262144 315
524288 622
1048576 1245
2097152 2475
4194304 4935
8388608 9855
With SUBTHREAD_LINES_HEURISTIC=128K and the -S1M option to sort, we get no
threads, as nlines never gets above 12787 (there looks to be around 80 bytes
of overhead per line?).
Only when -S >= 12M does nlines get high enough to create threads.
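
A rough sanity check of the per-line figure, as a sketch under stated assumptions: it takes -S1M to mean 1 MiB, 128K to mean 128 * 1024 lines, and the memory cost per line to be constant. None of these is confirmed against the sort source, and the variable names are illustrative only.

```shell
# Assumptions (not verified in sort.c): -S1M is 1 MiB, 128K is 128*1024 lines,
# and each line costs a fixed number of bytes in the buffer.
buf_bytes=$(( 1024 * 1024 ))
max_nlines=12787                                  # nlines ceiling observed with -S1M
bytes_per_line=$(( buf_bytes / max_nlines ))      # integer division
heuristic_lines=$(( 128 * 1024 ))
needed_bytes=$(( heuristic_lines * bytes_per_line ))
needed_mib=$(( needed_bytes / buf_bytes ))        # rounded down

echo "bytes per line: $bytes_per_line"
echo "buffer needed for $heuristic_lines lines: about $needed_mib MiB"
```

This gives about 82 bytes per line and a buffer of roughly 10 MiB, the same ballpark as the -S thresholds reported in this thread (> 10M, >= 12M); the remaining gap suggests the per-line cost or the threshold is not exactly as assumed here.
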
cheers,
Pádraig.