Re: parallel sort at fault? [Re: [PATCH] tests: avoid gross inefficiency
From: Pádraig Brady
Subject: Re: parallel sort at fault? [Re: [PATCH] tests: avoid gross inefficiency...
Date: Wed, 16 Mar 2011 16:15:41 +0000
User-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.1.8) Gecko/20100227 Thunderbird/3.0.3

On 16/03/11 15:32, Jim Meyering wrote:
> Pádraig Brady wrote:
>>
>> With SUBTHREAD_LINES_HEURISTIC=128k and -S1M option to sort we get no
>> threads as nlines never gets above 12787 (there looks to be around
>> 80 bytes overhead per line?).
>> Only when -S >= 12M do we get nlines high enough to create threads.
>
> Thanks for pursuing this.
> Here's a proposed patch to address the other problem.
> It doesn't have much of an effect (any?) on your
> issue when using very little memory, but when a sort user
> specifies -S1M, I think they probably want to avoid the
> expense (memory) of going multi-threaded.
>
> What do you think?
>
> -#define INPUT_FILE_SIZE_GUESS (1024 * 1024)
> +#define INPUT_FILE_SIZE_GUESS (128 * 1024)
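
As a rough cross-check of the figures quoted above, the per-line cost and
the buffer size needed to reach the threading threshold can be estimated
with integer arithmetic. This is only a sketch: the function names are made
up for illustration, and it assumes the per-line cost stays constant as the
buffer grows.

```sh
# Hypothetical helpers for illustration; not part of sort or its tests.

# bytes_per_line BUFFER_BYTES MAX_LINES
# Approximate buffer bytes consumed per line (integer division).
bytes_per_line() {
  echo $(( $1 / $2 ))
}

# min_buffer_mib LINES BYTES_PER_LINE
# Smallest whole number of MiB that holds LINES lines at that cost.
min_buffer_mib() {
  echo $(( ($1 * $2 + 1048575) / 1048576 ))
}

# -S1M, with nlines peaking at 12787 as reported above.
per_line=$(bytes_per_line $((1024 * 1024)) 12787)
echo "bytes per line: $per_line"
# 128k lines is the SUBTHREAD_LINES_HEURISTIC value quoted above.
echo "MiB for 128k lines: $(min_buffer_mib $((128 * 1024)) "$per_line")"
```

This prints 82 and 11, which is in line with the "around 80 bytes" guess and
close to the observed -S >= 12M; the remaining gap is presumably buffer
space that this estimate ignores.
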
This does seem a bit like whack-a-mole
but at least we're lining them up better.
The above gives reasonable threading by default,
while reducing the large upfront malloc.
$ for len in 1 79; do
    for i in $(seq 22); do
      lines=$((2<<$i))
      yes "$(printf %${len}s)"| head -n$lines > t.sort
      strace -f -c -e clone ./sort --parallel=16 t.sort -o /dev/null 2>&1 |
        join --nocheck-order -a1 -o1.4,1.5 - /dev/null |
        sed -n "s/\([0-9]*\) clone/$lines\t\1/p"
    done
  done
#lines threads (2 byte lines)
------------------------------
131072 1
262144 3
524288 7
1048576 15
2097152 15
4194304 15
8388608 15
#lines threads (80 byte lines)
------------------------------
131072 1
262144 3
524288 7
1048576 15
2097152 15
4194304 22
8388608 60
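
The 1, 3, 7, 15 progression above is what a simple recursive-halving model
would give. The sketch below is that model only, not code taken from sort.c:
it assumes one thread is spawned per split, that a split happens only while
at least 128k lines (the SUBTHREAD_LINES_HEURISTIC value mentioned earlier)
and at least 2 threads remain, and that lines and threads are halved at each
level. The function name clones is made up for illustration.

```sh
# Hypothetical model of the thread counts; not taken from sort.c.
threshold=$((128 * 1024))   # SUBTHREAD_LINES_HEURISTIC as quoted in this thread

# clones NLINES NTHREADS
# Print how many threads recursive halving would spawn under the
# assumptions above.  The body runs in a subshell, so its variables
# stay local to each call.
clones() (
  nlines=$1 nthreads=$2
  if [ "$nthreads" -lt 2 ] || [ "$nlines" -lt "$threshold" ]; then
    echo 0
    return
  fi
  half_lines=$((nlines / 2))
  half_threads=$((nthreads / 2))
  lo=$(clones "$half_lines" "$half_threads")
  hi=$(clones "$((nlines - half_lines))" "$((nthreads - half_threads))")
  echo $((1 + lo + hi))
)

for i in 16 17 18 19 20; do
  lines=$((2 << i))
  printf '%s\t%s\n' "$lines" "$(clones "$lines" 16)"
done
```

With --parallel=16 this prints 1, 3, 7, 15, 15 for 131072 through 2097152
lines, matching both tables up to that size. It does not reproduce the 22
and 60 seen for 80 byte lines, so those rows need something beyond this
single-pass model.
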
cheers,
Pádraig.