Re: parallel sort at fault? [Re: [PATCH] tests: avoid gross inefficiency
From: Jim Meyering
Subject: Re: parallel sort at fault? [Re: [PATCH] tests: avoid gross inefficiency...
Date: Wed, 16 Mar 2011 16:32:32 +0100
Pádraig Brady wrote:
> # SUBTHREAD_LINES_HEURISTIC = 4
> $ for i in $(seq 22); do
>     j=$((2<<$i))
>     yes | head -n$j > t.sort
>     strace -f -c -e clone ./sort --parallel=16 t.sort -o /dev/null 2>&1 |
>       join --nocheck-order -a1 -o1.4,1.5 - /dev/null |
>       sed -n "s/\([0-9]*\) clone/$j\t\1/p"
>   done
> 4 1
> 8 3
> 16 7
> 32 15
> 64 15
> 128 15
> 256 15
> 512 15
> 1024 15
> 2048 15
> 4096 15
> 8192 15
> 16384 15
> 32768 15
> 65536 15
> 131072 15
> 262144 15
> 524288 15
> 1048576 15
> 2097152 15
> 4194304 30
> 8388608 45
>
> # As above, but add -S1M option to sort
>
> 4 1
> 8 3
> 16 7
> 32 15
> 64 15
> 128 15
> 256 15
> 512 15
> 1024 15
> 2048 15
> 4096 15
> 8192 15
> 16384 30
> 32768 45
> 65536 90
> 131072 165
> 262144 315
> 524288 622
> 1048576 1245
> 2097152 2475
> 4194304 4935
> 8388608 9855
>
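The counts of 15 or fewer in both tables fit a simple halving model: keep doubling the thread count while each share still holds at least SUBTHREAD_LINES_HEURISTIC lines and the --parallel limit has not been reached, then count one clone per thread beyond the first. The sketch below is only an empirical fit to the numbers quoted above, not a transcription of the logic in sort.c. The helper name estimate_clones is hypothetical, and it assumes power-of-two line counts sorted in a single buffer fill, so it does not reproduce the larger counts (30, 45, ...) seen once the input no longer fits in one buffer.

```sh
# Hypothetical helper: estimate clone() calls for one in-memory sort pass.
# $1 = number of input lines (assumed to be a power of two)
# $2 = --parallel value
# $3 = SUBTHREAD_LINES_HEURISTIC
estimate_clones() {
  lines=$1
  max_threads=$2
  threshold=$3
  threads=1
  # Split in two while the share is still big enough and threads remain.
  while [ "$threads" -lt "$max_threads" ] && [ "$lines" -ge "$threshold" ]; do
    threads=$((threads * 2))
    lines=$((lines / 2))
  done
  # The initial thread is not created by clone.
  echo $((threads - 1))
}

for n in 4 8 16 32 64; do
  printf '%s\t%s\n' "$n" "$(estimate_clones "$n" 16 4)"
done
```

With --parallel=16 and a threshold of 4, this prints 1, 3, 7, 15, 15 for the five line counts, matching the first five rows of either table.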
> With SUBTHREAD_LINES_HEURISTIC=128k and -S1M option to sort we get no threads as
> nlines never gets above 12787 (there looks to be around 80 bytes overhead per line?).
> Only when -S >= 12M do we get nlines high enough to create threads.
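The per-line figure can be checked with a little integer arithmetic. This sketch uses only the numbers quoted above (a 1 MiB buffer, a ceiling of 12787 lines, a 128k threshold) and assumes the whole buffer is divided evenly among lines, which is a simplification; the variable names are illustrative.

```sh
# Figures quoted above: -S1M and the observed nlines ceiling.
buffer_bytes=$((1024 * 1024))
max_lines=12787

# Approximate bytes consumed per input line (integer division).
bytes_per_line=$((buffer_bytes / max_lines))

# Buffer needed to hold 128k lines at that per-line cost.
threshold_lines=$((128 * 1024))
needed_bytes=$((threshold_lines * bytes_per_line))

echo "bytes per line: $bytes_per_line"
echo "bytes for $threshold_lines lines: $needed_bytes"
```

That works out to 82 bytes per line and a little over 10 MiB for 128k lines, which is in the same ballpark as the -S >= 12M observation once other buffer overhead is allowed for.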
Thanks for pursuing this.
Here's a proposed patch to address the other problem.
It doesn't have much of an effect (any?) on your
issue when using very little memory, but when a sort user
specifies -S1M, I think they probably want to avoid the
expense (memory) of going multi-threaded.
What do you think?
From 4f591fdd0bb78f621d2b72021de883fc4df1e179 Mon Sep 17 00:00:00 2001
From: Jim Meyering <address@hidden>
Date: Wed, 16 Mar 2011 16:09:31 +0100
Subject: [PATCH] sort: avoid memory pressure of 130MB/thread when reading from pipe

* src/sort.c (INPUT_FILE_SIZE_GUESS): Decrease initial allocation
factor used to size buffer used when reading a non-regular file.
For motivation, see discussion here:
http://thread.gmane.org/gmane.comp.gnu.coreutils.general/878/focus=887
---
src/sort.c | 8 ++++++--
1 files changed, 6 insertions(+), 2 deletions(-)
diff --git a/src/sort.c b/src/sort.c
index 9b8666a..07d6765 100644
--- a/src/sort.c
+++ b/src/sort.c
@@ -319,8 +319,12 @@ static size_t merge_buffer_size = MAX (MIN_MERGE_BUFFER_SIZE, 256 * 1024);
    specified by the user. Zero if the user has not specified a size. */
 static size_t sort_size;

-/* The guessed size for non-regular files. */
-#define INPUT_FILE_SIZE_GUESS (1024 * 1024)
+/* The initial allocation factor for non-regular files.
+   This is used, e.g., when reading from a pipe.
+   Don't make it too big, since it is multiplied by ~130 to
+   obtain the size of the actual buffer sort will allocate.
+   Also, there may be 8 threads all doing this at the same time. */
+#define INPUT_FILE_SIZE_GUESS (128 * 1024)

 /* Array of directory names in which any temporary files are to be created. */
 static char const **temp_dirs;
--
1.7.4.1.430.g5aa4d
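
To put rough numbers on the change, here is a small sketch of the buffer sizes implied by the old and new values of INPUT_FILE_SIZE_GUESS. The factor of 130 and the count of 8 threads are taken from the comment in the patch and are approximations; the helper name buffer_kib is hypothetical and is not part of sort.c.

```sh
# Hypothetical helper: approximate buffer size in KiB.
# $1 = INPUT_FILE_SIZE_GUESS in bytes, $2 = approximate multiplier
buffer_kib() {
  echo $(( $1 * $2 / 1024 ))
}

factor=130   # approximate multiplier, per the patch comment
threads=8    # thread count mentioned in the patch comment

for guess in $((1024 * 1024)) $((128 * 1024)); do
  per_thread=$(buffer_kib "$guess" "$factor")
  echo "guess=$guess: $per_thread KiB per thread," \
       "$((per_thread * threads)) KiB for $threads threads"
done
```

Under those assumptions the old value gives about 130 MiB per thread (roughly 1 GiB across 8 threads), and the new value about 16 MiB per thread (roughly 130 MiB across 8 threads).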