From: Jim Meyering
Subject: Re: parallel sort at fault? [Re: [PATCH] tests: avoid gross inefficiency...
Date: Wed, 09 Feb 2011 23:04:35 +0100
Jim Meyering wrote:
> Jim Meyering wrote:
>> Running "make -j25 check" on a nominal-12-core F14 system would
>> cause serious difficulty, leading to an OOM kill -- and this is brand new.
>> It worked fine yesterday. I tracked it down to all of the make processes
>> working on the "built_programs.list" (in src/Makefile.am) rule
>>
>> built_programs.list:
>> @echo $(bin_PROGRAMS) $(bin_SCRIPTS) | tr ' ' '\n' \
>> | sed -e 's,$(EXEEXT)$$,,' | $(ASSORT) -u | tr '\n' ' '
>>
>> Which made me realize we were running that submake over 400 times,
>> once per test script (including skipped ones). That's well worth
>> avoiding, even if it means a new temporary file.
>>
>> I don't know the root cause of the OOM-kill (preceded by interminable
>> minutes of a seemingly hung and barely responsive system) or why it started
>> happening today (afaics, none of the programs involved was updated),
>> but this does fix it...
>
> FYI,
> I've tracked this down a little further.
> The horrid performance (hung system and eventual OOM-kill)
> is related to the use of sort above. This is the definition:
>
> ASSORT = LC_ALL=C sort
>
> If I revert my earlier patch and instead simply
> insist that sort not do anything in parallel,
>
> ASSORT = LC_ALL=C sort --parallel=1
>
> then there is no hang, and things finish in relatively good time.
>
> I don't have a good stand-alone reproducer yet
> and am out of time for today.
Well, right after writing that, I realized the key to the misbehavior:
sort was reading from a *pipe*:
# This finishes right away, reading from input file "k":
seq 99 >k && for i in $(seq 33); do LC_ALL=C timeout 1 sort k > /dev/null & done
# When reading from a pipe, it's a very different story:
# Without the "timeout 7" prefix, the following would render an N-core
# system (5<N) unusable for many minutes. As it is, be prepared:
# my system goes unresponsive after 1 second and doesn't return until the timeout expires.
for i in $(seq 33); do seq 88|LC_ALL=C timeout 7 sort --para=5 >/dev/null & done
Occasionally, the above jobs all complete quickly.
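The file-vs-pipe distinction matters because sort can learn a regular
file's size up front via fstat, while a pipe gives it nothing to go on.
Here's a minimal stand-alone illustration (my own sketch; this is not
code from sort.c):

#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

int
main (void)
{
  struct stat st;
  if (fstat (STDIN_FILENO, &st) != 0)
    {
      perror ("fstat");
      return 1;
    }
  if (S_ISREG (st.st_mode))
    /* Regular file: st_size is meaningful, so a reader can size
       its input buffer precisely.  */
    printf ("regular file, %lld bytes\n", (long long) st.st_size);
  else
    /* Pipe, tty, etc.: st_size is useless, so sort must fall
       back on a guessed size (more on that below).  */
    printf ("non-regular input; size must be guessed\n");
  return 0;
}

Run it as "./a.out <k" and then as "seq 88 | ./a.out" to see the two cases.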
My first question was why *any* threads were being spawned to sort
such a small input. The first answer is in the first hunk:
diff --git a/src/sort.c b/src/sort.c
index 13954cb..b9316e7 100644
--- a/src/sort.c
+++ b/src/sort.c
@@ -112,9 +112,8 @@ struct rlimit { size_t rlim_cur; };
 /* Heuristic value for the number of lines for which it is worth
    creating a subthread, during an internal merge sort, on a machine
    that has processors galore.  Currently this number is just a guess.
-   This value must be at least 4.  We don't know of any machine where
-   this number has any practical effect.  */
-enum { SUBTHREAD_LINES_HEURISTIC = 4 };
+   This value must be at least 4.  */
+enum { SUBTHREAD_LINES_HEURISTIC = 32 * 1024 };

 /* The number of threads after which there are
    diminishing performance gains.  */
The old definition of SUBTHREAD_LINES_HEURISTIC meant that any group
of 5 or more lines would be split in two and sorted via two (or more)
separate threads. Thus, with just 40 lines, you could get the maximum
of 8 threads working. That is obviously not efficient, unless lines are
so pathologically long that the cost of comparing two of them approaches
the cost of creating a new thread.
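To see how quickly that heuristic fans out, here's a rough stand-alone
model of the splitting (my own sketch, not sort.c's actual code): each
level halves the line count and the thread budget until a piece falls
below SUBTHREAD_LINES_HEURISTIC or the budget runs out.

#include <stdio.h>

/* Model how many worker threads an internal merge sort spawns for
   NLINES lines, a budget of NTHREADS, and the given heuristic.  */
static unsigned int
workers (unsigned long nlines, unsigned int nthreads, unsigned long heuristic)
{
  if (nlines < heuristic || nthreads < 2)
    return 1;
  /* Split the lines in two; each half gets half the thread budget.  */
  return (workers (nlines / 2, nthreads / 2, heuristic)
          + workers (nlines - nlines / 2, nthreads - nthreads / 2, heuristic));
}

int
main (void)
{
  /* A budget of 8 matches the "maximum of 8 threads" noted above.  */
  printf ("40 lines, heuristic 4:       %u thread(s)\n", workers (40, 8, 4));
  printf ("40 lines, heuristic 32*1024: %u thread(s)\n",
          workers (40, 8, 32 * 1024));
  return 0;
}

It prints 8 for the old heuristic and 1 for the new one.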
With the above, sort would use a more reasonable number.
Tests on high-end hardware using very short lines
suggest that even a value of 200,000 would still be conservative.
But even with that change, the above for-loop would sometimes hang.
Pursuing it further, I discovered that sort was allocating 130MB
(INPUT_FILE_SIZE_GUESS * 129, with the former being 1MiB) of space
for its input buffer, simply because it was reading from a pipe.
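For reference, the arithmetic (the factor of 129 is as observed above;
I haven't re-derived it from the code):

#include <stdio.h>

int
main (void)
{
  /* Input-buffer allocation when reading from a pipe: guess * 129.  */
  long long old_buf = 1024LL * 1024 * 129;  /* 1 MiB guess */
  long long new_buf = 64LL * 1024 * 129;    /* 64 KiB guess */
  printf ("old: %lld MiB per sort\n", old_buf / (1024 * 1024));  /* 129 */
  printf ("new: %lld MiB per sort\n", new_buf / (1024 * 1024));  /* 8 */
  /* With the 33 concurrent sorts from the reproducer above, the old
     guess asks for roughly 33 * 129 MiB -- over 4 GiB at once --
     which goes a long way toward explaining the hung system.  */
  return 0;
}

Hence the second change: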
diff --git a/src/sort.c b/src/sort.c
index 13954cb..92c8d4e 100644
--- a/src/sort.c
+++ b/src/sort.c
@@ -316,7 +315,7 @@ static size_t merge_buffer_size = MAX (MIN_MERGE_BUFFER_SIZE, 256 * 1024);
 static size_t sort_size;

 /* The guessed size for non-regular files.  */
-#define INPUT_FILE_SIZE_GUESS (1024 * 1024)
+#define INPUT_FILE_SIZE_GUESS (64 * 1024)

 /* Array of directory names in which any temporary files are to be created.  */
 static char const **temp_dirs;
With that, the for-loop does not hang any more.
Note again that this is on a multi-core system.
I'll investigate more tomorrow.