parallel sort at fault? [Re: [PATCH] tests: avoid gross inefficiency...
From: Jim Meyering
Subject: parallel sort at fault? [Re: [PATCH] tests: avoid gross inefficiency...
Date: Wed, 09 Feb 2011 18:29:20 +0100
Jim Meyering wrote:
> Running "make -j25 check" on a nominal-12-core F14 system would
> cause serious difficulty leading to an OOM kill -- and this is brand new.
> It worked fine yesterday. I tracked it down to all of the make processes
> working on the "built_programs.list" (in src/Makefile.am) rule
>
> built_programs.list:
> 	@echo $(bin_PROGRAMS) $(bin_SCRIPTS) | tr ' ' '\n' \
> 	  | sed -e 's,$(EXEEXT)$$,,' | $(ASSORT) -u | tr '\n' ' '
>
> Which made me realize we were running that submake over 400 times,
> once per test script (including skipped ones). That's well worth
> avoiding, even if it means a new temporary file.
>
> I don't know the root cause of the OOM-kill (preceded by interminable
> minutes of a seemingly hung and barely responsive system) or why it started
> happening today (afaics, none of the programs involved was updated),
> but this does fix it...
FYI,
I've tracked this down a little further.
The horrid performance (hung system and eventual OOM-kill)
is related to the use of sort above. This is the definition:
ASSORT = LC_ALL=C sort
If I revert my earlier patch and instead simply
insist that sort not do anything in parallel,
ASSORT = LC_ALL=C sort --parallel=1
then there is no hang, and things finish in relatively good time.
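For illustration only, here is a minimal shell sketch of what the rule's pipeline does once sort is restricted to a single thread. It is not a reproducer for the hang. The program names and the ".exe" suffix are hypothetical stand-ins for $(bin_PROGRAMS), $(bin_SCRIPTS) and $(EXEEXT), and it assumes a GNU sort recent enough to accept --parallel.

```shell
# Hypothetical stand-ins for $(bin_PROGRAMS) $(bin_SCRIPTS) and $(EXEEXT).
programs='ls.exe cat.exe sort.exe cat.exe'
exeext='.exe'

# Same pipeline shape as the built_programs.list rule, with sort
# limited to one thread via --parallel=1 (a GNU sort option).
result=$(echo $programs | tr ' ' '\n' \
  | sed -e "s,$exeext\$,," | LC_ALL=C sort --parallel=1 -u | tr '\n' ' ')

# Print the deduplicated, suffix-stripped, space-separated list.
printf '%s\n' "$result"
```

With such a tiny input the thread count makes no visible difference to the output; the point is only to show where the option goes.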
I don't have a good stand-alone reproducer yet
and am out of time for today.