
Re: controlling memory use beyond --noswap


From: B. Franz Lang
Subject: Re: controlling memory use beyond --noswap
Date: Tue, 06 May 2014 10:33:07 -0400
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.4.0

Hi Ole

I was not aware of 'niceload' and 'ulimit/joblog/resume-failed'.
These ARE basically the required solutions, and I'll definitely toy with them.

Here is my challenge: wouldn't there be a way to build this intelligence
into 'parallel' itself, so that a user could ask for a set of
memory-demanding jobs to be processed and watch them finish gracefully,
without significant system stutter?

Cheers Franz

ps.: nice to be in contact with folks who live a bit north of where I grew up
(Kiel)  :-)
and who work on bioinformatics questions too


On 14-05-03 04:58 PM, Ole Tange wrote:
On Wed, Apr 30, 2014 at 11:35 PM, B. Franz Lang <Franz.Lang@umontreal.ca> wrote:

I have been trying to find a way that allows the use of 'parallel'
without completely freezing machines --- which in my case
is caused by the parallel execution
of very memory-hungry applications (e.g. a server that has 64 GB of
memory, where one instance of an application unpredictably needs
anywhere between 10 and 60 GB).
I have spent quite some time trying to think of a good way to deal
with that. But what is the correct thing to do?

Let us assume that we have 64 GB of RAM, that most jobs take 10 GB but
20% of the jobs take between 10 and 60 GB, and that we can predict
neither which jobs those are nor how long they will run.

In theory 80% of the time we can run 6 jobs (namely the 10 GB jobs).

How can we avoid starting 2 jobs that will both hit 60 GB at the same
time (or 3 jobs of 25 GB each)?

If we could predict the memory usage, then the user could probably do
the scheduling even better than parallel.

niceload (part of the package) has --start-mem, which will only start a
new job if a certain amount of memory is free. That may help in some
situations.
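
As a minimal sketch of what that could look like (the 'bigjob' command
and inputs.txt are just placeholders, and you should check the niceload
man page for the exact size syntax it accepts):

    # Run at most 6 jobs at a time; each job waits to start until
    # at least ~12 GB of RAM is free (10 GB typical job size plus
    # some headroom).
    parallel -j6 "niceload --start-mem 12G bigjob {}" :::: inputs.txt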

But it does not solve the situation where the next 3 jobs are 25 GB
jobs that start out looking like 10 GB jobs, so you only discover
that they are 25 GB jobs long after they have started.

So right now the problem is to find an algorithm that would do the
right thing in most cases.

If your program reaches its maximum memory usage quickly, then I would
suggest you use 'ulimit' to kill off the oversized jobs: that way you
can run 6 10 GB jobs at a time (killing jobs bigger than 10 GB). Using
--joblog you can keep track of the jobs that got killed. When all the
10 GB jobs are complete, you can raise the ulimit and run 3 20 GB jobs
with --resume-failed, then 2 30 GB jobs, and finally the rest one job
at a time.
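
A rough sketch of that workflow (the 'bigjob' command, inputs.txt and
mem.log are hypothetical placeholders; note that ulimit -v takes the
limit in KiB of virtual memory, which only approximates real RAM use):

    # Pass 1: 6 jobs at a time, each limited to ~10 GB; jobs that
    # exceed the limit die and are recorded as failed in the joblog.
    parallel -j6 --joblog mem.log \
      'ulimit -v 10485760; bigjob {}' :::: inputs.txt

    # Pass 2: rerun only the failed jobs with a ~20 GB limit, 3 at a time.
    parallel -j3 --joblog mem.log --resume-failed \
      'ulimit -v 20971520; bigjob {}' :::: inputs.txt

    # Pass 3: ~30 GB limit, 2 at a time, then whatever still fails,
    # one job at a time with no limit.
    parallel -j2 --joblog mem.log --resume-failed \
      'ulimit -v 31457280; bigjob {}' :::: inputs.txt
    parallel -j1 --joblog mem.log --resume-failed \
      'bigjob {}' :::: inputs.txt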


/Ole



