From: Matt Oates (Home)
Subject: Re: Slow start to cope with load
Date: Mon, 19 Mar 2012 11:27:46 +0000

On 19 March 2012 10:25, Ole Tange <tange@gnu.org> wrote:
> On Mon, Mar 19, 2012 at 10:20 AM, Matt Oates (Home) <mattoates@gmail.com> wrote:
>> Am I wrong in thinking you can just do -j 100%, so that you never spawn
>> more than maxload processes, assuming one process puts a load of 1.0 on
>> a single core? Can you not use -j 100% in conjunction with --load to
>> prevent the overload on startup?
>
> For CPU hungry programs like 'burnP6' that would be true. But if the
> program only uses 10% CPU (because it is waiting for network or disk
> I/O), then we should be able to spawn more - preferably automatically
> figuring out the "right" amount.

If the load is low because of blocking, spawning more jobs isn't going
to help the wait on I/O.
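
For what it's worth, those two flags do already combine; something like
this (illustrative command, gzip and *.log picked arbitrarily) caps at
one job per core and also refuses to start new jobs while the load
average is at or above the core count:

  parallel -j 100% --load 100% gzip ::: *.log

But as you say, for jobs that only use 10% CPU that badly
under-subscribes the machine.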

>> > While some programs run multiple threads (and thus can give a load > 1
>> > each) that is the exception. So in general I think we can assume one
>> > job will at most give a load of 1.
>>
>> It would be nice to explicitly state the likely load per process,
>> though, especially if you are the one setting it. I frequently run HMM
>> building with concurrent threading per process and just do the maths
>> myself, and am lucky that all the hosts have the same number of CPUs.
>> Perhaps a flag like --is-threaded=4 or something to indicate the
>> likely load per job?
>
> I am not too happy about that. I would much prefer some automated way
> of doing-the-right-thing.

If I'm already setting this manually, though, why do the right thing
automatically when I know what the right thing to do is? I agree that
having parallel throttle automatically by default is best, but it would
be nice to be able to state explicitly what you know if you are already
specifying it in the job.
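
To make the manual maths concrete, what I do today looks roughly like
this (a sketch assuming HMMER's hmmbuild with 4 worker threads per job,
so parallel is capped at a quarter of the cores; the file names are
made up):

  # 4 threads per job => at most cores/4 jobs per host
  parallel -j 25% hmmbuild --cpu 4 {.}.hmm {} ::: *.sto

An --is-threaded=4 style flag would let parallel work out that 25%
itself, and also know that each job contributes roughly 4 to the load.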

>> Looks good, though I have a couple of questions: If this is negative,
>> are you going to kill processes rather than start them? And if it's
>> always 0, even from the start, are you just never going to run on this
>> host?
>
> As a user I would be very surprised if GNU Parallel started to kill my
> jobs, and I try to design GNU Parallel adhering to POLA:
> http://en.wikipedia.org/wiki/Principle_of_least_astonishment
>
> So if it is < 1 it would mean: Do not spawn more new jobs, but wait
> for jobs to complete.

Great, that's what I wanted to hear :) I already have problems with the
kernel's OOM killer hitting my jobs when someone else submits a big
job; it would be really lame if my job killed itself too.
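
So the semantics are roughly this (a sketch of the policy only, not
GNU Parallel's actual code; jobs_remaining, job_slots_free and
spawn_next_job are hypothetical helpers):

  while jobs_remaining; do
    if [ "$(job_slots_free)" -ge 1 ]; then
      spawn_next_job          # room for at least one more job
    else
      wait -n                 # bash >= 4.3: block until any job exits
    fi                        # running jobs are never killed
  done

The job count only ever shrinks by jobs finishing on their own.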

>> > I believe it would be better than the current behaviour, but I am
>> > very open to even better ideas.
>>
>> You are starting to get into the realm of needing to understand
>> scheduling per host... The load might be generated by something with a
>> different nice value than what you want to submit: 100% load from
>> something niced below 0 while you want to put something in at +19. In
>> your equation above I would just add in something looking at the
>> difference between parallel's jobs that are running and those that are
>> ready/waiting. If all our jobs are running even under high load, who
>> cares: we have priority here, so keep up with the max load. If half of
>> our jobs are waiting then we might as well reduce spawning by half.
>
> I did not understand this part.

Two points:
1.) You can have high but very low priority load. In this case we want
a high priority job to ignore the load, because it can replace the low
priority work completely. For example, updatedb usually runs at a high
nice value (low priority); when we come along with our job it doesn't
matter that the load is high, since we will knock updatedb off the run
queue.
2.) You can take priority into account by just including what
percentage of our jobs are in the "running" process state rather than
the "ready" or "waiting" states. So if there is high load and we put in
100 processes and all of them are running, it's fine... if only 1 is
running and the rest are just waiting, then we should adjust in
proportion to that ratio until we find a natural size on the host
machine.
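
Getting at that ratio is cheap. Something along these lines would do it
(a sketch; assumes we keep the PIDs of the jobs we spawned in an array,
which parallel already has internally anyway):

  running=0; total=0
  for pid in "${pids[@]}"; do
    state=$(ps -o stat= -p "$pid" 2>/dev/null) || continue
    total=$((total + 1))
    case $state in
      R*) running=$((running + 1)) ;;  # on a CPU or on the run queue
    esac
  done
  # Caveat: Linux reports both "running" and "ready to run" as R, so
  # this only approximates the running vs ready/waiting split.
  # running/total near 1: we have priority, keep spawning up to max_load.
  # running/total near 0: we are being starved, scale back in proportion.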

Hope that's a bit clearer? It just means adjusting your equation to
something like:

  number_of_concurrent_jobs = max_load - current_load
      + (number_of_concurrent_jobs - number_of_concurrent_jobs_in_wait_state / 2)

That way you quickly converge on the number of processes that can run.
I'd ignore the jobs that are blocked on I/O and only subtract the ones
that are literally waiting on a CPU.
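
To put numbers on that (reading the parentheses exactly as written
above): say max_load = 8, current_load = 6, and we have 6 jobs out with
2 of them in the wait state:

  8 - 6 + (6 - 2/2) = 2 + 5 = 7

so we would allow one more job, and if that one just ends up waiting
too, the wait-state term drags the target back down on the next round.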

Best,
Matt.


