Re: Slow start to cope with load

From: Jay Hacker
Subject: Re: Slow start to cope with load
Date: Fri, 23 Mar 2012 17:20:03 -0400

I like that idea, since ps gives an instantaneous count instead of an
average.  I don't know much about process states, but some brief
playing around reveals some issues:

1. It's not accurate.  Running `pbzip2 bigfile.txt` while watching `ps
-u $USER -o s,pcpu,args` often shows state S (sleeping) even when
using 1600% CPU.
2. It doesn't account for multithreaded programs.  Running pbzip2 at
1600% CPU shows only one R (running).  Fortunately, ps -L seems to fix
this, while also helping #1.
3. If I have more than one disk or network adapter, I can usefully
have more than one process in the 'D' (uninterruptible sleep, usually
I/O) state.  This seems tough
to get right automatically; perhaps a separate "--ioload" option is

If the machine is swapping, the user can fix that with --noswap.  But
swapping is really a symptom of running out of memory, which is probably
best addressed with an orthogonal "--memload" or "--min-free-mem" option.

Having something like `parallel --load 100% --ioload 4 --min-free-mem
2G` would be awesome: only start new jobs if there are < $num_cpus
threads in the R state, < 4 in the D state, and > 2GB free memory.
That way I can have lots of processes with highly variable workloads
all doing their own thing and not stepping on each other.
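
The combined check could look roughly like the sketch below. The
`--ioload` and `--min-free-mem` values are the hypothetical options
proposed above, and `may_start_job` and `parse_size` are names invented
here. The thread counts and the free-memory figure are taken as inputs;
how to sample them (ps, /proc/meminfo on Linux, ...) is left out.

```python
import os


def parse_size(text: str) -> int:
    """Convert a size such as '2G' or '512M' to bytes (binary multiples)."""
    units = {"K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}
    text = text.strip().upper()
    if text and text[-1] in units:
        return int(float(text[:-1]) * units[text[-1]])
    return int(text)


def may_start_job(running, in_io, free_bytes,
                  num_cpus=None, max_io=4, min_free="2G"):
    """Return True only if all three limits leave room for another job.

    running    -- threads currently in the R state
    in_io      -- threads currently in the D state
    free_bytes -- free memory in bytes
    """
    if num_cpus is None:
        # "--load 100%" is read here as one running thread per CPU.
        num_cpus = os.cpu_count() or 1
    return (running < num_cpus
            and in_io < max_io
            and free_bytes > parse_size(min_free))
```

For example, on a 16-CPU machine with 3 threads running, 1 in D and 3G
free, `may_start_job(3, 1, 3 * 1024**3, num_cpus=16)` allows another
job; any one of the three limits being hit blocks it.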

(I can see both --memload and --min-free-mem being useful, the first
for "I need to reserve 4G system RAM for other stuff," the second for
"Each job needs 2G RAM."  The second is probably more common.)

On Thu, Mar 22, 2012 at 12:20 PM, Ole Tange wrote:
> On Thu, Mar 22, 2012 at 4:48 PM, Jay Hacker wrote:
>> Perhaps this is a bit simplistic, but what if you took your idea and
>> also kept a running estimate of the amount of load added by each job?
>> Start out assuming each job adds 1 unit of load, and then measure:
>> "Okay, I started 4 jobs last time, and the load went up by 8, so I
>> estimate each job causes 2 units of load."  Then when you sample the
>> difference and current load is say 12, with 16 procs, you'll only add
>> 2 jobs, and the load doesn't go over the max.
>
> That would only work on dedicated single user systems.
> My servers are (ab)used by 3-5 people at the same time.
>
> But I am warming up to the idea of ignoring load and instead just look
> at 'ps -A -o s'.
>
> 1: If number of 'R' == number of cpus: Do not start another.
> 2: If number of 'D' amongst (grand)children >= 1: Do not start another.
> 3: Else start a job more.
>
> CPU limited tasks will be limited by rule 1.
> Disk and NFS I/O limited tasks will be limited by rule 2.
> Net I/O will not be limited.
>
> I have not tested what will happen if the machine is swapping.
>
> /Ole
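
For reference, the arithmetic behind the per-job load estimate quoted
above can be sketched as follows. This is only an illustration of that
earlier suggestion, with invented function names, and it carries the
limitation Ole points out: on a shared machine the measured increase
also includes other users' work.

```python
import math


def estimate_load_per_job(load_before, load_after, jobs_started, previous=1.0):
    """Estimate load added per job from the last batch that was started.

    Falls back to the previous estimate (initially 1 unit per job) when no
    jobs were started or the load did not rise.
    """
    if jobs_started <= 0:
        return previous
    estimate = (load_after - load_before) / jobs_started
    return estimate if estimate > 0 else previous


def jobs_to_add(current_load, max_load, load_per_job):
    """How many jobs fit in the remaining headroom without exceeding max_load."""
    if load_per_job <= 0:
        raise ValueError("load_per_job must be positive")
    headroom = max_load - current_load
    if headroom <= 0:
        return 0
    return math.floor(headroom / load_per_job)
```

With the numbers from the quote: 4 jobs raising the load by 8 gives
`estimate_load_per_job(4, 12, 4)` = 2 units per job, and with a current
load of 12 on 16 procs `jobs_to_add(12, 16, 2.0)` is 2.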
