Re: New behavior proposal --halt -% with job killing


From: Martin d'Anjou
Subject: Re: New behavior proposal --halt -% with job killing
Date: Mon, 27 Apr 2015 11:46:17 -0400

>
> Something like this:
>
> --halt now,fail,25% (aka --halt 2,25%)
> --halt soon,success,50% (aka --halt -1,50%)
> --halt soon,fail (aka --halt 1)
> --halt never (aka --halt 0)
> --halt soon,success,10 (I just need 10 to complete successfully)
>
> --halt when[,why][,pct][,num]
>
> when,why:
>   never,fail:
>   0:             Do not halt if a job fails. Exit status will be the
> number of jobs failed. This is the default.
>

Can this be expressed as positive logic? I do not fully understand the
consequences of "never,fail". Also, when I read it I hear "never
fail", as in: if a job fails, abort and start over until it succeeds.
Too many interpretations are possible.

>
>   never,success:
>                   Do not halt if a job succeeds. Exit status will be
> the number of jobs succeeded.
>

In Unix, an exit status of 0 means pass, and a non-zero status means
fail (and gives a hint of what the failure is). You are proposing that
the count of successes be the exit code. Weird.

>
>   soon,fail:
>   1:             Do not start new jobs if a job fails, but complete
> the running jobs including cleanup. The exit status will be the exit
> status from the last failing job.
>
>   soon,success:
>   -1:            Do not start new jobs if a job succeeds, but complete
> the running jobs including cleanup. The exit status will be the exit
> status from the last failing job (if any)
>
>   now,fail:
>   2:             Kill off all jobs immediately when a job fails and
> exit without cleanup. The exit status will be the exit status from the
> failing job.
>
>   now,success:
>   -2:             Kill off all jobs immediately when a job succeeds
> and exit without cleanup. The exit status will be the exit status from
> the last failing job (if any).
>
> 'when' defaults to never. 'why' defaults to fail.
>
> num:
>   >=3:         Do not look at a single job but at this number of jobs
> before halting.
>
> pct:
>   0%:           Only look at a single job. This is the default.
>   1-99%:      Do not look at a single job but at a percentage of all
> jobs before halting. At least 3 jobs will always be run.
>
> /Ole
>

I am not a big fan of "default" behaviors in the case of the --halt
option because there could be too many implicit default behaviors and
those are impossible to remember without reading the documentation.
I'd rather see this option as being as explicit as possible for
anything beyond one single default behavior.

Also, I would really really like to see a positive logic approach
where what happens is explicitly stated. Allow me to propose the
following.

We have:
--halt halt_argument_value

I will use a pseudo-BNF notation to specify the argument value.
Quoted strings and number ranges are literals on the command line;
they can be pattern matched.

halt_argument_value: "never"
    | (("onsuccess" | "onfail") "," quantity "," consequence)
quantity: number | pct
number: [1-9][0-9]*
pct: [1-9][0-9]? "%"
consequence: "killpending" | "killall"
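
To make the grammar concrete, here is a minimal Python sketch of a
parser for the proposed argument value. It is only an illustration of
the proposal, not code from GNU parallel. The names HaltSpec and
parse_halt are made up for this example, and I am assuming that
percentages run from 1% to 99%, as in the backward compatibility list
further down.

```python
import re
from typing import NamedTuple, Optional


class HaltSpec(NamedTuple):
    """Parsed --halt value (hypothetical structure for this sketch)."""
    when: str                          # "never", "onsuccess" or "onfail"
    count: Optional[int] = None        # absolute number of jobs, if given
    percent: Optional[int] = None      # percentage of jobs, if given
    consequence: Optional[str] = None  # "killpending" or "killall"


# One regular expression covering the non-"never" branch of the grammar.
_HALT_RE = re.compile(
    r"(?P<when>onsuccess|onfail),"
    r"(?P<qty>[1-9][0-9]*)(?P<pct>%?),"
    r"(?P<consequence>killpending|killall)"
)


def parse_halt(value: str) -> HaltSpec:
    """Parse a proposed --halt argument value, or raise ValueError."""
    if value == "never":
        return HaltSpec("never")
    m = _HALT_RE.fullmatch(value)
    if m is None:
        raise ValueError(f"invalid --halt value: {value!r}")
    qty = int(m.group("qty"))
    if m.group("pct"):
        # Assumption: percentages are limited to 1%..99%.
        if qty > 99:
            raise ValueError(f"percentage out of range: {qty}%")
        return HaltSpec(m.group("when"), percent=qty,
                        consequence=m.group("consequence"))
    return HaltSpec(m.group("when"), count=qty,
                    consequence=m.group("consequence"))


print(parse_halt("onfail,25%,killall"))
# HaltSpec(when='onfail', count=None, percent=25, consequence='killall')
```

Because every field is spelled out on the command line, the parser
has no defaults to fill in except for "never".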

There are three cases to consider:

never
    Never halt, run everything to completion.

    The exit status is the number of failed jobs. The exit status
caps at 253, even if there are more than 253 failures.

    This is the default behaviour, as when --halt is not used.


onsuccess,N,consequence
    Halt after N jobs have run successfully.

    The number N ranges from 1 to the total number of jobs, or is a
percentage. If N is a percentage (N%), then at least 3 jobs must have
run to completion before GNU parallel considers making a decision. GNU
parallel will halt when the percentage of successful jobs equals or
exceeds N%.

    If the consequence is "killpending", GNU parallel will wait for
all running jobs to come to completion, but no more new jobs will be
dispatched (the pending jobs will be killed).

    If the consequence is "killall", GNU parallel will stop
dispatching jobs and will kill all the running jobs.

    If one or more jobs are successful, the exit status is 0. If no
job is successful, the exit status is the exit status of the last
failing job.


onfail,N,consequence
    Halt after N jobs have failed.

    The number N ranges from 1 to the total number of jobs, or is a
percentage. If N is a percentage (N%), then at least 3 jobs must have
run to completion before GNU parallel considers making a decision. GNU
parallel will halt when the percentage of failing jobs equals or
exceeds N%.

    If the consequence is "killpending", GNU parallel will wait for
all running jobs to come to completion, but no more new jobs will be
dispatched (the pending jobs will be killed).

    If the consequence is "killall", GNU parallel will stop
dispatching jobs and will kill all the running jobs.

    If all the jobs are successful, the exit status is 0. If one or
more jobs have failed, the exit status is the exit status of the last
failing job. The exit status caps at 253, even if there are more than
253 failures.
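
The halting rule for the three cases above could be sketched in
Python roughly as follows. This is my reading of the proposal, not
GNU parallel code. The function name should_halt and its parameters
are hypothetical, and I am assuming the percentage is measured
against the total number of jobs rather than against the jobs
completed so far.

```python
def should_halt(when, quantity, is_percent, succeeded, failed, total):
    """Decide whether to halt after a job has finished.

    when       -- "never", "onsuccess" or "onfail"
    quantity   -- N, either a job count or a percentage
    is_percent -- True if N was given as N%
    succeeded  -- number of jobs that finished successfully so far
    failed     -- number of jobs that failed so far
    total      -- total number of jobs (assumed known in advance)
    """
    if when == "never":
        return False
    # Count only the kind of outcome the user asked about.
    matching = succeeded if when == "onsuccess" else failed
    if not is_percent:
        return matching >= quantity
    # With a percentage, wait until at least 3 jobs have completed.
    if succeeded + failed < 3:
        return False
    # Integer form of: matching / total >= quantity / 100
    return 100 * matching >= quantity * total


# Two failures out of 8 jobs is 25%, but only 2 jobs are done: keep going.
print(should_halt("onfail", 25, True, succeeded=0, failed=2, total=8))
# One more job completes, so the 3-job minimum is met: halt.
print(should_halt("onfail", 25, True, succeeded=1, failed=2, total=8))
```

The consequence ("killpending" or "killall") does not affect when to
halt, only what happens to the running jobs afterwards, so it is left
out of this function.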


Backward compatibility:

--halt 0 is the same as --halt never

--halt 1 is the same as --halt onfail,1,killpending

--halt 2 has no equivalent (it does not make sense to orphan jobs)

--halt -1 is the same as --halt onsuccess,1,killpending

--halt -2 has no equivalent (it does not make sense to orphan jobs)

--halt 1-99% is equivalent to --halt onfail,1-99%,killpending
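
The mapping above is small enough to write down as a lookup. Here is
a Python sketch of it. The function name translate_legacy_halt is
made up for this example, and rejecting --halt 2 and --halt -2 with
an error is my assumption about how "no equivalent" would be handled.

```python
import re


def translate_legacy_halt(value: str) -> str:
    """Translate an old-style --halt value to the proposed syntax."""
    fixed = {
        "0": "never",
        "1": "onfail,1,killpending",
        "-1": "onsuccess,1,killpending",
    }
    if value in fixed:
        return fixed[value]
    if value in ("2", "-2"):
        # Assumption: values without an equivalent are rejected.
        raise ValueError(f"--halt {value} has no equivalent")
    m = re.fullmatch(r"([1-9][0-9]?)%", value)
    if m:
        # 1% through 99% map to the onfail percentage form.
        return f"onfail,{m.group(1)}%,killpending"
    raise ValueError(f"unrecognized --halt value: {value!r}")


print(translate_legacy_halt("25%"))  # onfail,25%,killpending
```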


Best regards,
Martin


