Re: parallel + blast + LSF


From: Giuseppe Aprea
Subject: Re: parallel + blast + LSF
Date: Fri, 8 May 2015 11:37:37 +0200

Hi, and thanks for your attempts to help me understand the semaphore concept. Unfortunately, I am still not sure I understand it. I need some examples with sem and parallel that stress the differences. From your example I learn that "-j" sets the number of toilets; people needing a toilet can use any toilet, but if there are more people than toilets, they will have to wait for one to be free. But isn't that also what GNU Parallel does normally? If I only launch sem or parallel once, what is the difference between:

sem --fg --no-notice -j 2 echo ::: 1 2
parallel --no-notice -j 2 echo ::: 1 2

?

As far as I can see in the first case there is no stdout:

$ sem --fg --no-notice -j 2 echo ::: 1 2

$ parallel --no-notice -j 2 echo ::: 1 2
1
2

Putting this piece of information together with the tutorial, maybe I can take a step forward: if launched only once, "sem --fg" and "parallel" do almost the same thing. Things differ when I need to launch a second sem or parallel: if the first sem is still running, the second sem has to wait, while a second parallel is executed straight away. If that is the difference, I wouldn't say that what I did "doesn't make sense", but simply that it "is not necessary", meaning I could omit --semaphore/--wait.

Let's now go to the execution problem. I tried to run your example with both ssh and blaunch.sh (the queue system spawn agent) with this script:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
#!/bin/bash

module load 4.8.3/parallel/20150422

REM=`tail -n 1 ${LSB_DJOB_HOSTFILE}`
echo -e "--------------------------------------------------------------------------------------"
echo -e "--------------------------------------------------------------------------------------" 1>&2
echo -e "local=`hostname`"
echo -e "--------------------------------------------------------------------------------------"
echo -e "--------------------------------------------------------------------------------------" 1>&2
echo -e "remote=${REM}"
echo -e "--------------------------------------------------------------------------------------"
echo -e "--------------------------------------------------------------------------------------" 1>&2
ssh ${REM} echo 1
echo -e "--------------------------------------------------------------------------------------"
echo -e "--------------------------------------------------------------------------------------" 1>&2
parallel --no-notice -S ${REM}  echo ::: 1
echo -e "--------------------------------------------------------------------------------------"
echo -e "--------------------------------------------------------------------------------------" 1>&2
parallel --no-notice -vvS ${REM}  echo ::: 1
echo -e "--------------------------------------------------------------------------------------"
echo -e "--------------------------------------------------------------------------------------" 1>&2
blaunch.sh ${REM} echo 1
echo -e "--------------------------------------------------------------------------------------"
echo -e "--------------------------------------------------------------------------------------" 1>&2
parallel --no-notice -S "blaunch.sh ${REM}"  echo ::: 1
echo -e "--------------------------------------------------------------------------------------"
echo -e "--------------------------------------------------------------------------------------" 1>&2
parallel --no-notice -vvS "blaunch.sh ${REM}"  echo ::: 1
echo -e "--------------------------------------------------------------------------------------"
echo -e "--------------------------------------------------------------------------------------" 1>&2
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

this is stdout:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
--------------------------------------------------------------------------------------
local=cresco3x018.portici.enea.it
--------------------------------------------------------------------------------------
remote=cresco3x003.portici.enea.it
--------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------
ssh cresco3x003.portici.enea.it exec perl\ -e\ \\\$ENV\\\{\\\"PARALLEL_PID\\\"\\\}=\\\"23827\\\"\\\;\\\$ENV\\\{\\\"PARALLEL_SEQ\\\"\\\}=\\\"1\\\"\\\;\\\$bashfunc\\\ =\\\ \\\"\\\"\\\;@ARGV=\\\"echo\\\ 1\\\"\\\;\\\$shell=\\\"\\\$ENV\\\{SHELL\\\}\\\"\\\;\\\$SIG\\\{CHLD\\\}=
sub\\\{\\\$done=1\\\;\\\}\\\;\\\$pid=fork\\\;unless\\\(\\\$pid\\\)\\\{setpgrp\\\;exec\\\$shell,\\\"-c\\\",\\\(\\\$bashfunc.\\\"@ARGV\\\"\\\)\\\;die\\\"exec:\\\$\\\!\\\\n\\\"\\\;\\\}do\\\{\\\$s=\\\$s\\\<1\\\?0.001+\\\$s\\\*1.03:\\\$s\\\;select\\\(undef,undef,undef,\\\$s\\
\)\\\;\\\}until\\\(\\\$done\\\|\\\|getppid==1\\\)\\\;kill\\\(SIGHUP,-\\\$\\\{pid\\\}\\\)unless\\\$done\\\;wait\\\;exit\\\(\\\$\\\?\\\&127\\\?128+\\\(\\\$\\\?\\\&127\\\):1+\\\$\\\?\\\>\\\>8\\\);
--------------------------------------------------------------------------------------
1
--------------------------------------------------------------------------------------
1
--------------------------------------------------------------------------------------
blaunch.sh cresco3x003.portici.enea.it exec perl\ -e\ \\\$ENV\\\{\\\"PARALLEL_PID\\\"\\\}=\\\"23883\\\"\\\;\\\$ENV\\\{\\\"PARALLEL_SEQ\\\"\\\}=\\\"1\\\"\\\;\\\$bashfunc\\\ =\\\ \\\"\\\"\\\;@ARGV=\\\"echo\\\ 1\\\"\\\;\\\$shell=\\\"\\\$ENV\\\{SHELL\\\}\\\"\\\;\\\$SIG\\\{CH
LD\\\}=sub\\\{\\\$done=1\\\;\\\}\\\;\\\$pid=fork\\\;unless\\\(\\\$pid\\\)\\\{setpgrp\\\;exec\\\$shell,\\\"-c\\\",\\\(\\\$bashfunc.\\\"@ARGV\\\"\\\)\\\;die\\\"exec:\\\$\\\!\\\\n\\\"\\\;\\\}do\\\{\\\$s=\\\$s\\\<1\\\?0.001+\\\$s\\\*1.03:\\\$s\\\;select\\\(undef,undef,undef,
\\\$s\\\)\\\;\\\}until\\\(\\\$done\\\|\\\|getppid==1\\\)\\\;kill\\\(SIGHUP,-\\\$\\\{pid\\\}\\\)unless\\\$done\\\;wait\\\;exit\\\(\\\$\\\?\\\&127\\\?128+\\\(\\\$\\\?\\\&127\\\):1+\\\$\\\?\\\>\\\>8\\\);
1
--------------------------------------------------------------------------------------
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

and this is stderr:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
--------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------
Permission denied, please try again.
Permission denied, please try again.
Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password).
--------------------------------------------------------------------------------------
Permission denied, please try again.
Permission denied, please try again.
Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password).
parallel: Warning: Could not figure out number of cpus on cresco3x003.portici.enea.it (). Using 1.
Permission denied, please try again.
Permission denied, please try again.
Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password).
--------------------------------------------------------------------------------------
Permission denied, please try again.
Permission denied, please try again.
Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password).
parallel: Warning: Could not figure out number of cpus on cresco3x003.portici.enea.it (). Using 1.
Permission denied, please try again.
Permission denied, please try again.
Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password).
--------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

As you can see, ssh returns a "Permission denied" error, which is different from the previous one (0 simultaneous logins allowed). This problem is due to a missing Kerberos token under LSF (the queue system).
That is why I also used blaunch.sh, the queue system spawn agent, which works fine in this case. However, in the blast case I get empty output files and several messages like this on stderr: "lsb_launch(): Failed while waiting for tasks to finish." Here is some information about blaunch and the error message. That goes far beyond my knowledge, but it sounds like LSF is not able to tell when a parallel task has finished.

Any clue?

cheers,

g



On Wed, May 6, 2015 at 11:59 PM, Ole Tange <address@hidden> wrote:
On Wed, May 6, 2015 at 1:50 PM, Giuseppe Aprea <address@hidden> wrote:

>  --wait/--semaphore/--pipe options: the --semaphore man page says:
>
> --semaphore
>
> Work as a counting semaphore. --semaphore will cause GNU parallel to start
> command in the background. When the number of simultaneous jobs is reached,
> GNU parallel will wait for one of these to complete before starting another
> command.
>
> --semaphore implies --bg unless --fg is specified.
>
> --semaphore implies --semaphorename `tty` unless --semaphorename is
> specified.
>
> Used with --fg, --wait, and --semaphorename.
>
> The command sem is an alias for parallel --semaphore.
>
> See also man sem.
>
>
> It may be my poor English but when I read that I understood that only using
> this option GNU parallel, once the maximum number of jobs (allowed by -j,
> for example) was reached, would have waited for one job to complete before
> starting another one. Now I have also read the Tutorial and I am confused
> (but it could also be a problem I have with background jobs and the symbol
> "&&"). According to what you say,
>
> - if I am running on a single host with "-j 10" and without --semaphore
> when 10 simultaneous jobs are reached, GNU parallel will wait for one to
> complete before starting another one because "that is what GNU Parallel does
> normally"
>
> - if I am running on a single host with "-j 10" and with --semaphore  when
> 10 simultaneous jobs are reached, GNU parallel will wait for one to complete
> before starting another one because that is what is written in the manpage

Yeah, I can see this is tricky if you have not been educated in what a
semaphore is in computer science. Maybe this explanation helps.

A counting semaphore (which 'sem' implements) is like a bunch of
toilets: People needing a toilet can use any toilet, but if there are
more people than toilets, they will have to wait for one of the
toilets to be available.

-j sets the number of toilets. Calling 'sem' is putting one person in
the queue for the toilets: If there is a toilet available, it starts
the job in the background and exits immediately. So 'sem' follows the
person to the toilet, but it does not go into the toilet with the
person.

If all toilets are taken, it waits until a toilet is free.
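
The toilet analogy can be sketched in plain shell, without GNU Parallel at all. This is only an illustration of a counting semaphore, not how 'sem' is implemented: the helper name with_toilets, the FIFO of tokens, and the visit job are all made up for this example. Each token in the FIFO stands for one free toilet, and taking a token blocks when none is left.

```bash
#!/bin/bash
# Hypothetical helper for illustration; not part of GNU Parallel.
# Usage: with_toilets N CMD ARG...
# Runs CMD once per ARG, with at most N of them running at the same time.
# The body is a subshell "( ... )", so its variables and fd 3 stay private.
with_toilets() (
    slots=$1
    cmd=$2
    shift 2
    dir=$(mktemp -d) || exit 1
    mkfifo "$dir/tokens" || exit 1
    exec 3<>"$dir/tokens"    # read-write open, so opening never blocks
    rm -r "$dir"             # fd 3 keeps the FIFO usable after unlinking
    i=0
    while [ "$i" -lt "$slots" ]; do
        echo >&3             # one token per free toilet
        i=$((i + 1))
    done
    for arg in "$@"; do
        read -r token <&3    # take a token; blocks while all toilets are busy
        ( "$cmd" "$arg"; echo >&3 ) &    # give the token back when done
    done
    wait                     # let the last visitors finish
)

# Made-up job: a person occupying a toilet for a moment.
visit() {
    echo "enter $1"
    sleep 0.2
    echo "leave $1"
}

with_toilets 2 visit alice bob carol
```

With two toilets and three people, the "enter carol" line appears only after alice or bob has printed "leave"; the order of the first two lines may vary from run to run.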

'sem --fg' starts the job in the foreground and only exits when the
job is done, so it stays with the person until the person leaves the
toilet.

So where 'parallel' usually will start more than one job, 'sem' only
starts a single job, and will often sit waiting before starting the
job.
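
As a rough side-by-side of the two calling styles, assuming GNU Parallel (which ships 'sem') is installed: 'parallel' receives the whole argument list in one call and schedules it itself, while 'sem' is called once per command and the calls coordinate through a shared semaphore, named here with --id. The function names, the id "demo", and the sleep/echo jobs are made up for this sketch.

```bash
#!/bin/bash
# Sketch only; needs GNU Parallel, which also provides 'sem'.

# One call, many jobs: parallel keeps at most 2 running at a time.
demo_parallel() {
    parallel -j 2 'sleep 1; echo parallel job {}' ::: 1 2 3
}

# One call per job: each 'sem' call queues a single command. It returns at
# once if a slot is free; otherwise it waits for a slot before starting.
demo_sem() {
    for i in 1 2 3; do
        sem --id demo -j 2 "sleep 1; echo sem job $i"
    done
    sem --id demo --wait    # block until all jobs under this id are done
}
```

Calling demo_parallel and then demo_sem should run three jobs each, two at a time. In the 'sem' variant it is the shell loop, not 'sem', that walks through the arguments.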

A special type of semaphore is a mutex. That is just a semaphore with a
single toilet. This is useful for a single shared resource, so that
two programs do not use this single shared resource at the same time.
'sem' defaults to '-j1'.
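
The single-toilet case can also be sketched without GNU Parallel. This is an illustration under stated assumptions, not sem's implementation: it relies on mkdir being atomic to act as the lock, and the names with_mutex and bump and the lock path are invented here. Ten background jobs increment a counter file, and the lock keeps their read-modify-write steps from overlapping.

```bash
#!/bin/bash
# Hypothetical helper for illustration; not part of GNU Parallel.
# Usage: with_mutex LOCKDIR CMD ARG...
# mkdir either creates LOCKDIR or fails, atomically, so only one caller
# at a time gets past the loop. A crashed holder would leave a stale lock.
with_mutex() {
    local lockdir=$1 status
    shift
    until mkdir "$lockdir" 2>/dev/null; do
        sleep 0.05           # toilet occupied: try again shortly
    done
    "$@"
    status=$?
    rmdir "$lockdir"         # leave the toilet
    return "$status"
}

# Made-up shared resource: a counter stored in a file.
bump() {
    local n
    n=$(cat "$1")
    echo $((n + 1)) > "$1"
}

workdir=$(mktemp -d)
echo 0 > "$workdir/counter"
for i in 1 2 3 4 5 6 7 8 9 10; do
    with_mutex "$workdir/lock" bump "$workdir/counter" &
done
wait
cat "$workdir/counter"       # prints 10: every increment survives
rm -r "$workdir"
```

Without the lock, two jobs could read the same value and one increment would be lost, which is exactly the kind of clash a mutex is meant to prevent.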

Is it now clear that 'sem' is not what you are looking for?

> and here is stderr:
>
> ------------------------------------------------------------
> parallel: Warning: ssh to cresco3x046.portici.enea.it only allows for 0
> simultaneous logins.
> You may raise this by changing /etc/ssh/sshd_config:MaxStartups and
> MaxSessions on cresco3x046.portici.enea.it.
> Using only -1 connections to avoid race conditions.

So GNU Parallel fails to run anything on the remote machine. Try a
simpler example and then debug that.

I would try:

  ssh cresco3x046.portici.enea.it echo 1

If that works:

  parallel -S cresco3x046.portici.enea.it echo ::: 1

If that fails:

  parallel -vvS cresco3x046.portici.enea.it echo ::: 1


/Ole

