help-gnubatch

Re: [help-gnubatch] How do I run jobs on a remote node


From: John M Collins
Subject: Re: [help-gnubatch] How do I run jobs on a remote node
Date: Fri, 25 Sep 2009 14:02:19 +0100

On Fri, 2009-09-25 at 14:48 +0200, Trygve Laugstøl wrote:
John M Collins wrote:
> On Thu, 2009-09-24 at 19:04 +0200, Trygve Laugstøl wrote:
>> John M Collins wrote:
>> > On Wed, 2009-09-23 at 23:18 +0200, Trygve Laugstøl wrote:
>> >> Hi
>> >>
>> >> I'm trying to package GNUBatch for OpenCSW [1]. I've created a package 
>> >> and installed GNUBatch quite successfully. I can start it, run jobs with 
>> >> gbch-r and they're executed. I even get an email about it every time.
>> >>
>> >> Now, the question is how do I get it to run the job on other nodes? I've 
>> >> been through the manuals but haven't been able to find much info on the 
>> >> subject. I can't find much info on how to create different queues, only 
>> >> how to write expressions to select them.
>> >>
>> >> [1]: http://opencsw.org
>> >>
>> >> --
>> >> Trygve
>> >>
>> > You first need to set up each node so that it sees "exported" jobs
>> > and variables from its peers; you should then be able to change job
>> > parameters remotely, etc.
>> > 
>> > You may have to run gbch-hostedit to set up other nodes' IP addresses 
>> > and stop/restart the scheduler.
>>
>> I think I've got the host file correctly configured. When I'm in 
>> gbch-hostedit I can see the (correct) IP of the host.
>>
>> However, I'm not sure how to verify the file and the connection to the 
>> other hosts.
>>
>> This is my setup:
>>
>> $ cat /opt/csw/etc/gnubatch.hosts
>> # Host file created on 24/09/09 at 18:59:19
>>
>> skybert-6       s6      probe,manual,trusted
>>
>>
>> When I'm trying to access a variable on a remote node (assuming this is 
>> the correct syntax) I'm getting:
>>
>> $ gbch-var skybert-6:CLOAD
>> gbch-var: Unknown variable skybert-6:CLOAD
>>
>> I did try to run "gbch-conn skybert-6" which seemed to work just fine 
>> after switching the connection type to manual.
> If there had been anything wrong with the hosts file it would have given 
> some error message at that point.
> 
>> Are there any log files I can look at?
> I think it has probably worked OK.
> 
> You have given each machine a hosts file entry pointing to the other
> one, haven't you?

Yep:

telestes:$ cat /opt/csw/etc/gnubatch.hosts
# Host file created on 24/09/09 at 18:59:19

skybert-6       s6      probe,manual,trusted

skybert-6:]$ cat /opt/csw/etc/gnubatch.hosts
# Host file created on 24/09/09 at 18:58:09

telestes        -       probe,manual,trusted
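One way to double-check a hosts file by eye is to print a one-line summary of each entry. The helper below, `list_hosts`, is hypothetical and not part of GNUBatch; it assumes, inferring only from the two files shown above, that each non-comment line holds a host name, an alias (`-` meaning none) and a comma-separated option list. It reports what it finds and does not validate against GNUBatch's own parser.

```shell
#!/bin/sh
# Hypothetical helper, not part of GNUBatch: summarise a gnubatch.hosts file.
# Assumed layout (inferred from the files in this thread):
#   <host name> <alias, "-" for none> <comma-separated options>
# Lines starting with "#" and blank lines are skipped.
list_hosts() {
    awk '
        /^[ \t]*#/ || NF == 0 { next }
        {
            alias = ($2 == "-") ? "(none)" : $2
            opts = (NF >= 3) ? $3 : ""
            # "manual" means no connection is attempted until gbch-conn is run
            manual = (("," opts ",") ~ /,manual,/) ? "yes" : "no"
            printf "%s alias=%s options=%s manual=%s\n", $1, alias, opts, manual
        }
    ' "${1:-/opt/csw/etc/gnubatch.hosts}"
}
```

Run against the telestes file above, `list_hosts` would print `skybert-6 alias=s6 options=probe,manual,trusted manual=yes`.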

Now they're both in manual mode. I haven't seen any messages in the
"btsched-reps" file.

telestes:$ netstat -a|grep gnubatch
*.gnubatch                             Idle
*.gnubatch-netsrv                      Idle
*.gnubatch          *.*      0      0 49152      0 LISTEN
*.gnubatch-feeder   *.*      0      0 49152      0 LISTEN
*.gnubatch-netsrv   *.*      0      0 49152      0 LISTEN
*.gnubatch-api      *.*      0      0 49152      0 LISTEN


skybert-6:$ netstat -a|grep gnubatch
*.gnubatch                             Idle
*.gnubatch-netsrv                      Idle
*.gnubatch          *.*      0      0 49152      0 LISTEN
*.gnubatch-feeder   *.*      0      0 49152      0 LISTEN
*.gnubatch-netsrv   *.*      0      0 49152      0 LISTEN
*.gnubatch-api      *.*      0      0 49152      0 LISTEN

You may need to set up the "local address" in each "gnubatch.hosts" file to give the right IP address to use. You can do that with the "l" key in gbch-hostedit.

With "manual" set, it won't attempt a connection until you tell it to with gbch-conn.
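Since a "manual" host only connects after gbch-conn, one quick check is to look for an ESTABLISHED line among the gnubatch entries of `netstat -a`; the listings above show only LISTEN and Idle entries, which suggests no peer connection existed at that point. A minimal sketch, assuming the gnubatch service names resolve through /etc/services as they do here; `gnubatch_connected` is a hypothetical helper that reads netstat output on stdin.

```shell
#!/bin/sh
# Hypothetical helper: reads `netstat -a` output on stdin and succeeds only
# if some gnubatch service line shows an established TCP connection.
# LISTEN/Idle lines alone mean the scheduler is waiting but not connected.
gnubatch_connected() {
    if grep gnubatch | grep ESTABLISHED; then
        echo "gnubatch: peer connection established"
    else
        echo "gnubatch: no established connection" >&2
        return 1
    fi
}

# Usage (run gbch-conn first when the host is marked "manual"):
#   gbch-conn skybert-6
#   netstat -a | gnubatch_connected
```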

> If you look in the "btsched-reps" file there will be messages if it 
> doesn't understand a connection attempt.
> 
> After you've run "gbch-conn" check for a connection on the gnubatch port 
> using "netstat -a|grep gnubatch".
> 
> You won't "see" the variables on the other machine until you've marked 
> them for export on the other machine with "gbch-var -E varname". The 
> same is true of jobs. (I had to make it like that because the network
> traffic is otherwise too great, especially when you have several hosts.)

I tried this on telestes as the user "gnubatch":

$ gbch-var -C TRYGVE -s 123
gbch-var: Unknown variable TRYGVE
Is this the right syntax?
No, the variable name should come last; the "-C" just says to create it if it doesn't exist:

gbch-var -C -s 123 TRYGVE

Also include "-E" if you want the other connected hosts to see it:

gbch-var -CE -s 123 TRYGVE
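When several variables need to be visible to connected hosts, the corrected syntax (options first, variable name last) can be wrapped in a small loop. This is only a sketch: `export_vars` and the `GBCH_VAR` override (used here for dry runs) are hypothetical, and the only gbch-var options it uses are the -C, -E and -s shown in this thread.

```shell
#!/bin/sh
# Hypothetical wrapper: create (-C), export (-E) and set (-s) each variable
# given as NAME=VALUE. The variable name goes last, as gbch-var expects.
# GBCH_VAR is a made-up override for dry runs; it is not a GNUBatch setting.
export_vars() {
    for pair in "$@"; do
        name=${pair%%=*}
        value=${pair#*=}
        if [ "$name" = "$pair" ]; then
            echo "export_vars: expected NAME=VALUE, got '$pair'" >&2
            return 2
        fi
        "${GBCH_VAR:-gbch-var}" -CE -s "$value" "$name" || return 1
    done
}

# Usage on each host whose variables its peers should see:
#   export_vars TRYGVE=123
# Dry run, printing the commands instead of running them:
#   GBCH_VAR=echo export_vars TRYGVE=123
```

Exporting stays opt-in per variable, which matches the reason given earlier in the thread for not exporting everything: network traffic grows quickly with several hosts.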

$ gbch-vlist
CLOAD     0                         Export # Current value of load level
LOADLEVEL 20000                            # Maximum value of load level
LOGJOBS                                    # File to save job record in
LOGVARS                                    # File to save variable record in
MACHINE   telestes-nge0.vs.inamo.no        # Name of current host
STARTLIM  15                               # Number of jobs to start at once
STARTWAIT 30                               # Wait time in seconds for job start

I would definitely make sure a "local address" is set, as I suggested above, if there is any chance that the "bare" machine name (without the domain) resolves to 127.0.0.1 or something similar.
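To see whether that applies, one can check what a host name resolves to. A minimal sketch, assuming `getent` and `hostname` are available (they are on Solaris and Linux); `is_loopback` and `check_local_name` are hypothetical helpers, and a loopback answer is the case where setting a local address with the "l" key in gbch-hostedit looks advisable.

```shell
#!/bin/sh
# Hypothetical helpers: warn when a host name resolves to a loopback address,
# which peers cannot use to reach this machine.
is_loopback() {
    case $1 in
        127.*|::1) return 0 ;;
        *) return 1 ;;
    esac
}

check_local_name() {
    name=${1:-$(hostname)}
    # getent prints "address name [aliases...]"; keep the first address only.
    addr=$(getent hosts "$name" | awk 'NR == 1 { print $1 }')
    if [ -z "$addr" ]; then
        echo "$name: does not resolve" >&2
        return 2
    fi
    if is_loopback "$addr"; then
        echo "$name -> $addr (loopback): set a local address in gbch-hostedit" >&2
        return 1
    fi
    echo "$name -> $addr"
}

# Usage: check_local_name            # this host's own name
#        check_local_name skybert-6  # a peer, as seen from here
```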

On skybert-6:
$ gbch-vlist -R
CLOAD                           #
LOADLEVEL                       #
LOGJOBS                         #
LOGVARS                         #
MACHINE   skybert-6.vs.inamo.no # Name of current host
STARTLIM                        #
STARTWAIT                       #

--
Trygve


John Collins address@hidden Skype: toadwarbler

xisoftware
Xi Software Ltd www.xisl.com 

Tel: +44 (0)1707 886110 (Direct) +44 (0)7799 113162 (Mobile)

Registered in England & Wales Company Number 1977148 VAT: GB 403 9239 64

Trading Address: 3 Mandeville Rise, Welwyn Garden City, Herts, AL8 7JT, UK
Reg Office: 2 Mill Road, Haverhill, Suffolk, CB9 8BD, UK

