Re: Dynamically changing remote servers list
From: Ole Tange
Subject: Re: Dynamically changing remote servers list
Date: Sun, 17 Aug 2014 12:29:40 +0200
On Sat, Aug 16, 2014 at 8:10 PM, Douglas A. Augusto <daaugusto@gmail.com> wrote:
> If the ability to dynamically include/exclude servers is implemented (for
> instance by re-reading a file containing the list of servers) then the user
> could take care of maintaining a list of active servers by doing something
> like (just to get the idea):
>
> while true; do parallel -k 'if ssh {} /bin/true; then echo "{}"; fi' :::
> host1 host2 ... hostN > active_hosts.slf; sleep 10; done
So you are basically suggesting a daemon that keeps the slf updated.
Daemon:

  forever {
    nice parallel --nonall -j0 -k --slf original.slf --tag echo | remove final tab > tmp.slf
    if diff tmp.slf original.slf:
      mv tmp.slf tmp2.slf
    sleep 10
  }
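One iteration of the daemon loop above could be sketched in Python roughly as follows. This is an illustration only, not GNU Parallel code: `publish_if_changed` and `read_slf` are made-up names, and `probe` is a hypothetical callable standing in for the `parallel --nonall` reachability check. Note one deliberate difference, which is an assumption rather than something stated above: the sketch compares against the last published list (tmp2.slf) instead of original.slf, so that a host coming back up also triggers an update.

```python
import os
import tempfile


def read_slf(path):
    """Return the sshlogins in a file, skipping empty lines and '#' comments."""
    with open(path) as fh:
        lines = [ln.strip() for ln in fh]
    return [ln for ln in lines if ln and not ln.startswith("#")]


def publish_if_changed(original_path, published_path, probe):
    """Probe every host in original_path and rewrite published_path if the
    list of reachable hosts differs from what is currently published.

    `probe` is a hypothetical callable: probe(host) -> True if reachable.
    Returns True if published_path was rewritten.
    """
    alive = [h for h in read_slf(original_path) if probe(h)]
    current = read_slf(published_path) if os.path.exists(published_path) else None
    if alive == current:
        return False
    # Write to a temp file in the same directory, then rename it over the
    # target, so a reader never sees a half-written list (the `mv` step).
    target_dir = os.path.dirname(os.path.abspath(published_path))
    fd, tmp = tempfile.mkstemp(dir=target_dir)
    with os.fdopen(fd, "w") as fh:
        fh.writelines(h + "\n" for h in alive)
    os.replace(tmp, published_path)
    return True
```

A caller would wrap this in a loop with a `time.sleep(10)` between iterations. The rename keeps the update atomic on the same filesystem, which matters because the main process re-reads the file while the daemon is running.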
Parallel:

  sub init {
    cp original.slf tmp2.slf
    start daemon
  }

  if tmp2.slf changed:
    @new = grep { not $existing{$_} } @slf
    @back = grep { $existing{$_} and $existing{$_}->jobslots == 0 } @slf
    @removed = grep { not in @slf } keys %existing
    for @new: add_host
    for @back: reset_jobslots
    for @removed: remove_host

  sub add_host {
    do as normal
  }

  sub reset_jobslots {
    jobslots = original_jobslots
  }

  sub remove_host {
    set jobslots = 0
  }

  sub cleanup {
    kill daemon
    rm tmp.slf tmp2.slf
  }
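The new/back/removed bookkeeping above can be tried out in a few lines of Python. This is a minimal sketch under assumptions: the `Host` class, the `reconcile` function and the `default_jobslots` parameter are hypothetical names invented for the example, and "do as normal" for a new host is reduced to creating a record with a default number of job slots.

```python
from dataclasses import dataclass


@dataclass
class Host:
    original_jobslots: int  # slots the host had when it was first added
    jobslots: int           # slots currently usable; 0 means "do not use"


def reconcile(existing, slf, default_jobslots=1):
    """Update `existing` (dict: sshlogin -> Host) to match the host list `slf`.

    Returns the lists (new, back, removed) that were acted upon.
    """
    slf = list(dict.fromkeys(slf))  # drop duplicate entries, keep order
    listed = set(slf)
    new = [h for h in slf if h not in existing]
    back = [h for h in slf if h in existing and existing[h].jobslots == 0]
    removed = [h for h in existing if h not in listed]
    for h in new:       # add_host (hypothetical stand-in for "do as normal")
        existing[h] = Host(default_jobslots, default_jobslots)
    for h in back:      # reset_jobslots
        existing[h].jobslots = existing[h].original_jobslots
    for h in removed:   # remove_host: keep the record, just stop using it
        existing[h].jobslots = 0
    return new, back, removed
```

Keeping removed hosts in the table with zero job slots, rather than deleting them, is what lets a returning host get its original slot count back.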
It is starting to look more and more doable.
> Of course, the jobs that were sent to the unavailable servers before they were
> detected as down will still fail. But in this case I think it is okay to re-run
> GNU Parallel with --resume-failed.
Or the user should use --retries, which actively selects a server on
which the job has failed the fewest times.
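The selection rule just described ("the server on which the job has failed the least number of times") can be illustrated like this. It is a sketch of the idea only, not GNU Parallel's actual implementation: `pick_host` and its arguments are hypothetical, and the tie-breaking (first host listed wins) is an assumption.

```python
def pick_host(failures, hosts):
    """Return the host on which this job has failed the fewest times.

    `failures` maps host -> failure count for this one job; hosts that are
    missing from the map count as 0. Ties go to the first host in `hosts`.
    """
    return min(hosts, key=lambda h: failures.get(h, 0))
```

With a dynamically updated host list, `hosts` would be restricted to the servers that currently have job slots, so a retry never lands on a server already marked as down.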
/Ole