From: Ramon van Alteren
Subject: Re: [Fab-user] [ANN] Fabric 1.0.4, 1.1.4, 1.2.2 released, & status update
Date: Fri, 2 Sep 2011 11:31:51 +0200

Hi Jeff,

On Fri, Sep 2, 2011 at 01:04, Jeff Forcier <address@hidden> wrote:
> Next up is testing and tweaking Morgan's multiprocessing/parallelism
> work to get 1.3 out the door, which should be a nice boost to Fabric's
> ability to operate in larger environments.

I'm very interested in this particular feature and would be happy to help get it merged.

In addition to the parallel capabilities, I'd also love to see the changes related to easier library usage of Fabric go in, because (at least for me) the two are pretty tightly coupled.

Regarding the parallel behaviour of Fabric, I was curious about a few things:

== Failure == 

Fabric's current failure handling (halt on failure) does not map well onto parallelism. Even though a task might fail on one host and stop there, all the other hosts are busy executing the same task. In addition, the current code in Morgan's/goosemo's branch does not (yet) allow inspection of the return values of tasks executed in parallel.
Because of the queue-size limit there is also a possible scenario where execution of a task fails on a host at an early stage, leaving the target hosts in a state that is rather hard to recover from (or even interpret).

For example, with 40 hosts and a queue size of 10, suppose execution of a task fails on host 13. If the code stops executing there and then, the user is left with one set of 20 hosts that did execute the task (of which one or more failed) and another set of 20 hosts that did not execute the task at all.

I would prefer a scenario where Fabric always executes a command marked for parallel execution on all hosts and reports status afterwards. Whether any subsequent commands are executed could then depend on the value of warn_only.
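To make the idea concrete, here is a minimal sketch of that run-everywhere-then-report model using plain multiprocessing. All names are hypothetical (this is not Fabric's actual API), and the simulated failure on host 13 just mirrors the example above:

```python
# Sketch of "run on all hosts, report status afterwards".
# Hypothetical names; not Fabric's actual API.
from multiprocessing import Pool

def run_task(host):
    """Pretend to run a task on one host; simulate a failure on host13."""
    if host == "host13":
        return (host, False, "simulated failure")
    return (host, True, "ok")

def run_on_all(hosts, pool_size=10):
    """Execute the task on every host, then collect the failures."""
    with Pool(pool_size) as pool:
        results = pool.map(run_task, hosts)
    failed = [(host, msg) for host, ok, msg in results if not ok]
    return results, failed
```

The caller always gets a complete picture (every host ran, failures are listed at the end) and can decide for itself whether to continue, in the spirit of warn_only.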

I was wondering what your take on this was.

== Interactivity == 

Interactive behaviour (password prompts etc.) has similar problems. It is very complex to ask users for input during parallel tasks, and in most cases where users want to execute tasks in parallel, interactivity is meaningless or unwanted.
Interactivity would require a parallel-executing process to reliably grab stdin and wait for input, which somewhat defeats the point of parallelism. Even if it were possible, typing a response to 100+ password prompts is not my idea of robust execution :)

I would prefer to suppress all interaction in parallel execution mode and make the task fail instead.
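As a rough illustration of "fail instead of prompting", one could swap stdin for an object that raises as soon as anything tries to read it. This is purely a sketch of the behaviour I'm describing, not how Fabric handles prompts:

```python
# Sketch: make any attempt to read stdin raise immediately, so a
# parallel task fails fast instead of blocking on interactive input.
import sys

class FailOnPrompt:
    def read(self, *args):
        raise RuntimeError("interactive input requested during parallel run")
    readline = read

def run_noninteractive(task):
    """Run a callable with stdin replaced; restore stdin afterwards."""
    saved = sys.stdin
    sys.stdin = FailOnPrompt()
    try:
        return task()
    finally:
        sys.stdin = saved
```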

> Adding logging support would be useful for this new functionality, but
> I may put that out as a followup feature release, if parallel
> execution works "well enough" without logging that folks would benefit
> from having it released "early".

== Output == 

Output, and consequently logging (which seems slated to replace the default printing to sys.stdout/sys.stderr in Fabric), also requires some thought for parallel execution. Currently, when a task is executed in parallel, the output from the various hosts is interleaved, making it hard for users to interpret.

In addition, task output is fairly hard to handle if you want/need/plan to use Fabric as a library.

Switching to logging will not solve this by itself: interleaved output will still occur if all output is pushed into a single logging stream.
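One possible approach, sketched below with plain multiprocessing: capture each host's output into a per-host buffer and only emit it as a single labelled block once the task has finished, so nothing interleaves. The names here are illustrative, not Fabric's actual output layer:

```python
# Sketch: per-host output buffering to avoid interleaved output.
# Hypothetical names; not Fabric's actual output handling.
import io
from multiprocessing import Pool

def run_task(host):
    """Write this host's output to a private buffer instead of stdout."""
    buf = io.StringIO()
    buf.write("connecting to %s\n" % host)
    buf.write("task done on %s\n" % host)
    return host, buf.getvalue()

def run_with_grouped_output(hosts, pool_size=4):
    """Run on all hosts, then emit each host's output as one block."""
    with Pool(pool_size) as pool:
        results = pool.map(run_task, hosts)
    return "".join("[%s]\n%s" % (host, output) for host, output in results)
```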

Switching to parallel execution might be a good time to rethink Fabric's output handling for tasks altogether.
I also noticed that return values from tasks are not captured in the current codebase (is that correct?), which likewise makes it fairly hard to use Fabric as a library.

== Background ==

To give you some idea of what I'm using (and planning to use) Fabric for:
I build maintenance and deployment scripting for a company with a large server farm of 3000+ nodes. Tasks built with Fabric will typically execute on groups of around 50-300 hosts, with a maximum of approximately 1500.

As you can imagine, parallel execution is a required feature here and interactive command execution is a no-go area.
Our main use case for Fabric is as a library in scripts and orchestration services; deploying software is a good example of the scripting side (and also where the current focus is).

I'm hoping to use Fabric because the task model makes it really easy for an entire team of system engineers and developers to develop tasks in isolation from the complexity that comes with scale. The task developer can focus on the single-machine scenario, which makes it relatively simple to develop a new task, while Fabric does the heavy lifting with regard to SSH connections, sudo, parallelism, output/return-value capturing etc.

Fabric is pretty awesome already, adding parallelism will make it really really awesome IMHO.

== Code ==

Although it's still in a very embryonic state, I've taken the work Morgan did on the parallel queue and converted it into a queue capable of capturing the return values of tasks executed in parallel.

I added two methods to tasks.py to allow a library user to call execute in sequential and parallel fashion.
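For those curious, the idea is roughly as sketched below. The real code lives in the branch linked underneath; names like execute_parallel and _worker here are illustrative only:

```python
# Sketch of a return-value-capturing parallel runner, in the spirit of
# the branch below. Hypothetical names, not the actual branch code.
from multiprocessing import Process, Queue

def _worker(task, host, outq):
    # Run the task for one host and report (host, result) via the queue.
    outq.put((host, task(host)))

def execute_sequential(task, hosts):
    """Run the task host by host, collecting return values."""
    return {host: task(host) for host in hosts}

def execute_parallel(task, hosts, pool_size=10):
    """Run the task on all hosts, at most pool_size at a time."""
    outq = Queue()
    pending = list(hosts)
    active, results = 0, {}
    while len(results) < len(hosts):
        # Top up the worker pool.
        while pending and active < pool_size:
            Process(target=_worker, args=(task, pending.pop(0), outq)).start()
            active += 1
        host, value = outq.get()  # blocks until some worker reports back
        results[host] = value
        active -= 1
    return results
```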

You can take a look at it here: https://github.com/ramonvanalteren/fabric/tree/multiprocessing-lib


Pff this turned into a far longer mail than originally intended.
I'd love to discuss this further on this list or over IRC, you can find me on #fabric as Ramonster.

Regards,

Ramon van Alteren

mail: address@hidden
IRC: Ramonster
