guix-devel

Re: Slurm with containers (i.e., orchestration)


From: Pjotr Prins
Subject: Re: Slurm with containers (i.e., orchestration)
Date: Mon, 18 May 2020 08:11:48 -0500
User-agent: NeoMutt/20170113 (1.7.2)

Ricardo added slurm-drmaa in the past (I can't believe it was
almost 4 years ago that we packaged Slurm!), which may also help
in addressing some of these points:

  http://www.drmaa.org/

Pj.


On Mon, May 18, 2020 at 07:49:00AM -0500, Pjotr Prins wrote:
> I am looking into some light-weight style orchestration. One
> possibility is to use Slurm with Guix containers - on a cluster with
> Guix that is almost trivial (we use Guix containers a lot! They are
> great) and would also allow non-container jobs.
> 
> Once we have containers and Slurm it should also be possible to deploy
> in some cloud infrastructure, provided there are no dependencies on
> the cluster itself. I think it would make a terrific BLOG story if we
> put something like that together. 
> 
> Bcbio describes an architecture that uses the Common Workflow Language
> (CWL) to run pipelines with containers
> 
>   https://bcbio-nextgen.readthedocs.io/en/latest/contents/cwl.html#running-with-cromwell-local-hpc
> 
> I am not promoting the use of this, but it shows that infrastructure
> exists that can deploy workflows on containers in different setups
> (Bcbio supports Slurm). I know the Guix infrastructure uses Guix
> deploy to achieve similar roll-outs. What that lacks is the
> orchestration mechanism itself which should handle dependencies
> between jobs (i.e. a workflow). The GNU Workflow Language goes some
> way, but it does not handle orchestration itself.
> 
> In other words, we almost have the pieces, but one thing is missing
> :). Thoughts? I know I have brought this up before in different
> guises, but we are starting to really need something here.
> 
> What makes orchestration? I guess it concerns a dynamic database of
> machines that can execute jobs and some type of software registry
> (Guix).  Next it should be able to schedule and execute jobs using
> some constraint specifiers (like network/CPU/RAM). It could be a
> 'dynamic' Slurm that makes use of real machines and VMs. Or it could
> hook into an existing cloud service. A Slurm job could send a
> container to a cloud service and monitor it.
> 
> I think we can build this up a step at a time. 
> 
> Thoughts?
> 
> Pj.
> 
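
To make the "Slurm with Guix containers" idea in the quoted message a bit more concrete, here is a minimal sketch that only builds the command lines and does not submit anything. The helper names, the package name and the job IDs are hypothetical. It assumes `guix environment --container --ad-hoc` (newer Guix releases offer `guix shell --container` for the same purpose) and the standard `sbatch` options `--parsable`, `--cpus-per-task`, `--mem`, `--dependency=afterok:` and `--wrap`.

```python
import shlex


def guix_container_command(packages, command):
    """Return an argv list that runs `command` inside a Guix container.

    Assumption: `guix environment --container --ad-hoc PKG... -- CMD...`
    is available on the compute nodes.
    """
    return ["guix", "environment", "--container", "--ad-hoc",
            *packages, "--", *command]


def sbatch_command(job_name, packages, command,
                   cpus=1, mem_mb=1024, after_ok=()):
    """Build (but do not run) an sbatch invocation for a containerized job.

    `after_ok` is a sequence of Slurm job IDs that must finish
    successfully before this job may start.
    """
    argv = [
        "sbatch",
        "--parsable",                      # print only the job ID on submit
        f"--job-name={job_name}",
        f"--cpus-per-task={cpus}",
        f"--mem={mem_mb}M",
    ]
    if after_ok:
        ids = ":".join(str(job_id) for job_id in after_ok)
        argv.append(f"--dependency=afterok:{ids}")
    # Quote each word so the wrapped command survives the shell on the node.
    wrapped = " ".join(
        shlex.quote(word)
        for word in guix_container_command(packages, command)
    )
    argv.append(f"--wrap={wrapped}")
    return argv


if __name__ == "__main__":
    # Hypothetical job: index a reference after jobs 101 and 102 succeed.
    print(sbatch_command("index", ["bwa"], ["bwa", "index", "ref.fa"],
                         cpus=4, mem_mb=8000, after_ok=[101, 102]))
```

Passing the resulting list to something like `subprocess.run` on a submit host would hand the dependency tracking to Slurm itself; whether that is enough to count as orchestration is exactly the open question above.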

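
The description of orchestration in the quoted message (a database of machines, constraint specifiers such as CPU/RAM, and dependencies between jobs) can also be sketched as a toy planner. This is only an illustration under simplifying assumptions: every name here is hypothetical, the machine "database" is a plain list, and the planner ignores how many jobs a machine can run at once.

```python
from dataclasses import dataclass, field


@dataclass
class Machine:
    name: str
    cpus: int
    ram_gb: int


@dataclass
class Job:
    name: str
    cpus: int
    ram_gb: int
    needs: list = field(default_factory=list)  # jobs that must finish first


def plan(jobs, machines):
    """Return [(job name, machine name), ...] in dependency order.

    Each job is assigned to the smallest machine that satisfies its
    CPU and RAM constraints. Raises ValueError if the dependencies
    cannot be resolved or no machine fits a job.
    """
    pending = {job.name: job for job in jobs}
    done = set()
    schedule = []
    while pending:
        # Jobs whose dependencies have all been scheduled already.
        ready = [job for job in pending.values()
                 if all(dep in done for dep in job.needs)]
        if not ready:
            raise ValueError("dependency cycle or unknown dependency")
        for job in ready:
            fits = [m for m in machines
                    if m.cpus >= job.cpus and m.ram_gb >= job.ram_gb]
            if not fits:
                raise ValueError(f"no machine satisfies job {job.name}")
            chosen = min(fits, key=lambda m: (m.cpus, m.ram_gb))
            schedule.append((job.name, chosen.name))
            done.add(job.name)
            del pending[job.name]
    return schedule


if __name__ == "__main__":
    machines = [Machine("node1", 4, 16), Machine("vm1", 32, 256)]
    jobs = [Job("call", 16, 64, needs=["align"]), Job("align", 4, 8)]
    print(plan(jobs, machines))
```

A real system would replace the list of machines with something dynamic (real nodes and VMs coming and going) and would hand each planned job to Slurm or a cloud service rather than just returning a list.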

