Re: [gwl-devel] Next steps for the GWL

From: Ricardo Wurmus
Subject: Re: [gwl-devel] Next steps for the GWL
Date: Thu, 6 Jun 2019 12:11:08 +0200

Hi Kyle,

thanks for your comments!

> One of the things I'd love to do
> with GWL is to make it play well with git-annex, something that would
> almost certainly be too specific for GWL itself.  For example
>   * Make data caching git-annex aware.  When deciding to recompute data
>     files, GWL avoids computing the hash of data files, using scripts as
>     the cheaper proxy, as you described in address@hidden
>     But if the user is tracking data files with git-annex, getting the
>     hash of data files becomes less expensive because we can ask
>     git-annex for the hash it has already computed.
>   * Support getting annex data files on demand (i.e. 'git annex get') if
>     they are needed as inputs.

I wonder what the protocol should look like.  Should a workflow
explicitly request a “git annex” file, or should it be up to the person
running the workflow?  In the latter case, when “git annex” has been
configured to be the cache backend, the GWL would simply look up the
declared input/output files there.

I suppose the answers would equally apply to using IPFS as a cache.
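
To make the "ask git-annex for the hash" idea concrete, here is a minimal
sketch of what a cache backend might do with the key that
'git annex lookupkey FILE' prints.  It assumes the documented key layout
BACKEND-FIELDS--NAME, where NAME is the hex digest for hashing backends
and the "E" backend variants append the file extension.  The procedure
name and the list of backend prefixes are hypothetical and not part of
the GWL; running git-annex itself is left out.

```scheme
(use-modules (srfi srfi-1))             ; for 'any'

;; Assumption: backends whose names start with one of these prefixes
;; store a content hash in the NAME part of the key.
(define %hash-backend-prefixes '("SHA" "MD5" "BLAKE2" "SKEIN"))

;; Hypothetical helper, not part of the GWL.
(define (annex-key->hash key)
  "Return a pair (BACKEND . HASH) for the git-annex KEY, or #f when KEY
does not carry a content hash (for example keys of the WORM backend)."
  (let ((dash (string-index key #\-))        ; end of the backend name
        (sep  (string-contains key "--")))   ; start of the NAME part
    (and dash sep
         (let* ((backend (substring key 0 dash))
                (name    (substring key (+ sep 2)))
                (hashed? (any (lambda (prefix)
                                (string-prefix? prefix backend))
                              %hash-backend-prefixes))
                ;; "E" variants such as SHA256E keep the file extension.
                (dot     (and (string-suffix? "E" backend)
                              (string-index name #\.))))
           (and hashed?
                (cons backend
                      (if dot (substring name 0 dot) name)))))))
```

A backend built this way could fall back to the existing script-based
proxy whenever the procedure returns #f, e.g. for files that are not
annexed or that use a non-hashing backend.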

>> * add support for executing processes in isolated environments
>>   (containers) — this requires a better understanding of process inputs.
> This is another one I'm especially excited about.  Functionality-wise,
> are you imagining essentially matching the options available for 'guix
> environment --container ...'?

So far this is all I’ve got:

--8<---------------cut here---------------start------------->8---
diff --git a/gwl/processes.scm b/gwl/processes.scm
index beb61cc..264807f 100644
--- a/gwl/processes.scm
+++ b/gwl/processes.scm
@@ -19,13 +19,19 @@
   #:use-module ((guix derivations)
                 #:select (derivation->output-path
+  #:use-module ((guix packages)
+                #:select (package-file))
   #:use-module (guix gexp)
-  #:use-module ((guix monads) #:select (mlet return))
+  #:use-module ((guix monads) #:select (mlet mapm return))
   #:use-module (guix records)
   #:use-module ((guix store)
                 #:select (run-with-store
+  #:use-module ((guix modules)
+                #:select (source-module-closure))
+  #:use-module (gnu system file-systems)
+  #:use-module (gnu build linux-container)
   #:use-module (ice-9 format)
   #:use-module (ice-9 match)
   #:use-module (srfi srfi-1)
@@ -276,6 +282,54 @@ plain S-expression."
        (call process code)))
     (whatever (error (format #f "unsupported procedure: ~a\n" whatever)))))

+;; WIP
+(define (containerize exp process)
+  "Wrap EXP, an S-expression or G-expression, in a G-expression that
+causes EXP to be run in a container according to the requirements
+specified in PROCESS."
+  (let* ((package-dirs
+          (with-store store
+            (run-with-store store
+              (mapm %store-monad package-file
+                    (process-package-inputs process)))))
+         (data-inputs
+          (process-data-inputs process))
+         (output-dirs
+          (delete-duplicates
+           (map dirname (process-outputs process))))
+         (input-mappings
+          (map (lambda (location)
+                 (file-system-mapping
+                  (source location)
+                  (target location)
+                  (writable? #f)))
+               (lset-difference string=?
+                                (append package-dirs
+                                        data-inputs)
+                                output-dirs)))
+         (output-mappings
+          (map (lambda (dir)
+                 (file-system-mapping
+                  (source dir)
+                  (target dir)
+                  (writable? #t)))
+               output-dirs))
+         (specs
+          (map (compose file-system->spec
+                        file-system-mapping->bind-mount)
+               (append input-mappings
+                       output-mappings))))
+    (with-imported-modules (source-module-closure
+                            '((gnu build linux-container)
+                              (gnu system file-systems)))
+      #~(begin
+          (use-modules (gnu build linux-container)
+                       (gnu system file-systems))
+          (call-with-container (append %container-file-systems
+                                       (map spec->file-system
+                                            '#$specs))
+            (lambda () #$exp))))))
 ;;; ---------------------------------------------------------------------------
 ;;; ---------------------------------------------------------------------------
--8<---------------cut here---------------end--------------->8---

This means that it can map file systems into the container and then run
the process expression in that environment.
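
The mapping logic in the patch can be tried outside of Guix with a
stripped-down sketch like the following.  It only mirrors how inputs and
output directories are split into read-only and writable mounts; a plain
list (SOURCE TARGET WRITABLE?) stands in for the real
<file-system-mapping> record, and the procedure names and example paths
are made up for illustration.

```scheme
(use-modules (srfi srfi-1))   ; delete-duplicates, lset-difference

;; Hypothetical stand-in for <file-system-mapping>: source and target
;; are the same location, as in the patch.
(define (mapping location writable?)
  (list location location writable?))

(define (container-mappings input-locations output-files)
  "Return read-only mappings for INPUT-LOCATIONS followed by writable
mappings for the directories that contain OUTPUT-FILES.  An input that
is also an output directory is mapped only once, as writable."
  (let* ((output-dirs (delete-duplicates (map dirname output-files)))
         (read-only   (lset-difference string=?
                                       input-locations
                                       output-dirs)))
    (append (map (lambda (location) (mapping location #f)) read-only)
            (map (lambda (dir) (mapping dir #t)) output-dirs))))

;; Example with made-up paths:
(define example
  (container-mappings '("/gnu/store/abc-samtools" "/data/in" "/data/out")
                      '("/data/out/a.bam" "/data/out/b.bam")))
;; example is a list of three mappings; only "/data/out" is writable.
```

Note that the outputs are reduced to their parent directories before
anything is mounted, so the whole directory becomes writable rather
than just the declared files.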

One thing I’m not happy about is that I can only mount directories, and
not individual files that have been declared as inputs.  I’d like to
have more fine-grained access.  I suppose it might be possible to mount
just the relevant parts of the GWL cache, but I need to play with this
to better understand what the desired behaviour would be.


