From: Ricardo Wurmus <rekado@elephly.net>
To: gwl-devel@gnu.org
Subject: fastest way to run a GWL workflow on AWS
Date: Mon, 06 Jul 2020 11:52:04 +0200
Message-ID: <87a70dkm2j.fsf@elephly.net>

Hey there,

I had an idea to get a GWL workflow to run on AWS without having to
mess with Docker and all that.

GWL should do all of these steps when AWS deployment is requested:

* create an EFS file system.  Why EFS?  Unlike EBS (block storage) and
  S3, one EFS can be accessed simultaneously by different virtual
  machines (EC2 instances).

* sync the closure of the complete workflow (all steps) to EFS.  (How?
  We could either mount EFS locally or use an EC2 instance as a simple
  "cloud" file server.)  This differs from how other workflow
  languages handle things.  Other workflow systems have one or more
  Docker image(s) per step (sometimes one Docker image per
  application), which means that there is some duplication and setup
  time as Docker images are downloaded from a registry (where they
  have previously been uploaded).  Since Guix knows the closure of all
  programs in the workflow we can simply upload all of it.

* create as many EC2 instances as requested (respecting optional
  grouping information to keep any set of processes on the same node)
  and mount the EFS over NFS.  The OS on the EC2 instances doesn't
  matter.

* run the processes on the EC2 instances (parallelizing as far as
  possible) and have them write to a unique directory on the shared
  EFS.  The rest of the EFS is used as a read-only store to access all
  the Guix-built tools.

The EFS either stays active or its contents are archived to S3 upon
completion to reduce storage costs.
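For the second step, a very rough sketch of what I mean -- assuming we
mount the EFS locally (say at /mnt/efs) and, for a first draft, simply
shell out to "guix gc" and rsync; none of this exists in the GWL yet:

    (use-modules (ice-9 popen) (ice-9 rdelim))

    (define (closure items)
      ;; Return the full closure of the given store items, as printed
      ;; by "guix gc --requisites".
      (let ((port (apply open-pipe* OPEN_READ
                         "guix" "gc" "--requisites" items)))
        (let loop ((acc '()))
          (let ((line (read-line port)))
            (if (eof-object? line)
                (begin (close-pipe port) (reverse acc))
                (loop (cons line acc)))))))

    (define (sync-closure-to-efs items efs-mount)
      ;; Copy every item of the closure into the store directory on
      ;; the shared EFS; rsync skips files that are already there.
      (for-each (lambda (item)
                  (system* "rsync" "-a" item
                           (string-append efs-mount "/gnu/store/")))
                (closure items)))

    ;; e.g. (sync-closure-to-efs (list "/gnu/store/…-samtools-1.10") "/mnt/efs")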
The last two steps are obviously a little vague; we'd need to add a
few knobs to allow users to easily tweak resource allocation beyond
what the GWL currently offers (e.g. grouping, mapping resources to EC2
machine sizes); a sketch of what I have in mind is at the end of this
message.  To implement the last step we would need to keep track of
step execution.  We can already do this, but the complication here is
to effect execution on the remote nodes.

I also want to add optional reporting for each step.  There could be a
service that listens to events, and each step would trigger events to
indicate its start and stop.  This could trivially be visualized, so
that users can keep track of the state of the workflow and its
processes, e.g. with a pretty web interface.

For the deployment to AWS (and eventual tear-down) we can use Guile
AWS.

None of this depends on "guix deploy", which I think would be a poor
fit as these virtual machines are meant to be disposable.

Another thing I'd like to point out is that this doesn't lead users
down the AWS rabbit hole.  We don't use specialized AWS services like
their cluster/grid service, nor do we use Docker, ECS, etc.  We use
the simplest resource types: plain EC2 and boring NFS storage.  This
looks like one of the simplest remote execution models, which could
just as well be used with other remote compute providers (or even a
custom server farm).

One of the open issues is to figure out how to sync the /gnu/store
items to EFS efficiently.  I don't really want to shell out to rsync,
nor do I want to use "guix copy", which would require a remote
installation of Guix.  Perhaps rsync (as in the sketch above) would be
the easiest route for a rough first draft.  It would also be nice if
we could deduplicate our slice of the store to cut down on unnecessary
traffic to AWS.

What do you think about this?
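To make the "knobs" part concrete, here is an entirely hypothetical
sketch of the kind of per-deployment information I have in mind; none
of these names exist in the GWL today:

    (define aws-deployment
      ;; Made-up configuration: which processes share a node, what
      ;; instance type each group gets, and what happens to the EFS
      ;; when the workflow is done.
      '((region         . "eu-central-1")
        (efs            . new)   ; or the id of an existing file system
        (groups         . ((alignment map-reads index-reference)
                           (counting  count-features)))
        (instance-types . ((alignment . "m5.2xlarge")
                           (counting  . "t3.medium")))
        (archive-to-s3? . #t)))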
--
Ricardo


From: zimoun <zimon.toutoune@gmail.com>
To: Ricardo Wurmus <rekado@elephly.net>, gwl-devel@gnu.org
Subject: Re: fastest way to run a GWL workflow on AWS
In-Reply-To: <87a70dkm2j.fsf@elephly.net>
References: <87a70dkm2j.fsf@elephly.net>
Date: Thu, 16 Jul 2020 02:08:40 +0200
Message-ID: <86y2nkz5h3.fsf@gmail.com>
Dear Ricardo,

Nice ideas!  I am a bit ignorant in this area, so my questions are
surely totally naive, not to say dumb. :-)

On Mon, 06 Jul 2020 at 11:52, Ricardo Wurmus <rekado@elephly.net> wrote:

> * create an EFS file system.  Why EFS?  Unlike EBS (block storage) and
>   S3, one EFS can be accessed simultaneously by different virtual
>   machines (EC2 instances).

Who creates the EFS file system?  And you are referring to [1], right?

1: https://aws.amazon.com/efs/

> * sync the closure of the complete workflow (all steps) to EFS.  (How?
>   We could either mount EFS locally or use an EC2 instance as a simple
>   "cloud" file server.)  This differs from how other workflow
>   languages handle things.  Other workflow systems have one or more
>   Docker image(s) per step (sometimes one Docker image per
>   application), which means that there is some duplication and setup
>   time as Docker images are downloaded from a registry (where they
>   have previously been uploaded).  Since Guix knows the closure of all
>   programs in the workflow we can simply upload all of it.

I think one of the points of using one Docker image per step is to
ease composition, i.e. to be able to recompose another workflow from
some of the steps together with other steps that require other tools
in other versions.

In Guix parlance: workflow1 uses tool1 for step1 and tool2 for step2,
both from commit C1.  If workflow2 uses tool1 from commit C1 for
step1' and tool3 from commit C2 for step2', then this is easy when
each tool (step) is containerized rather than part of one big image.

But that is an issue for the Guix side, not the GWL side. :-)  For
example, is it possible to compose two profiles containing the same
package at the very same version but grafted differently?

> * create as many EC2 instances as requested (respecting optional
>   grouping information to keep any set of processes on the same node)
>   and mount the EFS over NFS.  The OS on the EC2 instances doesn't
>   matter.

By "The OS on the EC2 instances doesn't matter", do you mean that it
is possible to run Guix System, or Guix as a package on top of, say,
Debian?

> * run the processes on the EC2 instances (parallelizing as far as
>   possible) and have them write to a unique directory on the shared
>   EFS.  The rest of the EFS is used as a read-only store to access all
>   the Guix-built tools.
>
> The EFS either stays active or its contents are archived to S3 upon
> completion to reduce storage costs.
>
> The last two steps are obviously a little vague; we'd need to add a
> few knobs to allow users to easily tweak resource allocation beyond
> what the GWL currently offers (e.g. grouping, mapping resources to EC2
> machine sizes).  To implement the last step we would need to keep
> track of step execution.  We can already do this, but the complication
> here is to effect execution on the remote nodes.

Ok.
> I also want to add optional reporting for each step.  There could be
> a service that listens to events, and each step would trigger events
> to indicate its start and stop.  This could trivially be visualized,
> so that users can keep track of the state of the workflow and its
> processes, e.g. with a pretty web interface.

By "service", do you mean a Guix service?

> For the deployment to AWS (and eventual tear-down) we can use Guile
> AWS.
>
> None of this depends on "guix deploy", which I think would be a poor
> fit as these virtual machines are meant to be disposable.
>
> Another thing I'd like to point out is that this doesn't lead users
> down the AWS rabbit hole.  We don't use specialized AWS services like
> their cluster/grid service, nor do we use Docker, ECS, etc.  We use
> the simplest resource types: plain EC2 and boring NFS storage.  This
> looks like one of the simplest remote execution models, which could
> just as well be used with other remote compute providers (or even a
> custom server farm).
>
> One of the open issues is to figure out how to sync the /gnu/store
> items to EFS efficiently.  I don't really want to shell out to rsync,
> nor do I want to use "guix copy", which would require a remote
> installation of Guix.  Perhaps rsync would be the easiest route for a
> rough first draft.  It would also be nice if we could deduplicate our
> slice of the store to cut down on unnecessary traffic to AWS.

Naively, why does the "guix pack -f docker" or "guix system
docker-image" approach fail?

All the best,
simon


From: Ricardo Wurmus <rekado@elephly.net>
To: zimoun <zimon.toutoune@gmail.com>
Cc: gwl-devel@gnu.org
Subject: Re: fastest way to run a GWL workflow on AWS
In-Reply-To: <86y2nkz5h3.fsf@gmail.com>
References: <87a70dkm2j.fsf@elephly.net> <86y2nkz5h3.fsf@gmail.com>
Date: Thu, 16 Jul 2020 17:17:43 +0200
Message-ID: <87tuy7fq08.fsf@elephly.net>
zimoun writes:

>> * create an EFS file system.  Why EFS?  Unlike EBS (block storage) and
>>   S3, one EFS can be accessed simultaneously by different virtual
>>   machines (EC2 instances).
>
> Who creates the EFS file system?  And you are referring to [1], right?
>
> 1: https://aws.amazon.com/efs/

Guile AWS would create it on demand (unless a user provides the name
of an existing EFS that already contains a few Guix things).

The idea is to copy parts of a store to a remote file system -- just
without the database and without Guix itself doing anything on the
remote.  This is very much like the setup of Guix on HPC clusters,
where all nodes mount the shared file system that is controlled by one
node.  In the case of EFS the "controller node" is the user's machine
running GWL.

>> * sync the closure of the complete workflow (all steps) to EFS.  (How?
>>   We could either mount EFS locally or use an EC2 instance as a simple
>>   "cloud" file server.)  This differs from how other workflow
>>   languages handle things.  Other workflow systems have one or more
>>   Docker image(s) per step (sometimes one Docker image per
>>   application), which means that there is some duplication and setup
>>   time as Docker images are downloaded from a registry (where they
>>   have previously been uploaded).  Since Guix knows the closure of all
>>   programs in the workflow we can simply upload all of it.
>
> I think one of the points of using one Docker image per step is to
> ease composition, i.e. to be able to recompose another workflow from
> some of the steps together with other steps that require other tools
> in other versions.
>
> In Guix parlance: workflow1 uses tool1 for step1 and tool2 for step2,
> both from commit C1.  If workflow2 uses tool1 from commit C1 for
> step1' and tool3 from commit C2 for step2', then this is easy when
> each tool (step) is containerized rather than part of one big image.
>
> But that is an issue for the Guix side, not the GWL side. :-)  For
> example, is it possible to compose two profiles containing the same
> package at the very same version but grafted differently?

I think it *is* a GWL issue to solve.  The GWL could support inferiors
so that users could reference specific tool variants for parts of the
workflow.  Currently, the GWL will use whatever tools the extended
version of Guix provides.
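For reference, this is roughly what picking a tool variant from an
older Guix revision looks like with inferiors on the Guix side (the
commit and package are placeholders); the open question is how a GWL
workflow would declare something like this:

    (use-modules (guix inferior) (guix channels)
                 (srfi srfi-1))

    ;; An inferior Guix pinned to some older commit.
    (define older-guix
      (inferior-for-channels
       (list (channel
              (name 'guix)
              (url "https://git.savannah.gnu.org/git/guix.git")
              (commit "<some older commit>")))))

    ;; A package object from that revision, usable alongside packages
    ;; from the current Guix.
    (define older-samtools
      (first (lookup-inferior-packages older-guix "samtools")))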
>> * create as many EC2 instances as requested (respecting optional
>>   grouping information to keep any set of processes on the same node)
>>   and mount the EFS over NFS.  The OS on the EC2 instances doesn't
>>   matter.
>
> By "The OS on the EC2 instances doesn't matter", do you mean that it
> is possible to run Guix System, or Guix as a package on top of, say,
> Debian?

Running Guix System on AWS is tricky.  AWS doesn't like our disk
images because /etc/fstab doesn't exist (that was the last error
before I stopped playing with it).  My point is that Guix System isn't
necessary.  Pick whatever virtual machine image they offer on AWS and
mount the EFS containing all the Guix goodies.

>> I also want to add optional reporting for each step.  There could be
>> a service that listens to events, and each step would trigger events
>> to indicate its start and stop.  This could trivially be visualized,
>> so that users can keep track of the state of the workflow and its
>> processes, e.g. with a pretty web interface.
>
> By "service", do you mean a Guix service?

No, something much more vague.  When you submit a GWL workflow to a
cluster today, the GWL prepares things and then hands off the work to
the cluster scheduler.  The GWL has no way to tell you anything about
the progress of the workflow.  Its work is done once it has compiled a
higher-order description of the workflow down to scripts that the
cluster can run.

It doesn't have to be this way.  Why let the cluster scheduler have
all the fun?  (And more importantly: what do we do if we don't *have*
a scheduler?)  The GWL could have a sub-command or switch to watch
submitted jobs: a little daemon that listens to events sent by the
individual steps of the workflow; events like "started", "error",
"done", or even fancier ones such as machine load or disk utilization
at this point in time.  When enabled, the jobs themselves would be
instrumented and would send this information to the GWL monitor, which
in turn would be able to visualize it.
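To make "instrumented" concrete, the generated job scripts could do
little more than this; host, port, and the line format are made up for
illustration, none of it exists yet:

    (define (send-event host port step status)
      ;; Report STATUS ('started, 'done, 'error, ...) for STEP to the
      ;; GWL monitor listening at HOST:PORT.
      (let ((sock (socket PF_INET SOCK_STREAM 0)))
        (connect sock AF_INET (inet-pton AF_INET host) port)
        (format sock "~a ~a ~a~%" (current-time) step status)
        (close-port sock)))

    ;; Wrapped around the real work of a process:
    ;; (send-event "10.0.0.1" 5000 "map-reads" 'started)
    ;; ... run the process ...
    ;; (send-event "10.0.0.1" 5000 "map-reads" 'done)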
>> One of the open issues is to figure out how to sync the /gnu/store
>> items to EFS efficiently.  I don't really want to shell out to rsync,
>> nor do I want to use "guix copy", which would require a remote
>> installation of Guix.  Perhaps rsync would be the easiest route for a
>> rough first draft.  It would also be nice if we could deduplicate our
>> slice of the store to cut down on unnecessary traffic to AWS.
>
> Naively, why does the "guix pack -f docker" or "guix system
> docker-image" approach fail?

Docker images would have to be uploaded to a container registry
(either Docker Hub or Amazon's ECR).  AWS can use Docker only by
downloading an image from a registry when you instantiate a virtual
machine.  One of the advantages of using Guix is that we don't need a
big Docker blob at all; we can instead upload individual store items
(and accumulate them) and use them directly, without the need to copy
anything from a container registry.

--
Ricardo