From: kyle
Subject: [bug#61701] [PATCH] doc: Propose new cookbook section for reproducible research.
Date: Wed, 22 Feb 2023 05:17:29 +0000

From: Kyle Andrews <kyle@posteo.net>

The intent is to cover the most common cases in which researchers using R
and Python can quickly gain the benefits of reproducibility.
---
 doc/guix-cookbook.texi       | 174 +++++++++++++++++++++++++++++++++++
 guix/build-system/python.scm |   1 +
 2 files changed, 175 insertions(+)

diff --git a/doc/guix-cookbook.texi b/doc/guix-cookbook.texi
index b9fb916f4a..8a10bcbec7 100644
--- a/doc/guix-cookbook.texi
+++ b/doc/guix-cookbook.texi
@@ -114,6 +114,7 @@ Top
 
 Environment management
 
+* Reproducible Research in Practice:: Write manifests to create reproducible environments.
 * Guix environment via direnv:: Setup Guix environment with direnv
 
 Installing Guix on a Cluster
@@ -3538,9 +3539,182 @@ Environment management
 demonstrate such utilities.
 
 @menu
+* Reproducible Research in Practice:: Write manifests to create reproducible environments
 * Guix environment via direnv:: Setup Guix environment with direnv
 @end menu
 
+@node Reproducible Research in Practice
+@section Common scientific software environments
+
+Many researchers write applied scientific software that builds on more
+generic tools from the R and Python ecosystems, along with supporting
+shell utilities.  Even researchers who mostly stick to just R or just
+Python often have to use both at the same time when collaborating with
+others.  This tutorial covers strategies for writing manifests that
+handle such situations.
+
+Widely used R packages are hosted on CRAN, which employs a strict test
+suite backed by continuous integration infrastructure for the latest R
+version. A positive result of this rigid discipline is that most R
+packages from the same period of time will interoperate well together
+when used with a particular R version. This means there is a clear
+low-complexity target for achieving a reproducible environment.
+
+Writing a manifest for packaging R code alone requires only minimal
+knowledge of the Guix infrastructure. This stub should work for most
+cases involving the R packages already in Guix.
+
+@example
+(use-modules
+ (gnu packages cran)
+ (gnu packages statistics))
+
+(packages->manifest
+ (list r r-tidyverse))
+@end example
+
+Most R packages are defined in the @file{gnu/packages/cran.scm} and
+@file{gnu/packages/statistics.scm} files of the Guix source repository.
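+
+If you would rather not track which module defines each package, a
+manifest can also name packages as strings.  The following is a minimal
+sketch, assuming the same two packages as above;
+@code{specifications->manifest} resolves each name to the corresponding
+package in your current Guix revision.
+
+@example
+;; Look up packages by name instead of importing their modules.
+(specifications->manifest
+ (list "r" "r-tidyverse"))
+@end example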
+
+This manifest can be run with the basic guix shell command:
+
+@example
+guix shell --manifest=manifest.scm --container
+@end example
+
+Remember to pin your channels when you are done, so that others can
+later recover your exact Guix environment.
+
+@example
+guix describe --format=channels > channels.scm
+@end example
+
+Given that file, anyone can later recreate the same environment with
+@command{guix time-machine}:
+
+@example
+guix time-machine --channels=channels.scm \
+  -- guix shell --manifest=manifest.scm --container
+@end example
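+
+For reference, @file{channels.scm} is itself a small Scheme file.  The
+sketch below shows its general shape only: the commit is a placeholder
+borrowed from later in this section, and the file that
+@command{guix describe} actually writes also records a channel
+introduction used for authentication.
+
+@example
+;; Illustrative only; use the output of 'guix describe' in practice.
+(list (channel
+        (name 'guix)
+        (url "https://git.savannah.gnu.org/git/guix.git")
+        (commit
+          "d66146073def03d1a3d61607bc6b77997284904b")))
+@end example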
+
+In contrast, the Python scientific ecosystem is far less standardized.
+No comparable effort is made to keep all Python packages working
+together.  While there is always a latest Python version, it is less
+dominant than the latest R version, in part because Python tends to be
+used by much larger teams than R.  This makes reproducible Python
+environments much harder to package, and mixing R with Python
+complicates things further.  Even so, we have to stay mindful of the
+goals of reproducible research.
+
+If reproducibility becomes an end in itself rather than a catalyst for
+faster discovery, then Guix will be a non-starter for scientists.  Their
+goal is to develop a useful understanding of particular aspects of the
+world.
+
+Thankfully, three common scenarios cover the vast majority of
+needs. These are:
+
+@itemize
+@item
+combining standard package definitions with custom package definitions
+@item
+combining package definitions from the current revision with other revisions
+@item
+combining package variants which need a modified build-system
+@end itemize
+
+The rest of this tutorial develops a manifest that tackles all three of
+these scenarios at once.  The hope is that seeing the hardest common
+situation solved without thousands of lines of code will convince
+researchers that the effort is worthwhile and not a significant detour
+from the main line of their research.
+
+@example
+(use-modules
+ (guix packages)
+ (guix download)
+ (guix licenses)
+ (guix profiles)
+ (gnu packages)
+ (gnu packages cran)
+ (gnu packages statistics) ; provides the 'r' package used below
+ (guix inferior)
+ (guix channels)
+ (guix build-system python))
+
+;; guix import pypi APTED
+(define python-apted
+ (package
+  (name "python-apted")
+  (version "1.0.3")
+  (source (origin
+            (method url-fetch)
+            (uri (pypi-uri "apted" version))
+            (sha256
+             (base32
+              "1sawf6s5c64fgnliwy5w5yxliq2fc215m6alisl7yiflwa0m3ymy"))))
+  (build-system python-build-system)
+  (home-page "https://github.com/JoaoFelipe/apted")
+  (synopsis "APTED algorithm for the Tree Edit Distance")
+  (description "APTED algorithm for the Tree Edit Distance")
+  (license expat)))
+
+(define last-guix-with-python-3.6
+ (list
+  (channel
+   (name 'guix)
+   (url "https://git.savannah.gnu.org/git/guix.git")
+   (commit
+    "d66146073def03d1a3d61607bc6b77997284904b"))))
+
+(define connection-to-last-guix-with-python-3.6
+ (inferior-for-channels last-guix-with-python-3.6))
+
+(define first car)
+
+(define python-3.6
+ (first
+  (lookup-inferior-packages
+   connection-to-last-guix-with-python-3.6 "python")))
+
+(define python3.6-numpy
+ (first
+  (lookup-inferior-packages
+   connection-to-last-guix-with-python-3.6 "python-numpy")))
+
+(define included-packages
+ (list r r-reticulate))
+ 
+(define inferior-packages
+ (list python-3.6 python3.6-numpy))
+
+(define package-with-python-3.6
+ (package-with-explicit-python python-3.6
+  "python-" "python3.6-" #:variant-property 'python3-variant))
+ 
+(define custom-variant-packages
+ (list (package-with-python-3.6 python-apted)))
+
+(concatenate-manifest
+ (map packages->manifest
+  (list
+   included-packages
+   inferior-packages
+   custom-variant-packages)))
+@end example
+
+This should produce a profile with the latest R and an older Python
+3.6.  The two should be able to interoperate through R code like:
+
+@example
+library(reticulate)
+use_python("python")
+apted = import("apted")
+t1 = '{a{b}{c}}'
+t2 = '{a{b{d}}}'
+metric = apted$APTED(t1, t2)
+distance = metric$compute_edit_distance()
+@end example
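+
+One way to try this out, assuming the manifest is saved as
+@file{manifest.scm} and the R code above as @file{apted-demo.R} (a
+hypothetical file name), is to run the script inside the environment:
+
+@example
+guix shell --manifest=manifest.scm -- Rscript apted-demo.R
+@end example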
+
 @node Guix environment via direnv
 @section Guix environment via direnv
 
diff --git a/guix/build-system/python.scm b/guix/build-system/python.scm
index c8f04b2298..d4aaab906d 100644
--- a/guix/build-system/python.scm
+++ b/guix/build-system/python.scm
@@ -36,6 +36,7 @@ (define-module (guix build-system python)
   #:use-module (srfi srfi-1)
   #:use-module (srfi srfi-26)
   #:export (%python-build-system-modules
+            package-with-explicit-python
             package-with-python2
             strip-python2-variant
             default-python
-- 
2.37.2





