Re: Google Summer of Code 2023 Inquiry

From: Kyle
Subject: Re: Google Summer of Code 2023 Inquiry
Date: Tue, 04 Apr 2023 06:29:00 +0000

Hi Spencer,

Here is the documentation for the git commit-graph cache file. The authors also wrote blog posts about it that give a bit more explanation.
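As a starting point for reading that file, here is a minimal Python sketch that parses only the fixed header and the chunk table of contents, following the layout described in the git commit-graph format documentation (a 4-byte "CGPH" signature, four single-byte fields, then 12-byte table entries). The function name and the returned dictionary keys are made up for illustration, and the sketch does not decode the commit data itself.

```python
import struct


def parse_commit_graph_header(data: bytes) -> dict:
    """Parse the 8-byte header and the chunk table of a commit-graph file.

    Illustrative helper, not part of git or Guix.
    """
    if len(data) < 8:
        raise ValueError("file too short to contain a commit-graph header")
    # Header: signature, format version, hash version,
    # number of chunks, number of base commit-graphs (all big-endian).
    signature, version, hash_version, num_chunks, num_base = struct.unpack(
        ">4sBBBB", data[:8]
    )
    if signature != b"CGPH":
        raise ValueError("not a commit-graph file (bad signature)")

    # Table of contents: one 12-byte entry per chunk (4-byte ID, 8-byte
    # file offset), followed by a terminating entry that is skipped here.
    chunks = {}
    pos = 8
    for _ in range(num_chunks):
        chunk_id, offset = struct.unpack(">4sQ", data[pos:pos + 12])
        chunks[chunk_id.decode("ascii")] = offset
        pos += 12

    return {
        "version": version,
        "hash_version": hash_version,  # 1 = SHA-1, 2 = SHA-256
        "base_graphs": num_base,
        "chunks": chunks,
    }
```

In a repository where `git commit-graph write` has been run, the file normally lives at `.git/objects/info/commit-graph`, so the bytes could be read from there and passed to the function.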


Maybe it won't turn out to be needed; I just thought it might help get you thinking. Please read all my suggestions from that perspective by default.

I will have to defer to others for gauging the size of projects. I have found that, as a rule, there are always many more details to consider than I could have anticipated at the start of a project. That said, I liked your earlier stated plan of starting simple. Handling the latest releases seems a reasonable minimum viable product.


On April 3, 2023 8:41:53 PM EDT, Spencer Skylar Chan <> wrote:
Hi Kyle,

On 3/31/23 11:15, Kyle wrote:
I would expect most software versions to not be in Guix. Simon had mentioned that this is mostly what the guix-past repository is for. However, some packages might be buried on some branch or some commit in some Guix-related git repository. It may be helpful to facilitate their discovery and extraction for conda import.

Git has a newish binary file format for caching searches across commits. Maybe it would be helpful to figure out how to parse this format (it's documented) and index the data further using Xapian or a graph data structure (or Tree-sitter?) with the relevant metadata needed to find and efficiently extract Scheme code and its dependencies?

If the format is documented then this is possible, although I'm not super familiar with these kinds of data structures.

You make an interesting point about compilation errors. It may be more productive to help researchers test for working, satisfiable configurations as a more relaxed alternative to specifying the exact software version. Maybe some "nearby" or newer version is packaged and that is enough to successfully run a test suite? I'm imagining something between git bisect and Guix's own package solver.

Yes, we could have a variant of the solver that's more relaxed. It could output multiple solutions so the user can inspect them and pick the best one.
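To make the "nearby or newer version" idea concrete, here is a rough Python sketch that ranks the packaged versions of a single package against a requested version and returns several candidates for the user to inspect. It is not a dependency solver and has nothing to do with Guix's actual code; the function names are hypothetical, and it assumes purely numeric dotted version strings.

```python
def parse_version(version: str) -> tuple:
    """Turn "1.21.0" into (1, 21, 0); assumes numeric dotted versions only."""
    return tuple(int(part) for part in version.split("."))


def nearby_versions(requested: str, available: list, limit: int = 3) -> list:
    """Return up to `limit` candidates: the closest newer-or-equal
    versions first, then the closest older ones."""
    req = parse_version(requested)
    # Sort the available versions numerically, dropping duplicates.
    parsed = sorted({parse_version(v): v for v in available}.items())
    newer = [v for pv, v in parsed if pv >= req]
    older = [v for pv, v in reversed(parsed) if pv < req]
    return (newer + older)[:limit]


candidates = nearby_versions(
    "1.21.0", ["1.19.5", "1.21.0", "1.23.2", "1.20.3"]
)
print(candidates)
```

Each candidate could then be tried against the project's test suite, stopping at the first one that passes.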

It might also be productive to add infrastructure to help scientists more conveniently track and study their recent packaging experiments. Guix only becomes more useful as more packages become available. Work which makes packaging more approachable to more people benefits everyone. Perhaps you can think of other ideas in this direction?

I'm not sure how "packaging experiments" are different from packaging software the usual way. I think making the importers easier to use and debug would help, although that sounds outside the scope of the projects.

Finally, would these projects be considered large or medium for the purposes of GSOC?


On March 30, 2023 7:22:14 PM EDT, Spencer Skylar Chan <> wrote:
Hi Kyle,

On 3/24/23 14:59, Kyle wrote:
I am a bit worried that your proposed project is too focused on replacing Python with Guile. I think the project would benefit more from making Python users more comfortable using Guix tools productively in concert with the tools they already know.

Yes, I agree with you. Replacing Python with Guile is a much more ambitious task and is not the highest priority here.

I'm wondering if you might consider modifying your project goals toward exploring how GWL might be enhanced so that it could better complement more expressive language-specific workflow tools like Snakemake. I am also personally interested in exploring such facilities from the targets workflow system in R. Alternatively, perhaps you could focus on extending the GWL with more features?

I would also be interested in extending GWL with more features; I will follow up on this on the GWL mailing list.

I agree that establishing an achievable scope within a short timeline is crucial. The conda env importer idea would be quite an ambitious undertaking by itself and would lead you towards thinking about some pretty interesting and impactful problems.

While it's a challenging project, it could be broken into smaller steps:

1. import packages by exact matching names only, without versioning.
2. extend `guix import` to have `guix import conda` to help with package names that do not match exactly, and to accelerate adoption of Conda packages not in Guix
3. match software version numbers when translating Conda packages to Guix

What's currently undefined is the error handling:
- if a Conda package does not exist in Guix
- if the dependency graph is not solvable
- if compiling the environment fails (due to mismatching dependency versions)
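Step 1 and the first error case above could be sketched roughly as follows in Python. This is only an illustration under stated assumptions: the dependency list is taken as already loaded from a Conda environment file, the set of Guix package names is supplied by the caller (the names below are invented for the example), and the helper names are hypothetical. Version constraints are deliberately ignored, as in step 1.

```python
import re


def conda_name(spec: str) -> str:
    """Extract the bare package name from a spec such as "numpy>=1.21"."""
    return re.split(r"[=<>!~ ]", spec.strip(), maxsplit=1)[0].lower()


def match_exact(conda_specs: list, guix_names: set) -> tuple:
    """Split Conda dependencies into those whose name exists verbatim
    among the given Guix names and those that do not."""
    matched, missing = [], []
    for spec in conda_specs:
        if not isinstance(spec, str):
            # Skip nested sections such as {"pip": [...]}.
            continue
        name = conda_name(spec)
        (matched if name in guix_names else missing).append(name)
    return matched, missing


# Hypothetical inputs for illustration only.
specs = ["samtools=1.16", "bwa", "numpy>=1.21", {"pip": ["some-package"]}]
known = {"samtools", "bwa", "python-numpy"}

matched, missing = match_exact(specs, known)
print("matched:", matched)
print("missing:", missing)
```

Here "numpy" ends up in the missing list because only "python-numpy" is in the example name set, which is the kind of inexact name match that step 2 is meant to address.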

I believe there are many satisfactory stopping points for successful completion within the timeline of the summer, which I hope to present with my proposal soon.


On March 22, 2023 5:44:52 PM EDT, Spencer Skylar Chan <> wrote:

Hi Ricardo,

On 3/22/23 14:19, Ricardo Wurmus wrote:

- Translating Snakemake to Guix Workflow Language (GWL)

Ricardo, maybe you would have some suggestions. :-)

Oh, this looks interesting. Could you please elaborate on the idea?

My idea is to take as input a Snakemake workflow file and eventually output an equivalent GWL workflow file.

Currently, Snakemake workflows can be exported to CWL (Common Workflow Language): <>

One approach could be to add CWL import/export capabilities to GWL. Then Snakemake/GWL conversion would be a 2 step process, using CWL as an intermediate step:

1. Snakemake -> CWL
2. CWL -> GWL

However, CWL is not as expressive as Snakemake. There may be some details that are lost from Snakemake workflows.

So a 1-step Snakemake/GWL transpiler could be interesting, as both Snakemake/GWL use a domain-specific language inside a general purpose language (Python/Guile respectively). There may be a possibility to achieve more "accurate" translations between workflows.

Is this topic something that could fit into a summer project?
