Re: Guix Docker image inflation

From: Chris Marusich
Subject: Re: Guix Docker image inflation
Date: Sun, 31 May 2020 14:04:19 -0700
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/26.3 (gnu/linux)

Stephen Scheck writes:

> IF any of the store files resulting from `guix pull` are ephemeral
> (i.e. intermediate build results not anchored to a profile) AND guix
> GC worked inside the container, my approach might still work - yes
> there would be image and layers growth but it might be small enough
> not to care between periodic image rebases. But I'm starting to doubt
> that, or at least it is difficult to quantify with the GC issues.

I think you're right about it being difficult to quantify the GC issues.

Basically, when you run "guix pull", the current Guix will "build"
(i.e., maybe download via substitutes, maybe build from source) the new
Guix, which puts it into the store, and updates the profile symlinks to
make it current.  In the process of doing this, some intermediate builds
might be performed if substitutes are not available.  Although the new
Guix will remain live in the store after the profile symlinks are
updated to make it current, (1) intermediate results might be left dead
after "guix pull" is finished, and (2) if the old Guix is sufficiently
different from the new Guix, it will also become dead after the symlinks
that were keeping it live are removed.

So, the amount of garbage that will be left over depends on a few
factors, like whether substitutes were available, and how different the
new Guix is from the old one.  It can also depend on how the guix-daemon
has been started (see "--gc-keep-outputs" and "--gc-keep-derivations" in
the "Invoking guix-daemon" section of the manual).
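
As a rough sketch of what that looks like in practice: both options take
"yes" or "no", and as far as I can tell from the manual the defaults are
"no" for outputs and "yes" for derivations.  The invocation below turns
both off, so the GC keeps as little as possible.  The "run" helper and
the "start_daemon" name are made up for illustration, and "guixbuild" is
only the conventional build group name; adjust it to your setup.

```shell
# Dry-run helper (hypothetical): print the command unless RUN=1 is set.
run() {
    if [ "${RUN:-0}" = 1 ]; then
        "$@"
    else
        printf '%s\n' "$*"
    fi
}

# Start the daemon so that the GC keeps neither the outputs of live
# derivations nor the derivations of live outputs.
# "guixbuild" is an assumption; use the group your daemon really uses.
start_daemon() {
    run guix-daemon --build-users-group=guixbuild \
        --gc-keep-outputs=no --gc-keep-derivations=no
}
```

Calling "start_daemon" prints the command line; "RUN=1 start_daemon"
actually executes it.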

In the case of your Docker images, most (all?) of the garbage is coming
from case (2) above: as Guix changes, the old Guix will be made dead and
GC'd (hypothetically, let's suppose GC is working), but it will still
exist on prior layers, since it came from a prior layer.  As for case
(1), the intermediate results, I think they are not contributing to your
image size for two reasons: substitutes are probably available, and even
if they weren't available, the intermediates would probably appear
during "guix pull", which means they'd be on the top layer and would be
GC'd, so they wouldn't be included in any layer of the next image.  The
fact that the biggest dead paths in your latest image consist entirely
of store paths that look suspiciously like they came from prior Guix
installations is further evidence in support of this theory.

--8<---------------cut here---------------start------------->8---
root@guix /# du -Phc $(guix gc --list-dead) 2>/dev/null | sort -hk 1,1 | tail
finding garbage collector roots...
determining live/dead paths...
187M    /gnu/store/47aack48aczpzm635axsy4jf2pvmwrv0-guix-ef1d475b0-modules/lib
194M    /gnu/store/hz2rn2l0jixg91q4rsdcwc489y71ll29-guix-05e1edf22-modules
198M    /gnu/store/5mhn1ynxvy7jihsknsnv3yspkkvc0r5s-guix-2e59ae238-modules
210M    /gnu/store/0vwg9aqzs5xrk10vcs4dl105s3f42ilf-guix-b1affd477-modules
210M    /gnu/store/47aack48aczpzm635axsy4jf2pvmwrv0-guix-ef1d475b0-modules
3.0G    total
root@guix /# 
--8<---------------cut here---------------end--------------->8---
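
To check at a glance whether the dead paths really are old Guix
installations, one could group them by name with the hash and commit ID
masked out.  This is only a sketch: "summarize_dead" is a name I made
up, and it assumes the usual /gnu/store/HASH-NAME layout with a
32-character hash and a 9-character commit ID, as in the listing above.

```shell
# Read store paths on stdin and print "NAME COUNT" for each group.
# Step 1: reduce every path to its top-level store item and drop
#         duplicates, so ".../lib" is not counted separately.
# Step 2: strip the hash prefix and mask the Guix commit ID.
# Step 3: count how many store items share each masked name.
summarize_dead() {
    sed -E 's|^(/gnu/store/[0-9a-z]{32}-[^/]+).*$|\1|' | sort -u |
    sed -E 's|^/gnu/store/[0-9a-z]{32}-||; s|^guix-[0-9a-f]{9}-|guix-COMMIT-|' |
    sort | uniq -c | awk '{ print $2, $1 }'
}
```

It would be used as "guix gc --list-dead 2>/dev/null | summarize_dead".
A high count for names like "guix-COMMIT-modules" would point at
leftovers from earlier "guix pull" runs.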

These "guix-HASH-modules" directories, for example, are used as part of
each Guix installation:

--8<---------------cut here---------------start------------->8---
root@guix /# realpath ~/.config/guix/current/share/guile
root@guix /# 
--8<---------------cut here---------------end--------------->8---

Each of them has a total closure size of almost 500 MB, although since
they might share some references, each one individually is adding "only"
about 200 MB.

--8<---------------cut here---------------start------------->8---
root@guix /# guix size 
store item                                                               total    self
/gnu/store/mj6pf6nf0kf03nhh7bmpc6m43v6knq6m-guix-a5374cde9-modules       485.9   206.9  42.6%
/gnu/store/hkmsljl2sf4nk96b35f0bmfkr2lqanfq-guix-packages-base           105.7   105.7  21.8%
/gnu/store/s7izb7j0s5rzcq297nd7ba9sfiqh5zmz-guix-system                   43.2    43.2   8.9%
/gnu/store/fa6wj5bxkj5ll1d7292a70knmyl7a0cr-glibc-2.31                    38.4    36.7   7.6%
/gnu/store/01b4w3m6mp55y531kyi1g8shh722kwqm-gcc-7.5.0-lib                 71.0    32.6   6.7%
/gnu/store/wcv5mscivggkygnz68nn2671fr3kapjc-guix-packages-base-source     19.4    19.4   4.0%
/gnu/store/6zygksmvzcq92xf65cna91dbf7a4zblh-guix-extra                    19.4    19.4   4.0%
/gnu/store/a7wiy24mmcilbqp39pl0jdlw10vbvavb-guix-cli                       8.0     7.3   1.5%
/gnu/store/f6k9b4grrfpip4h5lrmpnsnn2gqziihr-guix-system-tests              4.6     4.6   1.0%
/gnu/store/gbrd1laxsncb9zd218pyglisxyxymmbd-guix-system-source             1.9     1.9   0.4%
/gnu/store/mmhimfwmmidf09jw1plw3aw1g1zn2nkh-bash-static-5.0.16             1.6     1.6   0.3%
/gnu/store/5lr8miawrk380zw8yjy0crcl6vcs10s3-guix-extra-source              1.5     1.5   0.3%
/gnu/store/pwcp239kjf7lnj5i4lkdzcfcxwcfyk72-bash-minimal-5.0.16           39.4     1.0   0.2%
/gnu/store/r7k859hmcnkazf492fasqvk25jflnfk6-xz-5.2.4                      73.0     0.9   0.2%
/gnu/store/bhs4rj58v8j1narb2454raan2ps38xd8-grep-3.4                      72.9     0.8   0.2%
/gnu/store/z0572147hprpbjrcjqkgrv3f80ip2klx-guix-cli-source                0.7     0.7   0.1%
/gnu/store/a9f7wmc75hbpg520phw9z4l9asm3qvsw-bzip2-1.0.8                   72.5     0.4   0.1%
/gnu/store/7y0nin2d0j46j26a1n46bl5zl3px0zvz-guix-system-tests-source       0.3     0.3   0.1%
/gnu/store/rykm237xkmq7rl1p0nwass01p090p88x-zlib-1.2.11                   71.2     0.2   0.0%
/gnu/store/jqr5bz89gfwhxcndnhq333dyclvkq7ws-lzlib-1.11                    71.2     0.2   0.0%
/gnu/store/378zjf2kgajcfd7mfr98jn5xyc5wa3qv-gzip-1.10                     73.1     0.2   0.0%
/gnu/store/kfj1lc84v50imn3raijgih4salilmf1a-guix-packages-base-modules   125.2     0.0   0.0%
/gnu/store/lvszhqs57scb2ax18l2nrn9dwiyf6iza-guix-system-tests-modules      4.9     0.0   0.0%
/gnu/store/lr65f259z1730p7bvplsj9k6yvbkyh39-guix-system-modules           45.1     0.0   0.0%
/gnu/store/nk1x6cdif8pd9vi04nzxfqinh0ag06am-guix-extra-modules            20.9     0.0   0.0%
/gnu/store/s6vlfscnfvnrlv3yfag6qsy5j6c9pxqb-guix-cli-modules               8.0     0.0   0.0%
total: 485.9 MiB
root@guix /# 
--8<---------------cut here---------------end--------------->8---

And there are still other components adding space each time you run
"guix pull", like the "guix-system" component, for example:

--8<---------------cut here---------------start------------->8---
root@guix /# du -Phc $(guix gc --list-dead | grep guix-system) 2>/dev/null | sort -hk 1,1 | tail
finding garbage collector roots...
determining live/dead paths...
44M     /gnu/store/qhbk7g8z97m37iak1s1yn2my82gv0lj5-guix-system/gnu
44M     /gnu/store/slwkzcmg6r1lr9a16x3krd2ax384p8wr-guix-system
44M     /gnu/store/slwkzcmg6r1lr9a16x3krd2ax384p8wr-guix-system/gnu
44M     /gnu/store/vwzk618h1wxy6z9i06xnhnxj4gvhkiss-guix-system
44M     /gnu/store/vwzk618h1wxy6z9i06xnhnxj4gvhkiss-guix-system/gnu
44M     /gnu/store/w47fgv8p2hvaqdwywymwvm0qlh4gw0ih-guix-system
44M     /gnu/store/w47fgv8p2hvaqdwywymwvm0qlh4gw0ih-guix-system/gnu
44M     /gnu/store/zf67wb6c0s97vwmywjq09hy9jq0w5mmi-guix-system
44M     /gnu/store/zf67wb6c0s97vwmywjq09hy9jq0w5mmi-guix-system/gnu
523M    total
root@guix /# 
--8<---------------cut here---------------end--------------->8---

Anyway, the point is: you begin with a previous image.  The previous
image already has these store paths from the previous installation of
Guix.  Therefore, they exist on the previous layer.  Because they exist
on the previous layer, they cannot be removed from the Docker image, and
they are carried forward in that previous layer, to all new images.
Regardless of any changes to guix-daemon we might make, the way in which
you build your images will cause them to grow by hundreds of megabytes
every day.

> Actually, there might be another way around this, still avoiding the
> need for a custom Runner, for example mounting /var/guix and
> /gnu/store into the container instead of belonging to it. If done that
> way, layer accumulation wouldn't be an issue, and maybe GC between
> layers neither.

This sounds like a great idea, actually!  "The right way" to do Docker
containers is to have a single process per container, and to not store
state in the Docker container.  We're violating that principle on both
counts when we run an entire GNU/Linux distribution inside a Docker
container, especially since the guix-daemon is all about managing the
"state" of /var/guix and /gnu/store.  If you can somehow move that
"state" into a Docker volume instead of the container itself, that would
definitely be an improvement.  It may be tricky, though, since if
guix-daemon sees stuff in /gnu/store that is inconsistent with its
database in /var/guix, bad things can happen.  So you'll have to ensure
they remain consistent with one another.
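
A minimal sketch of the volume approach, assuming named Docker volumes
and an image that starts guix-daemon itself.  The volume names, the
image name "my-guix-image", and the "run" and "guix_run" helpers are all
made up for illustration.  Mounting both directories from volumes that
are always used together is one way to keep the store and the database
in step.  As far as I know, Docker copies the image's existing contents
into a named volume the first time an empty one is mounted, so the
initial store and database should start out consistent.

```shell
# Dry-run helper (hypothetical): print the command unless RUN=1 is set.
run() {
    if [ "${RUN:-0}" = 1 ]; then
        "$@"
    else
        printf '%s\n' "$*"
    fi
}

# Hypothetical image name; replace with your own.
IMAGE=my-guix-image

# Run a command in a throwaway container whose /gnu/store and /var/guix
# live in named volumes rather than in image layers.  Always mount the
# two volumes together so the store and its database cannot drift apart.
guix_run() {
    run docker run --rm \
        -v guix-store:/gnu/store \
        -v guix-var:/var/guix \
        "$IMAGE" "$@"
}
```

For example, "guix_run guix pull" prints the full "docker run" command,
and "RUN=1 guix_run guix pull" executes it.  Since the state now lives
outside the image, there is no need to "docker commit" after each pull.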

>> Besides store items, I noticed two other things about your images:
>> - The contents of /var is growing slowly without bound, but it isn't
>>   nearly as bad as the contents of /gnu/store.  This is probably due to
>>   log files; consider pruning them.
> These are presumably OK to delete, without any special handling for Guix?

I think the answer is "probably", but I would stop guix-daemon first.
Other processes may be using /var, too, so I would stop them, also.

>> - Your script runs "docker commit" while guix-daemon (and other
>>   programs) are still running.  To ensure the guix-daemon's database (or
>>   other things) does not become corrupt, consider terminating all
>>   processes before committing the new image.
> `docker commit` pauses the container (unless you tell it not to) ...
> although I guess that could still cause problems if Guix store writes
> aren't implemented in an atomic way.

I'm not sure what "pause" means in the Docker documentation, but since I
can run "docker commit" while running a shell in the container, and the
shell doesn't get terminated, it clearly doesn't terminate the
processes.  It might be safe to just pause the container when
committing, but it's definitely safe if you gracefully shut down all
processes first.  This definitely ensures that things like databases are
left in known good states when committing the image.

What I'm saying is that, yeah sure, you can probably get away with not
gracefully shutting down the processes.  Similarly, you can often get
away with pulling the power cord out of your computer because a lot of
software and storage is pretty robust by default nowadays.  However, it
increases the risk of encountering a problem like data corruption, so
it's better to shut things down gracefully if you can.
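
One way to do that, sketched under some assumptions: "docker stop" sends
SIGTERM to the container's main process and only kills it after a grace
period, and "docker commit" also works on a stopped container.  Whether
guix-daemon itself gets a clean shutdown depends on how the main process
in your container handles SIGTERM, which I have not checked.  The "run"
and "commit_after_stop" names and the 30-second grace period are made up
for illustration.

```shell
# Dry-run helper (hypothetical): print the command unless RUN=1 is set.
run() {
    if [ "${RUN:-0}" = 1 ]; then
        "$@"
    else
        printf '%s\n' "$*"
    fi
}

# Stop the container gracefully, then commit it only if the stop worked.
#   $1: container name or ID
#   $2: name (and tag) of the image to create
commit_after_stop() {
    run docker stop -t 30 "$1" &&
    run docker commit "$1" "$2"
}
```

Running "commit_after_stop guix-ci my-guix-image:next" prints the two
commands in order; with RUN=1 it executes them.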

