guix-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: guix gc takes long in "deleting unused links ..."


From: Caleb Ristvedt
Subject: Re: guix gc takes long in "deleting unused links ..."
Date: Fri, 01 Feb 2019 16:22:21 -0600
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/26.1 (gnu/linux)

Björn Höfling <address@hidden> writes:

> Why does that take soo long?

Warning: technical overview follows.

It takes so long because after the garbage collection pass it then does
a *full* pass over the /gnu/store/.links directory. Which is huge. It
contains an entry for every unique file (not just store entry, but
everything in those entries, recursively) in the store. The individual
work for each entry is low - just a readdir(), lstat() to see if the
link is still in use anywhere, and an unlink() if it isn't. But my
store, for example, has 998536 entries in there. I got that number with
a combination of ls and wc, and it took probably around 4 minutes to get
it.

Ideally, the reference-counting approach to removing files would work
the same as in programming languages: as soon as a reference is removed,
check whether the reference count is now 0 (in our case 1, since an
entry would still exist in .links). In our case, we'd actually have to
check prior to losing the reference whether the count *would become* 1,
that is, whether it is currently 2. But unlike in programming languages,
we can't just "free a file" (more specifically, an inode). We have to
delete the last existing reference, in .links. The only way to find that
is by hashing the file prior to deleting it, which could be quite
expensive, but for any garbage collection targeting a small subset of
store items it would likely still be much faster. A potential fix there
would be to augment the store database with a table mapping store paths
to hashes (hashes already get computed when store items are
registered). Or we could switch between the full-pass and incremental
approaches based on characteristics of the request.

> Or better: Is it save here to just hit CTRL-C (and let the daemon work
> in background, or whatever)?

I expect that CTRL-C at that point would cause the guix process to
terminate, closing its connection to the daemon. I don't believe the
daemon uses asynchronous I/O, so it wouldn't be affected until it tried
reading or writing from/to that socket. So yeah, if you do that at that
point it would probably work, but you may as well just start it in the
background in that case ("guix gc ... &") or put it in the background
with CTRL-Z followed by the 'bg' command.

- reepca



reply via email to

[Prev in Thread] Current Thread [Next in Thread]