Narinfo negative and transient error caching

From: Christopher Baines
Subject: Narinfo negative and transient error caching
Date: Fri, 05 Mar 2021 22:27:09 +0000
This has been on my mind for a while, as I wonder what effect it has on
users fetching substitutes.

The narinfo caching, as I understand it, works as follows:

 Default success TTL => 36 hours
 Negative TTL        => 1 hour
 Transient error TTL => 10 minutes

I'm ignoring the success TTL; I'm just interested in the negative and
transient error values. Negative means that when a server says it
doesn't have an output, that response will be cached for an
hour. Transient errors are for other HTTP response codes, like 504.
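
To make the three TTLs concrete, here is a minimal Python sketch of a
cache that behaves as described above. It is not the actual Guix code:
the names NarinfoCache and ttl_for_status are made up for illustration,
and treating 200 as success and 404 as the negative case is my
assumption about the status mapping.

```python
import time

# TTLs from the summary above, in seconds.
SUCCESS_TTL = 36 * 60 * 60    # 36 hours
NEGATIVE_TTL = 60 * 60        # 1 hour
TRANSIENT_TTL = 10 * 60       # 10 minutes


def ttl_for_status(status):
    """Pick a cache lifetime from an HTTP status code.

    Assumption: 200 is a successful lookup and 404 is a negative one;
    every other code (504, for example) counts as a transient error.
    """
    if status == 200:
        return SUCCESS_TTL
    if status == 404:
        return NEGATIVE_TTL
    return TRANSIENT_TTL


class NarinfoCache:
    """Hypothetical in-memory cache keyed by store path."""

    def __init__(self, clock=time.time):
        self._entries = {}   # store path -> (status, expiry timestamp)
        self._clock = clock  # injectable so the expiry logic can be tested

    def store(self, store_path, status):
        expires = self._clock() + ttl_for_status(status)
        self._entries[store_path] = (status, expires)

    def lookup(self, store_path):
        """Return the cached status, or None if absent or expired."""
        entry = self._entries.get(store_path)
        if entry is None:
            return None
        status, expires = entry
        if self._clock() >= expires:
            del self._entries[store_path]
            return None
        return status
```

With a cache like this, a 404 stored at time T keeps answering lookups
until T plus one hour, even if the server gains the substitute a minute
later.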

I had a look through the Git history: caching negative lookups has been
a thing for a while. Caching transient errors was added, but I couldn't
see why.

Personally, I don't see a reason to keep either behaviour.

In an extreme case, the Guix Build Coordinator has to go to some lengths
to work around this caching. Asking the guix-daemon if a substitute
exists is dangerous, as a miss is cached for an hour even if that
substitute isn't available yet but will be shortly (which happens all
the time when building a bunch of things). Currently the coordinator
checks for the substitute itself, and only asks the guix-daemon to
fetch the item once it knows it exists. The transient error caching is
also problematic, as it imposes a 10 minute penalty whenever there's a
server issue.
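
The workaround amounts to something like the following Python sketch.
This is only an illustration of the idea, not the coordinator's actual
code: fetch_if_available, probe and ask_daemon are hypothetical names,
with the two callables standing in for "check the substitute server
directly" and "ask the guix-daemon to fetch".

```python
def fetch_if_available(store_path, probe, ask_daemon):
    """Ask the daemon for a substitute only once it is known to exist.

    Both callables are hypothetical and supplied by the caller:
      probe(store_path) -> bool   checks the substitute server directly,
                                  without going through the daemon
      ask_daemon(store_path)      asks the guix-daemon to fetch the item
    """
    if not probe(store_path):
        # The daemon is never asked, so it has no miss to cache.
        return False
    ask_daemon(store_path)
    return True
```

The point of the design is that the daemon only ever sees lookups that
are expected to succeed, so its negative cache is never populated for
items that are about to appear.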

Any thoughts?



