[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: A Critique of Shepherd Design
Re: A Critique of Shepherd Design
Wed, 24 Mar 2021 14:29:47 +0000
Sent with ProtonMail Secure Email.
‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Tuesday, March 23, 2021 1:02 AM, Ludovic Courtès <email@example.com> wrote:
> raid5atemyhomework firstname.lastname@example.org skribis:
> > I'm not sure you can afford to keep it simple.
> It has limitations but it does the job—just like many other init systems
> did the job before the advent of systemd.
> > Consider: https://issues.guix.gnu.org/47253
> > In that issue, the `networking` provision comes up potentially before the
> > network is, in fact, up. This means that other daemons that require
> > `networking` could potentially be started before the network connection is
> > up.
> The ‘networking’ service is just here to say that some network interface
> is up or will be up. It’s admittedly vague and weak, but it’s enough
> for most daemons; they just need to be able to bind and listen to some
> > One example of such a daemon is `transmission-daemon`. This daemon will
> > bind itself to port 9091 so you can control it. Unfortunately, if it gets
> > started while network is down, it will be unable to bind to 9091 (so you
> > can't control it) but still keep running. On my system that means that on
> > reboot I have to manually `sudo herd restart trannsmission-daemon`.
> Could you report a bug for this one? I don’t see why it’d fail to bind.
Let me see if I can still get to the old syslogs where `transmission-daemon`
claims it cannot bind, but it still keeps going anyway. I've pulled my
`transmission-daemon` directly into my `configuration.scm` so I can edit its
`requirement` to include a custom `networking-online` service that does
> > In another example, I have a custom daemon that I have set up to use
> > the Tor proxy over 127.0.0.1:9050. It requires both `networking` and
> > `tor`. When it starts after `networking` comes up but before the
> > actual network does, it dies because it can't access the proxy at
> > 127.0.0.1:9050 (apparently NetworkManager handles loopback as well).
> Loopback is handled by the ‘loopback’ shepherd service, which is
> provided via ‘%base-services’. Perhaps you just need to have your
> service depend on it?
> > Switching to a concurrent design for Shepherd --- any concurrent design ---
> > is probably best done sooner rather than later, because it risks strongly
> > affecting customized `configuration.scm`s like mine that have almost a half
> > dozen custom Shepherd daemons.
> I suspect the main issue here is undeclared dependencies of some of the
> Shepherd services you mention.
> I like the “sooner rather than later” bit, though: it sounds like you’re
> about to send patches or announce some sponsorship program?… :-)
Not particularly, but I *have* looked over Shepherd and here are some notes.
Maybe I'll even send patches, but the reaction to the ZFS patches makes me just
shrug; I'd need to devote more time than what I spent on ZFS, and the ZFS
patches aren't getting into Guix, so why bother. If I get annoyed enough I'll
just patch my own system and call it a day. My own system has `nm-online` and
I don't expect to not have networking, so the `nm-online` delay is unlikely to
be an issue, and I don't intend to mess with the `configuration.scm` anymore
because it's just too brittle, I'll just host VMs instead and use a SystemD
fully-free OS like Trisquel, I only need Guix for the ZFS (which Trisquel does
not have, for some reason).
It seems to me that a good design to use would be that each `<service>` should
have its own process. Then the big loop in `modules/shepherd.scm` will then be
"just" a completely event-based command dispatcher that forwards commands to
the correct per-`<service>` process.
Now, one feature Shepherd has is that it can be set with `--socket-file=-`,
which, if specified, causes GNU Shepherd to enable GNU readline and use the
readline library to read `-herd`-like (?) commands.
Unfortunately the `readline` interface is inherently blocking instead of
event-based. The C interface of GNU readline has an alternative interface that
is compatible with event-based (and I've used this in the past to create a toy
chat program that would display messages from other users while you were typing
your own) but it looks like this interface is not exposed. I checked
`readline-port` as well, but the code I could find online suggests that this
just uses the blocking `readline` interface, and would (?) be incompatible with
the Guile `select`. (side note: the `SIGCHLD` problem could probably be fixed
if Guile had `pselect` exposed, but apparently it's not exposed and I'm not
prepared to dedicate even more time fiddling with the lack of syscalls in
Guile. Maybe a signal-via-pipe technique would work as an alternative though
since that supposedly works on every UNIX-like --- but presumably the Shepherd
authors already knew that, so maybe there is good reason not to use it).
Since `readline` is blocking, one possibility would be to *fork off* the
process that does the stdin communication. Then it can continue to use the
blocking `readline`. It just needs to invoke `herd stop root`when it gets EOF
(note: need to check how commands are sent, for now it looks to me (not
verified, just impression) that commands are sent as plain text).
Since the goal is to make the mainloop into a very parallel dispatcher, we need
some way to somehow send commands in stdin-mode. We can take advantage of the
little-known fact that UNIX domain sockets can pass file descriptors across
processes, with the file descriptor number being remapped to the receiving
process via magic kernel stuff. So, we create a `socketpair` (NOTE: CHECK IF
GUILE HAS `socketpair`!!! Also review how the fd-passing gets done, maybe
Guile doesn't expose the needed syscalls either, sigh), then each time the
`readline`-process receives a command, it creates a new `socketpair`, sends
over one end to the mainloop, sends the command via the other end, then waits
for a response and prints it. This should make it very near to in experience
as the blocking Shepherd.
If the above pattern is workable, ***we can use the same pattern for
`--socket-file=/file/path`***. We ***always*** fork off *some* process to
handle `--socket-file`, whether `stdin`-mode or not. In
`--socket-file=/file/path` mode, the `socket-file` process binds the socket
file, `listen`s on it on a loop, and then just passes the opened socket over to
We also need this pattern as a general feature of the mainloop. An action on
one `<service>` can trigger actions on another service (in theory; my cursory
review of the Guix code suggests that current services only trigger actions on
themselves (`ganeti`, `guix-daemon`; but this is not a full review and there
may be other services in Guix that do other stuff)); note in particular that
`start` causes every `requirement` to be started anyway. So I think we need a
general mechanism for the mainloop to receive commands from remote processes;
we might as well use the same mechanism for both the Shepherd<->user
interaction and the Shepherd<->service interaction.
So for clarity of exposition, let me then list down the processes created by
* The `mainloop` process which handles the main massively-parallel event loop.
This is PID 1.
* The `socket-file` process which either gets commands from `stdin`, or via the
* Multiple per-`<service>` process.
Now, the mainloop has to parse the command in order to learn which
per-`<service>` process the command should get forwarded to. And as mentioned,
each per-`<service>` process also needs a command-sending socket to go to the
mainloop. So for each per-`<service>` process:
* The mainloop maintains a mainloop->service socket to send commands over.
* The mainloop maintains a service->mainloop socket it receives command
The mainloop process also special-cases the `root` service --- it handles
commands to those directly (BUT this probably needs a lot of fiddling with the
data structures involved --- `herd status root` can now occur while a `herd
start <whatever>` is still running, so we need status reporting for "being
started up" and "being stopped" as well --- `/` for "starting up", `\` for
Now, the `action` and other procedures need to be special-cased. We need a
global variable indicating:
* The current process is `root` ie the mainloop process.
* The current process is some `<service>`.
Every `action` is different depending on this variable (`%process-identity`?).
* IF the action is going to the same `<service>` (including `root`):
* Just tail-call into the action code.
* If the current process is `root` and a non-`root` action is being performed:
* Check if the per-`<service>` process has been started, and start if needed.
* Schedule the command to be sent via the event loop.
* Keep operating the mainloop until the command has completed.
* Use an event-loop stepper function (i.e. just calls `select` and
dispatches appropriately, then returns, so caller has to implement the loop).
* Initially set a mutable variable to `#f.
* Schedule the command with a callback that sets the above variable to `#t`.
* Call the event-loop stepper function until the mutable variable is true.
* This implements the current semantics where a `.conf` file running an
action will block until the action finishes, while still allowing commands to
be sent to the Shepherd daemon.
* If the current process is not `root` and the action to be performed is of a
* Create a socketpair to send the command over to the mainloop and send it
* Send the command to the mainloop (blocking).
* Wait for completion (blocking).
Each per-`<service>` process has a simple blocking loop: It waits for commands
from the mainloop process, executes those commands, then loops again.
In particular, it means that any `start` actions in the `.conf` file will block
(which is the expected behavior of legacy `.conf` files, but even so, the
Shepherd will be able to handle commands even while it is still loading `.conf`.
=== Concurrency is Like Regexps, Now You Have Two Problems ===
But take not that this means that it is possible to deadlock services in this
* There are two services `A` and `B`.
* `A` has an action `AtoB` which invokes an action `BtoA` on service `B`.
* The `B` `BtoA` action invokes an action `Anoop` on service `A`.
* In the above structure, because the `A` per-`<service>` process is
waiting on the `BtoA` action on service `B`, it cannot handle the `Anoop`
* In the current single-threaded Shepherd, such a thing is actually
possible and would not cause a deadlock if the `A` `Anoop` terminates normally.
HOWEVER, this is probably a very unusual setup! It may be tolerable to simply
require that a service that performs actions on another service should have an
acyclic relationship (it is Turing-complete --- consider the case where the `B`
`BtoA` action reinvokes the `A` `AtoB` with the same arguments causing it to
invoke `B` `BtoA` action again ad infinitum --- whereas an acyclic relationship
requirement would provably terminate; it may even be possible, by passing a
"stack" (i.e. list of service names that caused a particular action of a
particular service to be invoked) when passing commands across, with the `root`
service always passing an empty stack, and each `<service>`-to-`<service>`
action prepends the name of the calling service, so the mainloop can detect a
dynamic cycle and just fail the command without forwarding).
IF this restriction is too onerous, then it may be possible to use an event
loop in the per-`<service>` process as well, and use the same
wait-for-events-while-blocking logic as on the mainloop --- the code might even
be reusable. BUT I worry about this bit, as it could potentially mean that an
action is invoked in the dynamic context (including fluids, sigh) of another
on-going completely-unrelated action, which is BAAAAAD. This is fine for the
`root` service since the `root` service is Shepherd itself (? I think) and we
can ensure that the Shepherd code does not depend on dynamic context.
Another is that in the current single-threaded Shepherd, any service's action
can (re-)register a new `<service>`. This is problematic since a
per-`<service>` process will obviously not affect the mainloop process
Again this is probably a very unusual setup. While such a thing would be cute,
I think it would be tolerable to simply require that only the mainloop process
(which is what loads the `.conf` file) is the only one that is allowed to
=== Taking Advantage of Concurrency ===
Now that each `<service>` gets its own process, we can add a new
`force-destroy-service` action on `root`.
herd force-destroy-service root <some-service>
This forces the per-`<service>` process, and that of every dependent
`<service>`, as well as all processes in the process trees of tho affected
per-`<service>` processes, to be force-killed (IMPORTANT: check how to get the
process tree, also look into "process groups", maybe that would work better,
but does it exist on Hurd?).
This new command helps mitigate the issue where Shepherd `start` actions are
Turing-complete and potentially contain infinite-loop bugs. If this happens,
the sysad can `herd force-destroy-service root <some-service>` the misbehaving
service, reconfigure the system to correct the issue, and restart.
Another issue is that the current Guix startup is like this (paraphrased):
(for-each (lambda (service) (start service)) '(guix-daemon #;...))
Now consider this in the context of the `nm-online` example above. The reason
why Guix+Shepherd cannot just ***always*** use `nm-online` like SystemD does is
because `nm-online` can delay boot for an arbitrary number of seconds. And
since `start` is a known-blocking legacy interface, it should remain a
known-blocking interface (back compatibility is a bitch).
This means that for the Guix startup, we should expose a new Shepherd API:
(start-multiple '(guix-daemon #;...))
This basically does this:
* Set a mutable variable to the length of the input list.
* For each service, if the per-`<service>` process isn't started yet, start it.
* For each service, schedule to `start` the service; have the callback
decrement the above variable (maybe also set another variable to a list of
services that fail to start).
* In a loop:
* If the above mutable variable is zero, exit.
* Otherwise, call the mainloop stepper function, then loop again.
This allows multiple `start` to be executed at once (which could trigger a
`start` again of their requirements, but if a service is already started then
its `start` action should do nothing) in parallel. There may be a thundering
herd effect on the mainloop though because of hammering `start` on the
requirements. Hmm. Concurrency is hard.
Then Guix also has to be modified to use `start-multiple` instead.
This further step ensures as well that infloop problems in custom service
definitions do not delay startup --- startup is fully parallel (modulo
thundering herds of daemons).
=== Overall ===
It seems to me that with the above redesign, there would be very little code
left of the original Shepherd, making this more of a reimplementation than a
patch or even a fork.