Re: A Critique of Shepherd Design

guix-devel
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: A Critique of Shepherd Design

From:	raid5atemyhomework
Subject:	Re: A Critique of Shepherd Design
Date:	Sat, 20 Mar 2021 02:42:38 +0000
Hello Maxime,

> Multi-threading seems complicated (but can be worth it). What work would you 
> put
> on which thread (please give us some concrete to reason about, ‘thread X does 
> A,
> and is created on $EVENT ...’)? A complication is that "fork" is ‘unsafe’ in a
> process with multiple threads (I don't know the details). Maybe that could be
> worked around. (E.g. by using system* on a helper binary that closes unneeded
> file descriptors and sets the umask .... and eventually uses exec to run the
> service binary.)

Perhaps the point is not so much to be *multi-threaded* but to be *concurrent*, 
and whether to be multi-*thread* or multi-*process* or multi-*fiber* is a 
lower-level detail.

For example, we can consider that for 99% of shepherd services, the `start` 
action returns a numeric PID, and the remaining ones return `#t`.  Rather than 
call the `start` action in the same thread/process/fiber that implements the 
Shepherd main loop:

* Create a pipe.
* `fork`.
  * On the child:
    * Close the pipe output end, make the input end cloexec.
    * Call the `start` action.
    * If an error is thrown, print out `(error <>)` with the serialization of 
the error to the pipe.
    * Otherwise, print out `(return <>)` with the return value of the `start` 
action.
  * On the parent:
    * Close the pipe input end, and add the output end and the pid of the 
forked child to one of the things we will poll in the mainloop.
    * Return to the mainloop.
* In the mainloop:
  * `poll` (or `select`, or whatever) the action pipes in addition to the 
`herd` command socket(s).
  * If an action pipe is ready, read it in:
    * If it EOFed then the sub-process died or execed without returning 
anything, treat as an error.
    * Similarly if unable to `read` from it.
    * Otherwise deserialize it and react appropriately as to whether it is an 
error or a return of some value.
  * If an action pipe has been around too long, kill the pid (and all child 
pids) and close the pipe and treat as an error.

I mean --- single-threaded/process/fibered, non-concurrent `init` systems have 
fallen by the wayside because of their brittleness.  SystemD is getting wide 
support precisely because it's an excellent concurrent `init` mechanism that is 
fairly tolerant of mistakes; a problem in one service unit is generally 
isolated in that service unit and its dependents and will not cause problems 
with the other service units.

Presumably, `fork`ing within a `fork`ed process has no issues since that is 
what is normally done in Unix anyway, it's threads that are weird.


> That's a bit of an over-simplification. At least in a singly-threaded 
> Shepherd,
> waitpid could not be used by an individual service, as it
> (1) returns not often
> enough (if a process of another service exits and this action was waiting on
> a single pid) (2) too often (likewise,
> and this action was ‘just waiting’
> (3) potentially never (imagine every process has been started and an action
> was waiting for one to exit,
> and now the user tries to reboot. Then shepherd
> daemon would never accept the connection from the user!).

Yes, I agree it is an oversimplification, but in any case, there should be some 
smaller subset of things that can be (typically) done.

Then any escape hatch like `unsafe-turing-complete` or 
`without-static-analysis` can retain the full power of the language for those 
times when you *really* do need it.

>
> And in a multi-threaded Shepherd, "fork" is unsafe. (Anyway, "fork" captures
> too much state, and no action should ever call "exec" without "fork".) Also,
> "exec" replaces too little state (e.g. open file descriptors, umask, root
> directory, current working directory ...).
>
> But perhaps that's just a call for a different set of primitives (which could
> be implemented on top of these system calls by shepherd, or directly on top
> of the relevant Hurd RPCs when running on GNU/Hurd.).

Yes.

> > Yet the language is a full Turing-complete language, including the major
> > weakness of Turing-completeness: the inability to solve the halting problem.
>
> IIUC, there are plans to perform the network configuration inside shepherd
> (search for guile-netlink). So arbitrary code still needs to be allowed.
> This is the Procedural (extensible) I'll referring to later.
>
> > The fact that the halting problem is unsolved in the language means it is
> > possible to trivially write an infinite loop in the language. In the context
> > of an `init` system, the possibility of an infinite loop is dangerous, as it
> > means the system may never complete bootup.
>
> I'm assuming the hypothetical infinite loop is in, say, the code for starting
> guix-daemon which isn't very essential (try "herd stop guix-daemon"!)
> (If there's an infinite loop is in e.g. udev-service-type then there's indeed
> a problem, then that infinite loop might as well be in eudev itself, so I'm
> only speaking of non-essential services blocking the boot process.)
>
> I'm not convinced we need Turing-incompleteness here, at least if the
> start / do-a-dance action / stop code of each service definition is run in
> a separate thread (not currently the case).
>
> > I have experienced this firsthand since I wrote a Shepherd service to launch
> > a daemon, and I needed to wait for the daemon initialization to complete.
> > My implementation of this had a bug that caused an infinite loop to be 
> > entered,
> > but would only tickle this bug when a very particular (persistent on-disk) 
> > state
> > existed.
> > I wrote this code about a month or so ago, and never got to test it until 
> > last week,
> > when the bug was tickled. Unfortunately, by then I had removed older system 
> > versions
> > that predated the bug. When I looked at a backup copy of the 
> > `configuration.scm`, I
> > discovered the bug soon afterwards.
> > But the amount of code verbiage needed here had apparently overwhelmed me 
> > at the time I
> > wrote the code to do the waiting, and the bug got into the code and broke 
> > my system.
> > I had to completely reinstall Guix (fortunately the backup copy of 
> > `configuration.scm`
> > was helpful in recovering most of the system, also ZFS rocks).
>
> Ok, shepherd should be more robust (at least for non-essential services like 
> guix-publish).
>
> > Yes, I made a mistake. I'm only human. It should be easy to recover from 
> > mistakes.
>
> I would need to know what the mistake and what ‘recovering’ would be exactly 
> in this
> case, before agreeing or disagreeing whether this is something shepherd 
> should help
> with.

It's what I described in the previous paragraph.  I wrote code to wait for 
startup, the code to wait for startup had a bug that caused it to enter an 
infinite loop under a certain edge case.

The edge case was not reached for a long time (almost a month), so I only found 
it a week ago when it triggered and broke my system.  In order to recover I had 
to reinstall Guix System completely, and none of the saved system generations 
had a configuration that predated the introduction of the bug.

If you want specifics --- as a sketch, what I did was something similar to this:

    (let wait-for-startup ((count 0))
      (cond
        ((daemon-started?) ; if the daemon died this would return #f
         #t)
        ((> count 30)
         (format #t "daemon taking too long, continuing with boot~%")
         #f))
        (else
         (sleep 1)
         (wait-for-startup (- count 1)))) ;;; <--- (fuck)

The daemon died (this was the persistent on-disk edge case, arguably a bug in 
`daemon-started?` as well) causing `(daemon-started?)` to return `#f` 
permanently, and then I started counting *in the wrong direction*.


> > The full Turing-completeness of the language invites mistakes,
>
> Agreed (though ‘invites’ seems a little too strong here).

Come on.  Something as simple as "Wait for some condition to be true" needs 
code like the above to implement fully in the Shepherd action-description 
language (i.e. Scheme).  And because I was debating on "just put `(count 30)` 
in the `let` and decrement" versus "Naaa just count up like a normal person" 
and got my wires crossed, I got into the above completely dumb mistake.

Indeed, not only should `-` be changed to `+` in the above, but it's also 
better to do something like this as well:

    (let wait-for-startup ((count 0))
      (cond
        ((daemon-started?) ; if the daemon died this would return #f
         #t)
        ((not (zero? (car (waitpid daemon-pid WNOHANG))))
         (format #t "daemon died!~%")
         #f)
        ((> count 30)
         (format #t "daemon taking too long, continuing with boot~%")
         #f))
        (else
         (sleep 1)
         (wait-for-startup (+ count 1))))

Because if the daemon died, waiting for 30 seconds would be fairly dumb as well.

That is a fair bit of verbiage, and much harder to read through for bugs.

As per information theory: the more possible things I can talk about, the more 
bits I have to push through a communication channel to convey it.  
Turing-complete languages can do more things than more restricted  
domain-specific languages, so on average it requires more code to implement an 
arbitrary thing in the Turing-complete language (thus requiring more careful 
review) than the restricted domain-specific language.

> -   Section 3: A nicely behaving API.
>
> > So what can we do?
> > For one, a Turing-complete language is a strict superset of 
> > non-Turing-complete
> > languages. So one thing we can do is to define a more dedicated language 
> > for Shepherd
> > actions, strongly encourage the use of that sub-language, and, at some 
> > point, require
> > that truly Turing-complete actions need to be wrapped in a 
> > `(unsafe-turing-complete ...)` form.
> > For example, in addition to the current existing 
> > `make-forkexec-constructor` and
> > `make-kill-destructor`, let us also add these syntax forms:
>
> > `(wait-for-condition <timeout> <form>)` - Return a procedure that accepts a 
> > numeric `pid`, that does: Check if evaluating `<form>` in the lexical 
> > context results in `#f`. If so, wait one second and re-evaluate, but exit 
> > anyway and return `#f` if `<timeout>` seconds has passed, or if the `pid` 
> > has died. If `<form>` evaluates to non-`#f` then return it immediately.
> > `(timed-action <timeout> <form> ...)` - Return a procedure that accepts a 
> > numeric `pid`, that does: Spawn a new thread (or maybe better a `fork`ed 
> > process group?). The new thread evaluates `<form> ...` in sequence. If the 
> > thread completes or throws in `<timeout>` seconds, return the result or 
> > throw the exception in the main thread. IF the `<timeout>` is reached or the
> > given `pid` has died, kill the thread and any processes it may have 
> > spawned, then return `#f`.
> > `(wait-to-die <timeout>)` - Return a procedure that accepts a `pid` that 
> > does: Check if the `pid` has died, if so, return #t. Else sleep 1 second 
> > and try again, and if `<timeout>` is reached, return `#f`.
>
> Hard to say if these syntaxes are useful in advance.

In my particular case I could have just written `(wait-for-condition 30 
(daemon-started?))`.  I imagine a fair number of wait-for-daemon-startup things 
can use that or `(timed-action 30 (invoke #$waiter-program))`.

>
> In any case, I don't like numeric pids, as they are reused.
> Could we use something like pidfds (Linux) or whatever the Hurd has instead?
> (Wrapped in nice type in Scheme, maybe named <task> with a predicate task?,
> a predicate task-dead?, a wait-for-task-dead operation (if we use 
> guile-fibers).)

Certainly, but that may require breaking changes to existing specific shepherd 
actions.


> Note: (perform-operation (choice-operation (wait-for-task-dead task) (sleep 
> 5)))
> == wait until task is dead OR 5 seconds have passed. Operations in 
> guile-fibers
> combine nicely!
>
> Also, sprinkling time-outs everywhere seems rather dubious to me (if that's
> what timed-action is for). If some process is taking long to start,
> then by killing it it will take forever to start.

Sprinkling timeouts makes the halting problem trivial: for any code that is 
wrapped in a timeout, that code will halt (either by itself, or by hitting the 
timeout).

Indeed, you can make a practical total functional language simply by requiring 
that recursions have a decrementing counter that will abort computation when it 
reaches 0.  It could use mostly partial code patterns with the addition of that 
decrementing counter, and it becomes total.  In many ways the programmer is 
just assuring the compiler "this operation will not take more than N 
recursions".


>
> > The above forms should also report as well what they are doing (e.g. `Have 
> > been waiting 10 of 30 seconds for <form>`) on the console and/or syslog.
>
> Something like that would be useful, yes. But printing <form> itself seems a 
> bit
> low-level to me.

Yes, but if it's something that the end-user wrote in their 
`configuration.scm`, it seems fine to me.

>
> > In addition, `make-forkexec-constructor` should accept a `#:post-fork`, 
> > which is
> > a procedure that accepts a `pid`, and `make-kill-destructor` should accept a
> > `#:pre-kill`, which is a procedure that accepts a `pid`. Possibly we need to
> > add combinators for the above actions as well. For example a `sub-actions`
> > procedural form that accepts any number of functions of the above `pid -> 
> > bool`
> > type and creates a single combined `pid -> bool`.
>
> > So for example, in my case, I would have written a 
> > `make-forkexec-constructor`
> > that accepted a `#:post-fork` that had a `wait-for-condition` on the code 
> > that
> > hecked if the daemon initialization completed correctly.
> > I think it is a common enough pattern that:
> >
> > -   We have to spawn a daemon.
> > -   We have to wait for the daemon to be properly initialized 
> > (`#:post-fork`)
> > -   When shutting down the daemon, it's best to at least try to politely 
> > ask it
> >     to finish using whatever means the daemon has (`#:pre-kill`).
> >
> > -   If the daemon doesn't exit soon after we politely asked it to, be less 
> > polite and start sending signals.
> >
> > So I think the above should cover a good number of necessities.
>
> Agreed. My own proposal, which I like better (-: :
> (this is the ‘procedural’ (extensible) interface)
>
> -   Multi-threading using guile-fibers (we'll use condition variables & 
> channels).
> -   Instead of make-forkexec-constructor, we have
>     make-process-constructor, as fork + exec as-is is troublesome in
>     a multi-threaded application (see POSIX).
>
>     (It still accepts #:user, #:group & the like)
>
>     make-process-constructor returns a thunk as usual,
>     but this thunk THUNK works differently:
>
>     THUNK : () -> <process-task>.
>
>
> <process-task> is a GOOPS class representing processes (parent: 
> <shepherd-task>),
> that do not neccessarily exist yet / anymore. <process-task>
> has some fields:
>
> * stopped?-cond
> (condition variable, only signalled after exit-value is set)
> * exit-value
> (an atomic box, value #f or an integer,
> only read after stopped?-cond is signalled
> or whenever the user is feeling impatient and asking shepherd
> how it is going.)
> * stop!-cond (condition variable)
> * stop-forcefully!-cond (condition variable)
> * started-cond (condition variable, only signalled when ‘pid’ has been set)
> * pid (an atomic box)
> * maybe some other condition variables?
> * other atomic boxes?
>
> THUNK also starts a fiber associated with the returned <process-task>,
> which tries to start the service binary in the background (could also be used
> for other things than the service binary). When the binary has been started,
> the fiber signals started-cond.
>
> The fiber will wait for the process to exit, and sets exit-value and
> signals stopped?-cond when it does. It will also wait for a polite request
> to stop the service binary (stop!-cond) and impolite requests (for SIGKILL?)
> (stop-forcefully!-cond).
>
> (Does shepherd even have an equivalent to stop-forcefully! ?)
>
> kill-destructor is adjusted to not work with pids directly, but rather
> with <process-task>. It signals stop!-cond or stop-forcefully!-cond.
>
> (So process creation & destruction happens asynchronuously, but there
> still is an easy way to wait until they are actually started.)
>
> -   #:pre-kill & #:post-fork seem rather ad-hoc to me. I advise
>     still wrapping make-process-constructor & kill-destructor.
>     However, the thunk still needs to return a <shepherd-task> (or subclass
>     of course).
>
>     If the thunk does anything that could block / loop (e.g. if it's a HTTP
>     server, then the wrapper might want to wait until the process is actually
>     listening at port 80), then this blocking / looping must be done in a 
> fiber.
>
>     Likewise if the killer does anything that could block / loop (e.g.
>     first asking the process to exit nicely before killing it after a 
> time-out).

Sure, this design could work better as well.  But *some* restricted dialect 
should exist, and we should discourage people from using the full Guile 
language in Shepherd actions they write bespoke in their `configuration.scm`, 
not tout it as a feature.

> -   Section 4: Static analysis
>
> > Then, we can define a strict subset of Guile, containing a set of forms we 
> > know are
> > total (i.e. we have a proof they will always halt for all inputs). Then any 
> > Shepherd
> > actions, being Lisp code and therefore homoiconic, can be analyzed. Every 
> > list must
> > have a `car` that is a symbol naming a syntax or procedure that is known to 
> > be safe
> > --- `list`, `append`, `cons*`, `cons`, `string-append`, `string-join`, 
> > `quote`
> > (which also prevents analysis of sub-lists), `and`, `or`, `begin`, thunk 
> > combinators,
> > as well as the domain-specific `make-forkexec-constructor`, 
> > `make-kill-destructor`,
> > `wait-for-condition`, `timed-action`, and probably some of the `(guix build 
> > utils)`
> > like `invoke`, `invoke/quiet`, [...]
>
> `invoke' has no place in your subset, as the invoked program can take 
> arbitrarily long.
> (Shouldn't be a problem when multi-threading, but then there's less need for 
> this kind
> of static analysis.)

Well, the thing is --- I can probably write a shell script and test it 
**outside of Shepherd**.  In fact, in my mistake above, on a previous non-Guix 
system, I ***did*** use a shell script to wait for daemon startup.  But when I 
ported it over to Guix, I decided to rewrite it as Scheme code to "really dive 
into the Guix system".

Before, when writing it as a shell script I was able to test it and indeed 
found problems and fixed them.  But as Scheme code, well, it's a bit harder, 
since I have to go import a bunch of things (that I have to go search for, and 
I have to figure out Guile path settings for the bits that are provided by 
Guix, etc.), and then afterwards I have to copy back the tested code into my 
`configuration.scm`.  This was enough of an annoyance to me that it discouraged 
me from testing it, which could have caught the above bug.

Since `invoke` calls an external program, which runs isolated in its own 
process (and can be `kill`ed in order to unstuck Shepherd), and can more easily 
be independently tested outside the `configuration.scm`, it's substantially 
safer than writing the same logic directly in the Shepherd language, so it gets 
a pass.

Many other daemons that have some kind of "wait-for-daemon-startup" program 
will often include tests for the "wait" program as well in its test suite, so 
using `invoke` is substantially safer than writing bespoke code in Scheme that 
performs the same action.

> > `mkdir-p` etc.
>
> I have vague plans to replace 'mkdir-p', 'mkdir-p/perms' etc. with some 
> procedure
> (prepare-directory
> `("/var/stuff #:bits #o700 #:user ... #:group
> ("blabla" #:bits ...
> #:atomic-generate-contents ,(lambda (port) (display 'stuff port))))
> ...))
>
> ... automatically taking care of symlinks & detecting whether there are 
> conflicts
> with different services.
>
> > Sub-forms (or the entire form for an action) can be wrapped in 
> > `(unsafe-turing-complete ...)`
> > to skip this analysis for the sub-form,
>
> An escape hatch seems good to me, though I would rather call it
> `(without-static-analysis (totality) ...)'. The forms are not neccesarily 
> ‘unsafe’,
> only potentially so. We could also define a checkers for other issues this 
> way.
>
> (I'm thinking of potential TOCTTOU (time of check to time of use) problems 
> involving
> symbolic links.)
>
> > but otherwise, by default, the specific subset must be used, and users have 
> > to
> > explicitly put `(unsafe-turing-complete ...)` so they are aware that they 
> > can
> > seriously break their system if they make a mistake here. Ideally, as much 
> > of
> > the code for any Shepherd action should be outside an 
> > ``unsafe-turing-complete`, and only parts of the code that really need the 
> > full Guile language to implement should be rapped 
> > in`unsafe-turing-complete`.
>
> I'm getting Rusty vibes here. Sounds sound to me.
>
> > (`lambda` and `let-syntax` and `let`, because they can rebind the meanings 
> > of symbols,
> > would need to be in `unsafe-turing-complete` --- otherwise the analysis 
> > routine would
> > require a full syntax expander as well)
>
> No. The analysis routine should not directly work on Scheme code, but rather 
> on
> the 'tree-il' (or maybe on the CPS code, I dunno).
> Try (macro-expand '(let ((a 0)) a)) from a Guile REPL.
>
> > Turing-completeness is a weakness, not a strength, and restricting 
> > languages to be
> > the Least Powerful necessary is better. The `unsafe-turing-complete` form 
> > allows
> > an escape, but by default Shepherd code should be in the restricted 
> > non-Turing-complete
> > subset, to reduce the scope for catastrophic mistakes.
>
> I'm assuming make-fork+exec-constructor/container would be defined in Scheme
> and be whitelisted in raid5atemyhomework-scheme?

Yes.

>
> -   Section 5 -- Declarative
>
>     Something you don't seem to have considered: defining services in a 
> declarative
>     language! Hypothetical example à la SystemD, written in something vaguely
>     resembling Wisp (Wisp is an alternative read syntax for Scheme with less
>     parentheses):

I prefer SRFI 110 myself.

> (I'm a bit rusty with the details on defining shepherd services
>     in Guix:
>
>     shepherd-service
>     start `I-need-to-think-of-a-name-constructor #:binary #$(file-append 
> package "/bin/stuff") #:arguments stuff #:umask suff #:user stuff #:group 
> stuff ,@(if linux`(#:allowed-syscalls ',(x y z ...)))
>     #:friendly-shutdown
>     thunk
>     invoke #$(file-append package "/bin/stop-stuff-please")
>     #:polling-test-started?
>     thunk
>     if . file-exists? "/run/i-am-really-started"
>     #true
>
>     ;; tell I-need-to-think-of.... to wait 4 secs
>     ;; before trying again
>     seconds 4
>     ;; how much time starting may take
>     ;; until we consider things failed
>     ;; & stop the process
>     #:startup-timeout?
>     thunk
>     if spinning-disk?
>     plenty
>     little
>     #:actions etcetera
>     ...
>
>     Well, that's technically still procedural, but this starts to look
>     like a SystemD configuration file due to the many keyword arguments!
>
>     (Here (thunk X ...) == (lambda () X ...), and
>     I-need-to-think-of-a-name-constructor
>     is a procedure with very many options, that
>     ‘should be enough for everyone. If it doesn't support
>     enough arguments,
>     look in SystemD for inspiration.)
>
>     That seems easier to analyse. Although it's bit kitchen-sinky,
>     no kitchen is complete without a sink, so maybe that's ok ...
>     (I assume the kitchen sink I-need-to-think-of-a-name-constructor
>     would be defined in the guix source code.)


Well, yes ---- but then we can start arguing that using SystemD would be 
better, as it is more battle-tested and already exists and all the kitchen-sink 
features are already there and a lot more people can hack its language than can 
hack the specific dialect of scheme (augmented with Guixy things like 
`make-forkexec-constructor`) used by GNU Shepherd.

Thanks
raid5atemyhomework
[Prev in Thread]
Current Thread
[Next in Thread]
A Critique of Shepherd Design, raid5atemyhomework, 2021/03/19
- Re: A Critique of Shepherd Design, Maxime Devos, 2021/03/19
  - Re: A Critique of Shepherd Design, raid5atemyhomework <=
- Re: A Critique of Shepherd Design, Maxim Cournoyer, 2021/03/20
- Re: A Critique of Shepherd Design, Mark H Weaver, 2021/03/20
  - Re: A Critique of Shepherd Design, raid5atemyhomework, 2021/03/20
- Re: A Critique of Shepherd Design, Ludovic Courtès, 2021/03/20
  - Re: A Critique of Shepherd Design, raid5atemyhomework, 2021/03/20
    - Re: A Critique of Shepherd Design, Ludovic Courtès, 2021/03/22
    - Re: A Critique of Shepherd Design, raid5atemyhomework, 2021/03/24
    - Re: A Critique of Shepherd Design, raid5atemyhomework, 2021/03/24
- Re: A Critique of Shepherd Design, raingloom, 2021/03/22
  - Re: A Critique of Shepherd Design, Martin, 2021/03/22
Prev by Date: Re: gnu: imagemagick/fixed: Redirect old sonames to new sonames.
Next by Date: Re: A Critique of Shepherd Design
Previous by thread: Re: A Critique of Shepherd Design
Next by thread: Re: A Critique of Shepherd Design
Index(es):
- Date
- Thread