Re: A Critique of Shepherd Design

guix-devel
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: A Critique of Shepherd Design

From:	Maxime Devos
Subject:	Re: A Critique of Shepherd Design
Date:	Fri, 19 Mar 2021 22:49:28 +0100
User-agent:	Evolution 3.34.2
On Fri, 2021-03-19 at 17:33 +0000, raid5atemyhomework wrote:
> GNU Shepherd is the `init` system used by GNU Guix.  It features:
> 
> * A rich full Scheme language to describe actions.
> * A simple core that is easy to maintain.
> 
> However, in this critique, I contend that these features are bugs.

A lot of unpack here!  Some agreements, some disagreements,
some alternative proposals, maybe some misunderstandings:

* Section 1: Singly-threading (simple) or multi-threading (robust if done well)

> Now, let us combine this with the second feature (really a bug): GNU shepherd
> is a simple, single-threaded Scheme program.  That means that if the single
> thread enters an infinite loop (because of a Shepherd service description
> that entered an infinite loop), then Shepherd itself hangs.  This means that
> the system is severely broken.  You cannot `sudo reboot` because that 
> communicates
> with the Shepherd daemon.  You cannot even `kill 1` because signals are 
> handled in
> the mainloop, which never get entered if the `start` action of some Shepherd 
> daemon
> managed to enter an infinite loop.

I don't think simplicity is in the ‘design goals’ of shepherd -- it's primarily
used within Guix System (though it can be used outside Guix), which I would not
call simple (-:.

Multi-threading seems complicated (but can be worth it).  What work would you 
put
on which thread (please give us some concrete to reason about, ‘thread X does A,
and is created on $EVENT ...’)?  A complication is that "fork" is ‘unsafe’ in a
process with multiple threads (I don't know the details).  Maybe that could be
worked around.  (E.g. by using system* on a helper binary that closes unneeded
file descriptors and sets the umask .... and eventually uses exec to run the
service binary.)

* Section 2: Scheme too powerful?

> The Shepherd language for describing actions on Shepherd daemons is a 
> Turing-complete Guile language.

There isn't really a ‘Shepherd language’, it's ‘just’ Scheme code and shepherd 
is ‘merely’ a library.
But I guess that's your point.

>   Turing completeness runs afoul of the Principle of Least Power.
>   In principle, all that actions have to do is invoke `exec`, `fork`, `kill`, 
> and `waitpid` syscalls.

That's a bit of an over-simplification.  At least in a singly-threaded Shepherd,
waitpid could not be used by an individual service, as it
(1) returns not often
enough (if a process of another service exits and this action was waiting on
a single pid) (2) too often (likewise,
and this action was ‘just waiting’
(3) potentially never (imagine every process has been started and an action
was waiting for one to exit,
and now the user tries to reboot.  Then shepherd
daemon would never accept the connection from the user!).

And in a multi-threaded Shepherd, "fork" is unsafe.  (Anyway, "fork" captures
too much state, and no action should ever call "exec" without "fork".)  Also,
"exec" replaces *too little* state (e.g. open file descriptors, umask, root
directory, current working directory ...).

But perhaps that's just a call for a different set of primitives (which could
be implemented on top of these system calls by shepherd, or directly on top
of the relevant Hurd RPCs when running on GNU/Hurd.).

>   Yet the language is a full Turing-complete language, including the major
>   weakness of Turing-completeness: the inability to solve the halting problem.

IIUC, there are plans to perform the network configuration inside shepherd
(search for guile-netlink).  So arbitrary code still needs to be allowed.
This is the Procedural (extensible) I'll referring to later.

>  The fact that the halting problem is unsolved in the language means it is
>  possible to trivially write an infinite loop in the language.  In the context
>  of an `init` system, the possibility of an infinite loop is dangerous, as it
>  means the system may never complete bootup.

I'm assuming the hypothetical infinite loop is in, say, the code for starting
guix-daemon which isn't very essential (try "herd stop guix-daemon"!)
(If there's an infinite loop is in e.g. udev-service-type then there's indeed
a problem, then that infinite loop might as well be in eudev itself, so I'm
only speaking of *non-essential* services blocking the boot process.)

I'm not convinced we need Turing-incompleteness here, at least if the
start / do-a-dance action / stop code of each service definition is run in
a separate thread (not currently the case).

> I have experienced this firsthand since I wrote a Shepherd service to launch
> a daemon, and I needed to wait for the daemon initialization to complete.
> My implementation of this had a bug that caused an infinite loop to be 
> entered,
> but would only tickle this bug when a very particular (persistent on-disk) 
> state
> existed.
>
> I wrote this code about a month or so ago, and never got to test it until 
> last week,
> when the bug was tickled.  Unfortunately, by then I had removed older system 
> versions
> that predated the bug.  When I looked at a backup copy of the 
> `configuration.scm`, I
> discovered the bug soon afterwards.
>
> But the amount of code verbiage needed here had apparently overwhelmed me at 
> the time I
> wrote the code to do the waiting, and the bug got into the code and broke my 
> system.
> I had to completely reinstall Guix (fortunately the backup copy of 
> `configuration.scm`
> was helpful in recovering most of the system, also ZFS rocks).

Ok, shepherd should be more robust (at least for non-essential services like 
guix-publish).

> Yes, I made a mistake.  I'm only human.  It should be easy to recover from 
> mistakes.
I would need to know what the mistake and what ‘recovering’ would be exactly in 
this
case, before agreeing or disagreeing whether this is something shepherd should 
help
with.

> The full Turing-completeness of the language invites mistakes,
Agreed (though ‘invites’ seems a little too strong here).
> and the single-threadedness makes the infinite-loop mistakes that 
> Turing-completeness
> invites, into catastrophic system breakages.

Seems like an argument towards making shepherd multi-threaded (or using 
guile-fibers).

* Section 3: A nicely behaving API.

> So what can we do?
> 
> For one, a Turing-complete language is a strict *superset* of 
> non-Turing-complete
> languages.  So one thing we can do is to define a more dedicated language for 
> Shepherd
> actions, strongly encourage the use of that sub-language, and, at some point, 
> require
> that truly Turing-complete actions need to be wrapped in a 
> `(unsafe-turing-complete ...)` form.
> 
> For example, in addition to the current existing `make-forkexec-constructor` 
> and
>  `make-kill-destructor`, let us also add these syntax forms:

> `(wait-for-condition <timeout> <form>)` - Return a procedure that accepts a 
> numeric `pid`, that does: Check if evaluating `<form>` in the lexical context 
> results in `#f`.  If so, wait one second and re-evaluate, but exit anyway and 
> return `#f` if `<timeout>` seconds has passed, or if the `pid` has died.  If 
> `<form>` evaluates to non-`#f` then return it immediately.
> 
> `(timed-action <timeout> <form> ...)` - Return a procedure that accepts a 
> numeric `pid`, that does: Spawn a new thread (or maybe better a `fork`ed 
> process group?).  The new thread evaluates `<form> ...` in sequence.  If the 
> thread completes or throws in `<timeout>` seconds, return the result or throw 
> the exception in the main thread.  IF the `<timeout>` is reached or the
> given `pid` has died, kill the thread and any processes it may have spawned, 
> then return `#f`.
> 
> `(wait-to-die <timeout>)` - Return a procedure that accepts a `pid` that 
> does: Check if the `pid` has died, if so, return #t.  Else sleep 1 second and 
> try again, and if `<timeout>` is reached, return `#f`.

Hard to say if these syntaxes are useful in advance.

In any case, I don't like numeric pids, as they are reused.
Could we use something like pidfds (Linux) or whatever the Hurd has instead?
(Wrapped in nice type in Scheme, maybe named <task> with a predicate task?,
a predicate task-dead?, a wait-for-task-dead operation (if we use 
guile-fibers).)

Note: (perform-operation (choice-operation (wait-for-task-dead task) (sleep 5)))
   == wait until task is dead OR 5 seconds have passed.  Operations in 
guile-fibers
   combine nicely!

Also, sprinkling time-outs everywhere seems rather dubious to me (if that's
what timed-action is for).  If some process is taking long to start,
then by killing it it will take forever to start.

> The above forms should also report as well what they are doing (e.g. `Have 
> been
> waiting 10 of 30 seconds for <form>`) on the console and/or syslog.

Something like that would be useful, yes.  But printing <form> itself seems a 
bit
low-level to me.

> In addition, `make-forkexec-constructor` should accept a `#:post-fork`, which 
> is
>  a procedure that accepts a `pid`, and `make-kill-destructor` should accept a
> `#:pre-kill`, which is a procedure that accepts a `pid`.  Possibly we need to
>  add combinators for the above actions as well.  For example a `sub-actions`
> procedural form that accepts any number of functions of the above `pid -> 
> bool`
> type and creates a single combined `pid -> bool`.

> So for example, in my case, I would have written a `make-forkexec-constructor`
> that accepted a `#:post-fork` that had a `wait-for-condition` on the code that
>  hecked if the daemon initialization completed correctly.
> 
> I think it is a common enough pattern that:
> 
> * We have to spawn a daemon.
> * We have to wait for the daemon to be properly initialized (`#:post-fork`)
> * When shutting down the daemon, it's best to at least try to politely ask it
>   to finish using whatever means the daemon has (`#:pre-kill`).
> * If the daemon doesn't exit soon after we politely asked it to, be less 
> polite and start sending signals.
> 
> So I think the above should cover a good number of necessities.

Agreed.  My own proposal, which I like better (-: :
(this is the ‘procedural’ (extensible) interface)

+ Multi-threading using guile-fibers (we'll use condition variables & channels).
+ Instead of make-forkexec-constructor, we have
  make-process-constructor, as fork + exec as-is is troublesome in
  a multi-threaded application (see POSIX).

  (It still accepts #:user, #:group & the like)

  make-process-constructor returns a thunk as usual,
  but this thunk THUNK works differently:

  THUNK : () -> <process-task>.

  <process-task> is a GOOPS class representing processes (parent: 
<shepherd-task>),
  that do not neccessarily exist yet / anymore.  <process-task>
  has some fields:

    * stopped?-cond
      (condition variable, only signalled after exit-value is set)
    * exit-value
      (an atomic box, value #f or an integer,
       only read after stopped?-cond is signalled
       or whenever the user is feeling impatient and asking shepherd
       how it is going.)
    * stop!-cond (condition variable)
    * stop-forcefully!-cond (condition variable)
    * started-cond (condition variable, only signalled when ‘pid’ has been set)
    * pid (an atomic box)
    * maybe some other condition variables?
    * other atomic boxes?

  THUNK also starts a fiber associated with the returned <process-task>,
  which tries to start the service binary in the background (could also be used
  for other things than the service binary).  When the binary has been started,
  the fiber signals started-cond.

  The fiber will wait for the process to exit, and sets exit-value and
  signals stopped?-cond when it does.  It will also wait for a polite request
  to stop the service binary (stop!-cond) and impolite requests (for SIGKILL?)
  (stop-forcefully!-cond).

  (Does shepherd even have an equivalent to stop-forcefully! ?)

  kill-destructor is adjusted to not work with pids directly, but rather
  with <process-task>.  It signals stop!-cond or stop-forcefully!-cond.

  (So process creation & destruction happens asynchronuously, but there
  still is an easy way to wait until they are actually started.)

+ #:pre-kill & #:post-fork seem rather ad-hoc to me.  I advise
  still wrapping make-process-constructor & kill-destructor.
  However, the thunk still needs to return a <shepherd-task> (or subclass
  of course).

  If the thunk does anything that could block / loop (e.g. if it's a HTTP
  server, then the wrapper might want to wait until the process is actually
  listening at port 80), then this blocking / looping must be done in a fiber.

  Likewise if the killer does anything that could block / loop (e.g.
  first asking the process to exit nicely before killing it after a time-out).

+ It is probably simplest if shepherd's logic for deciding which service to
  start / kill / which dependencies ... ran in a single fiber (though the
  actual starting & killing happens asynchronuously as described).

  To known when a service has been started / stopped, the following GOOPS
  generics must be defined on every <shepherd-task> (not the same as 
<service>!):

  Method (wait-until-stopped-operation (task <shepherd-task>))
    Make a guile-fibers operation for waiting until @var{task} has been stopped.
    This operation returns no values.

  Method (wait-until-task-started-operation (task <shepherd-task>))
    Make a guile-fibers operation for waiting until @var{task} has been started.
    This operation returns no value.

  Method (task-stopped? (task <shepherd-task>))
    Test whether @var{task} has been stopped, returning a generalized boolean.
    It is possible for a task to be stopped even if it has never been started,
    e.g. if the service binary could not be found.

  Method (task-started? (task <shepherd-task>))
    Test whether @var{task} has been started, returning a generalized boolean.
    Callers probably should also check whether the task has been stopped.

  Method (stop-task-asynchronuously! (task <shepherd-task>))
    Request that the task @var{task} be stopped, returning no values.
    This must succeed even if no fiber is handling @var{task} anymore
    (e.g. by signalling a condition variable).  This procedure is
    idempotent: stopping a task twice is the same as stopping a task
    once.

    There is no task start-task-asynchronuously! method, as dead
    tasks cannot be resurrected.

  This ‘service logic fiber’ starts the ‘let's create a process’
  and ‘wait until the process is *really* started’ fibers by calling
  the thunks returned by make-process-constructor & variants.

+ About

> The above forms should also report as well what they are doing (e.g. `Have 
> been
> waiting 10 of 30 seconds for <form>`) on the console and/or syslog.

  The thunks could write to (current-task-info-port) (an output port to be 
defined
  by Shepherd).  task-info-port is a ‘parameter’ (see guile documentation)
  parameterized by Shepherd when running the thunk.  Or something, I'm not 
really
  sure about this part of my proposal)

* Section 4: Static analysis

> Then, we can define a strict subset of Guile, containing a set of forms we 
> know are
> total (i.e. we have a proof they will always halt for all inputs).  Then any 
> Shepherd
> actions, being Lisp code and therefore homoiconic, can be analyzed.  Every 
> list must
> have a `car` that is a symbol naming a syntax or procedure that is known to 
> be safe
> --- `list`, `append`, `cons*`, `cons`, `string-append`, `string-join`, `quote`
> (which also prevents analysis of sub-lists), `and`, `or`, `begin`, thunk 
> combinators,
> as well as the domain-specific `make-forkexec-constructor`, 
> `make-kill-destructor`,
> `wait-for-condition`, `timed-action`, and probably some of the `(guix build 
> utils)`
> like `invoke`, `invoke/quiet`, [...]

`invoke' has no place in your subset, as the invoked program can take 
arbitrarily long.
(Shouldn't be a problem when multi-threading, but then there's less need for 
this kind
of static analysis.)

is not ‘safe’ as the invoked program can 

> `mkdir-p` etc.

I have vague plans to replace 'mkdir-p', 'mkdir-p/perms' etc. with some 
procedure
(prepare-directory
  `("/var/stuff #:bits #o700 #:user ... #:group
     ("blabla" #:bits ...
               #:atomic-generate-contents ,(lambda (port) (display 'stuff 
port))))
     ...))

... automatically taking care of symlinks & detecting whether there are 
conflicts
with different services.


> Sub-forms (or the entire form for an action) can be wrapped in 
> `(unsafe-turing-complete ...)`
> to skip this analysis for the sub-form,
An escape hatch seems good to me, though I would rather call it
`(without-static-analysis (totality) ...)'.  The forms are not neccesarily 
‘unsafe’,
only potentially so.  We could also define a checkers for other issues this way.

(I'm thinking of potential TOCTTOU (time of check to time of use) problems 
involving
symbolic links.)

> but otherwise, by default, the specific subset must be used, and users have to
> explicitly put `(unsafe-turing-complete ...)` so they are aware that they can
> seriously break their system if they make a mistake here.  Ideally, as much of
> the code for any Shepherd action should be *outside* an 
> ``unsafe-turing-complete`,
> and only parts of the code that really need the full Guile language to 
> implement
> should be rapped in `unsafe-turing-complete`.

I'm getting Rusty vibes here.  Sounds sound to me.

> (`lambda` and `let-syntax` and `let`, because they can rebind the meanings of 
> symbols,
>  would need to be in `unsafe-turing-complete` --- otherwise the analysis 
> routine would
>  require a full syntax expander as well)

No.  The analysis routine should not directly work on Scheme code, but rather on
the 'tree-il' (or maybe on the CPS code, I dunno).
Try (macro-expand '(let ((a 0)) a)) from a Guile REPL.

> Turing-completeness is a weakness, not a strength, and restricting languages 
> to be
> the Least Powerful necessary is better.  The `unsafe-turing-complete` form 
> allows
> an escape, but by default Shepherd code should be in the restricted 
> non-Turing-complete
> subset, to reduce the scope for catastrophic mistakes.

I'm assuming make-fork+exec-constructor/container would be defined in Scheme
and be whitelisted in raid5atemyhomework-scheme?

* Section 5 -- Declarative

Something you don't seem to have considered: defining services in a declarative
language!  Hypothetical example à la SystemD, written in something vaguely
resembling Wisp  (Wisp is an alternative read syntax for Scheme with less
parentheses): (I'm a bit rusty with the details on defining shepherd services
in Guix:

shepherd-service
  start `
    I-need-to-think-of-a-name-constructor
      #:binary #$(file-append package "/bin/stuff")
      #:arguments stuff
      #:umask suff
      #:user stuff
      #:group stuff
      ,@(if linux `(#:allowed-syscalls ',(x y z ...)))
      #:friendly-shutdown
        thunk
          invoke #$(file-append package "/bin/stop-stuff-please")
      #:polling-test-started?
        thunk
          if . file-exists? "/run/i-am-really-started"
            #true

            ;; tell I-need-to-think-of.... to wait 4 secs
            ;; before trying again
            seconds 4
      ;; how much time starting may take
      ;; until we consider things failed
      ;; & stop the process
      #:startup-timeout?
        thunk
          if spinning-disk?
             plenty
             little
      #:actions etcetera
      ...

Well, that's technically still procedural, but this starts to look
like a SystemD configuration file due to the many keyword arguments!

(Here (thunk X ...) == (lambda () X ...), and
    I-need-to-think-of-a-name-constructor
is a procedure with very many options, that
‘should be enough for everyone.  If it doesn't support
enough arguments,
look in SystemD for inspiration.)

That seems easier to analyse.  Although it's bit kitchen-sinky,
no kitchen is complete without a sink, so maybe that's ok ...
(I assume the kitchen sink I-need-to-think-of-a-name-constructor
would be defined in the guix source code.)

* Epilogue

Disclaimer: I'm working on other things than shepherd currently,
though I have an unsubmitted ‘fd-mappings’ patch that needs some
more tests & needs to be checked on GNU/Hurd.  This issue is most
likely to move forward is someone actually codes something,
though any discussion is still informative.

> Thanks
> raid5atemyhomework

Greetings,
Maxime
signature.asc
Description: This is a digitally signed message part
[Prev in Thread]
Current Thread
[Next in Thread]
A Critique of Shepherd Design, raid5atemyhomework, 2021/03/19
- Re: A Critique of Shepherd Design, Maxime Devos <=
  - Re: A Critique of Shepherd Design, raid5atemyhomework, 2021/03/19
- Re: A Critique of Shepherd Design, Maxim Cournoyer, 2021/03/20
- Re: A Critique of Shepherd Design, Mark H Weaver, 2021/03/20
  - Re: A Critique of Shepherd Design, raid5atemyhomework, 2021/03/20
- Re: A Critique of Shepherd Design, Ludovic Courtès, 2021/03/20
  - Re: A Critique of Shepherd Design, raid5atemyhomework, 2021/03/20
    - Re: A Critique of Shepherd Design, Ludovic Courtès, 2021/03/22
    - Re: A Critique of Shepherd Design, raid5atemyhomework, 2021/03/24
    - Re: A Critique of Shepherd Design, raid5atemyhomework, 2021/03/24
- Re: A Critique of Shepherd Design, raingloom, 2021/03/22
Prev by Date: Re: [SPITBALL] Jehanne as another kernel option / porting target
Next by Date: Re: Release 1.2.1: status
Previous by thread: A Critique of Shepherd Design
Next by thread: Re: A Critique of Shepherd Design
Index(es):
- Date
- Thread