[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[bug#32408] [PATCH shepherd] Allow replacement of services
From: |
Carlo Zancanaro |
Subject: |
[bug#32408] [PATCH shepherd] Allow replacement of services |
Date: |
Tue, 21 Aug 2018 07:16:48 +1000 |
User-agent: |
mu4e 1.0; emacs 26.1 |
Hey Ludo,
On Tue, Aug 21 2018, Ludovic Courtès wrote:
We’ll still want to be able to special-case things like nginx
that can be “hot-replaced”, though. So perhaps, in addition to
this patch on the Shepherd side, we’ll need extra stuff in (gnu
services shepherd).
Yeah, if we expose the replacement field directly, then we'll need
some supporting code in (gnu services shepherd), even if it's just
to detect whether the service is stopped or not before doing a
replacement. Although ideally our interface wouldn't introduce
race conditions like that. (See below for more thoughts on this.)
For instance, the ‘actions’ field of <shepherd-service> could,
by default, include an “upgrade” action that simply sets the
‘replacement’ slot. For nginx, we’d provide a custom “upgrade”
action that does “nginx -s restart” or whatever it is that needs
to be done.
‘guix system reconfigure’ would automatically invoke the
‘upgrade’ action for each new service.
WDYT?
How many services can we meaningfully upgrade like this? My
understanding is that most of our system services a fed an
immutable configuration file, and thus restarting/reloading won't
actually upgrade them. In order to make an upgrade action work the
service would have to mutate itself into a new correct
configuration, as well as restarting/reloading the underlying
daemon. It's even trickier if the daemon itself has been upgraded,
because then the process will have to be restarted anyway.
At any rate, I think the replacement mechanism (this patch) is
just one way that a service can be reloaded. It would probably be
a good idea to create a higher-level abstraction over it. I think
other mechanisms (like a upgrade/reload action) should be handled
on the Guix side of things.
+ (let ((replacement (slot-ref service 'replacement)))
+ (define (copy-slot! slot)
+ (slot-set! service slot (slot-ref replacement slot)))
+ (when replacement
+ (copy-slot! 'provides)
+ (copy-slot! 'requires)
+ (copy-slot! 'respawn?)
+ (copy-slot! 'start)
+ (copy-slot! 'stop)
+ (copy-slot! 'actions)
+ (copy-slot! 'running)
+ (copy-slot! 'docstring))
+ service))
Having a hardcoded list of slots sounds error-prone—surely we’ll
forget to update it down the road. I wonder what else could be
done.
One option would be to grab the block asyncs and atomically
replace the service in the ‘%services’ hash table. Then we only
need to copy the ‘last-respawns’ slot to the new service, I
believe. (This changes the object identity of the service but I
think its OK.)
Another option would be to use GOOPS tricks to iterate over the
list of slots and have a list of slots *not* to copy. I’m not a
big fan of this option, though.
My favourite option for this would be to separate the <service>
object into an immutable <service> and a mutable <service-state>.
The <service-state> object would have a reference to a <service>
object in order to invoke actions on it, and it could also hold a
second <service> object as a replacement. Then the swap would be
much more straightforward. I haven't done any real work towards
this, though.
In the short term, I'd rather replace it in the %services hash
table. I did it by copying slots because I wasn't sure I would get
the details of the swap right and didn't have time to properly
work out how to do it. I'll give it a go!
+(let ((service (lookup-running 'test)))
+ (slot-set! service 'replacement
+ (make <service>
I wonder if we should let users fiddle with ‘replacement’
directly, or if we should provide a higher-level construct.
For instance, ‘register-services’ could transparently set the
‘replacement’ field for services already registered instead of
doing:
(assert (null? (lookup-services (canonical-name new))))
Not sure if there are cases where this behavior would be
undesirable, though.
Thoughts?
With this current patch the replacement field is only checked at
the point when the service is stopped, so the field could only be
set when the service is actually running. I think it makes the
most sense to just replace the service directly if it's not
stopped.
I can't think of any undesirable cases, but having a higher-level
interface is a good idea. At the very least we need to control the
inherent race condition involved in (if running? do-x do-y) for if
the service is stopped after the running? check. At the moment I
think the only thing we have to worry about there is signals, but
if we're going to move to have more parallelism through fibers
then we might need to be even more careful.
I'll try to send through an updated patch later this week.
Carlo
signature.asc
Description: PGP signature