bug-guix




From: Maxime Devos
Subject: bug#56674: [Shepherd] Use of ‘waitpid’, ‘system*’, etc. in service code can cause deadlocks
Date: Thu, 21 Jul 2022 01:48:02 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Thunderbird/91.11.0


On 20-07-2022 23:39, Ludovic Courtès wrote:
> Hi!
>
> We’ve just had a bad experience with the nginx service on berlin, where
> ‘herd restart nginx’ would cause shepherd to get stuck forever in
> ‘waitpid’ on the process that was supposed to start nginx.
>
> The details are unclear, but one thing that is clear is that using ‘waitpid’
> (either directly or indirectly with ‘system*’, which is what
> ‘nginx-service-type’ does) is not great:
>
>    1. In the best case, shepherd (as of 0.9.1) is stuck while ‘system*’
>       is in ‘waitpid’ waiting for child process completion (“stuck” as
>       in: doesn’t do anything, not even answering ‘herd’ requests or
>       inetd connections.)
>
>    2. I don’t think that can happen with ‘system*’ (because it’s in C),
>       but generally speaking, there’s a possibility that shepherd’s event
>       loop will handle child process termination before some other
>       user-made ‘waitpid’ call does.
>
> Anyway, that’s a bad situation.
>
> So I can think of several ways to address it:
>
>    1. Change the nginx service ‘stop’ method to just
>       (make-kill-destructor), which should work just as well as invoking
>       “nginx -s stop”.
>
>    2. Have Shepherd provide a replacement for ‘system*’.

Why Shepherd and not Guile Fibers? Is this a Shepherd-specific problem?

> Thoughts?

3. Make waitpid (or a variant that does what we need) interact well with Guile Fibers, just as ‘accept’ doesn't prevent switching to another fiber. There is some Linux API with signal handlers or pid fds or such that might be useful here, though I don't recall the name. Presumably something similar can be done for the Hurd, though some C glue may be needed to access the right Hurd APIs if the signal handler API isn't portable.
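Option 3 boils down to turning "a child exited" into a readable file descriptor, so that the event loop (Fibers, in Shepherd's case) can wait on it alongside its sockets instead of blocking in ‘waitpid’. On Linux, the APIs that seem to match the description are signalfd(2) and pidfd_open(2) (the latter since Linux 5.3); the rough sketch below instead uses the portable self-pipe form of the signal-handler approach, via Python's ‘signal.set_wakeup_fd’. Python is used only because its ‘os’ and ‘signal’ modules expose the same POSIX calls; ‘spawn_and_wait’ is a hypothetical helper, not a Shepherd or Guile API.

```python
import os
import selectors
import signal


def spawn_and_wait(argv):
    """Hypothetical helper: run ARGV and return its exit code without ever
    blocking in waitpid(); the loop only blocks in select()."""
    rfd, wfd = os.pipe()
    # set_wakeup_fd() requires a non-blocking fd; make both ends non-blocking.
    os.set_blocking(rfd, False)
    os.set_blocking(wfd, False)
    # The C-level handler only writes to the wakeup fd if a handler is
    # installed for SIGCHLD (the default disposition is to ignore it).
    old_handler = signal.signal(signal.SIGCHLD, lambda signum, frame: None)
    old_wakeup = signal.set_wakeup_fd(wfd)
    sel = selectors.DefaultSelector()
    sel.register(rfd, selectors.EVENT_READ)
    try:
        pid = os.fork()
        if pid == 0:
            try:
                os.execvp(argv[0], argv)
            finally:
                os._exit(127)  # only reached if exec failed
        while True:
            # A real event loop would also register client sockets here and
            # keep serving them while the child runs.
            for key, _ in sel.select():
                if key.fd == rfd:
                    os.read(rfd, 512)  # drain the wakeup bytes
            # Non-blocking reap: returns (0, 0) while the child is running,
            # e.g. when SIGCHLD came from some other child.
            done, status = os.waitpid(pid, os.WNOHANG)
            if done == pid:
                return os.waitstatus_to_exitcode(status)
    finally:
        signal.set_wakeup_fd(old_wakeup)
        if old_handler is not None:
            signal.signal(signal.SIGCHLD, old_handler)
        sel.close()
        os.close(rfd)
        os.close(wfd)
```

For example, `spawn_and_wait(["sh", "-c", "exit 3"])` returns 3. The sketch reaps only the pid it spawned; a daemon that reaps with `waitpid(-1, ...)` from its loop would need every other caller to go through that loop, otherwise issue 2 from the quoted message comes back.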

Alternatively:

4. Do the waitpid in a separate thread. This needs a workaround for the multi-threaded fork problem: probably some C code, or modifying Guile (and maybe glibc) to avoid async-signal-unsafe operations or to make more things async-signal-safe, since in a multi-threaded process the child may only call async-signal-safe functions between fork and exec.
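A rough sketch of option 4, again in Python purely for illustration: a helper thread blocks in ‘waitpid’ and then wakes the main loop through a pipe, so the main loop never blocks on the child. The names ‘wait_in_thread’ and ‘spawn_with_waiter_thread’ are hypothetical. Note that it sidesteps the multi-threaded fork problem only by forking before the waiter thread exists; a long-running daemon that forks while such threads are alive still faces it.

```python
import os
import selectors
import threading


def wait_in_thread(pid, notify_fd, results):
    """Hypothetical helper: block in waitpid() off the main thread,
    record the exit code, then wake the event loop via the pipe."""
    _, status = os.waitpid(pid, 0)
    results[pid] = os.waitstatus_to_exitcode(status)
    os.write(notify_fd, b"x")


def spawn_with_waiter_thread(argv):
    rfd, wfd = os.pipe()
    # Fork *before* starting the waiter thread: forking while other
    # threads exist is where the async-signal-safety trouble starts.
    pid = os.fork()
    if pid == 0:
        try:
            os.execvp(argv[0], argv)
        finally:
            os._exit(127)  # only reached if exec failed
    results = {}
    waiter = threading.Thread(
        target=wait_in_thread, args=(pid, wfd, results), daemon=True
    )
    waiter.start()
    sel = selectors.DefaultSelector()
    sel.register(rfd, selectors.EVENT_READ)
    try:
        # The main loop stays responsive; here it only watches the pipe,
        # but it could watch client sockets too.
        while pid not in results:
            for key, _ in sel.select():
                os.read(key.fd, 1)
        waiter.join()  # ensures the thread is done writing before we close
        return results[pid]
    finally:
        sel.close()
        os.close(rfd)
        os.close(wfd)
```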

Even if this is not a Guile Fibers interaction problem, the asynchronous signal handler API might still be useful.

Greetings,
Maxime

Attachment: OpenPGP_0x49E3EE22191725EE.asc
Description: OpenPGP public key

Attachment: OpenPGP_signature
Description: OpenPGP digital signature

