bug#55444: elogind startup race between shepherd and dbus-daemon
From: Ludovic Courtès
Subject: bug#55444: elogind startup race between shepherd and dbus-daemon
Date: Fri, 27 May 2022 22:54:49 +0200
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/27.2 (gnu/linux)
Hey there,
Ludovic Courtès <ludo@gnu.org> skribis:
> So it would seem that the solution to this is to prevent dbus-daemon
> from starting elogind. We can do that by changing
> org.freedesktop.login1.service so that it has “Exec=true” instead of
> “Exec=elogind --daemon”.
>
> “Exec=true” is a bit crude because it doesn’t guarantee that elogind is
> really started; if that isn’t good enough, we could instead wait for the
> PID file or something (as of Shepherd 0.9.0, invoking ‘herd start
> elogind’ potentially leads shepherd to start a second instance if the
> first one is still being started, so we can’t really do that).
The patch below addresses that: it changes the “Exec=” line of
‘org.freedesktop.login1’ to refer to a wrapper. That wrapper connects
to shepherd and waits until ‘elogind’ is started.
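For illustration only, here is roughly what the activation file could look like once the patch has rewritten it. This is a sketch, not the file elogind actually ships: the section and key names are the standard ones for D-Bus service files, the “User=” line is an assumption, and the store path is a placeholder whose hash is deliberately left elided.

```ini
# Sketch of org.freedesktop.login1.service after the rewrite.
# Assumption: elogind's real file may carry other keys; only "Exec=" is
# touched by the patch.  The store path below is a placeholder.
[D-BUS Service]
Name=org.freedesktop.login1
Exec=/gnu/store/…-elogind-dbus-shepherd-sync
User=root
```

The point is that dbus-daemon still has something to execute on activation, but what it runs only waits for shepherd instead of spawning elogind.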
That way, if dbus-daemon comes first, it won’t actually launch anything
and will instead wait for the Shepherd ‘elogind’ service to be up. (And if
it comes second, dbus-daemon won’t try to launch anything, so there are no
spurious “already running” messages.)
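To make the waiting logic concrete, here is a minimal stand-alone Guile sketch of a bounded polling loop of the kind the wrapper uses. It is not the code from the patch: ‘wait-until’, ‘fake-elogind-running?’ and the ‘#:interval’ keyword are hypothetical names made up for this example, and the predicate is a stand-in for asking shepherd whether ‘elogind’ is running.

```scheme
;; Hypothetical helper, not part of the patch: call READY? up to
;; MAX-ATTEMPTS times, sleeping INTERVAL seconds between calls.  Return
;; the number of unsuccessful checks that preceded success, or #f if
;; READY? never returned true within MAX-ATTEMPTS checks.
(define* (wait-until ready? #:key (max-attempts 20) (interval 1))
  (let loop ((attempts 0))
    (cond ((>= attempts max-attempts) #f)      ;give up
          ((ready?) attempts)                  ;service is up
          (else
           (sleep interval)
           (loop (+ attempts 1))))))

;; Stand-in for "is the 'elogind' shepherd service running?": it
;; becomes true on the third check.
(define checks 0)
(define (fake-elogind-running?)
  (set! checks (+ checks 1))
  (>= checks 3))

;; Use a zero interval so that the example returns immediately.
(define result
  (wait-until fake-elogind-running? #:interval 0))
```

In the real wrapper, a timeout maps to a non-zero exit status, which is what lets dbus-daemon report an activation failure rather than hang forever.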
I tested it in a ‘desktop.tmpl’ VM, quickly logging in on tty1. In
/var/log/messages, you can see the “Activating ….login1” message from
dbus-daemon, followed by “Service elogind started” from shepherd,
followed by “Successfully activated ….login1” from dbus-daemon.
The “elogind” system test passes too.
Thoughts? Objections?
Ludo’.
From 7ef63d7426677961afd2bd937af19b08209c5b70 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Ludovic=20Court=C3=A8s?= <ludo@gnu.org>
Date: Fri, 27 May 2022 22:41:55 +0200
Subject: [PATCH] services: elogind: When started by dbus-daemon, wait for the
Shepherd service.
Fixes <https://issues.guix.gnu.org/55444>.
Previously shepherd and dbus-daemon would race to start elogind. In
some cases (for instance if one logs in quickly enough on the tty),
dbus-daemon would "win" and start elogind before shepherd has had a
chance to do it. Consequently, shepherd would fail to start elogind and
mark it as stopped and disabled, in turn preventing services that depend
on it such as 'xorg-server' from starting.
* gnu/services/desktop.scm (elogind-dbus-service): Rewrite to refer to a
wrapper that waits for the 'elogind' Shepherd service.
---
gnu/services/desktop.scm | 79 ++++++++++++++++++++++++++++++++++++++--
1 file changed, 75 insertions(+), 4 deletions(-)
diff --git a/gnu/services/desktop.scm b/gnu/services/desktop.scm
index 24fd43a207..318107a2ca 100644
--- a/gnu/services/desktop.scm
+++ b/gnu/services/desktop.scm
@@ -1075,10 +1075,81 @@ (define-syntax-rule (ini-file config file clause ...)
("HybridSleepMode" (sleep-list elogind-hybrid-sleep-mode))))
(define (elogind-dbus-service config)
- (list (wrapped-dbus-service (elogind-package config)
- "libexec/elogind/elogind"
- `(("ELOGIND_CONF_FILE"
- ,(elogind-configuration-file config))))))
+ "Return a @file{org.freedesktop.login1.service} file that tells D-Bus how to
+\"start\" elogind. In practice though, our elogind is started when booting by
+shepherd. Thus, the @code{Exec} line of this @file{.service} file does not
+explain how to start elogind; instead, it spawns a wrapper that waits for the
+@code{elogind} shepherd service. This avoids a race condition where both
+@command{shepherd} and @command{dbus-daemon} would attempt to start elogind."
+ ;; For more info on the elogind startup race, see
+ ;; <https://issues.guix.gnu.org/55444>.
+
+ (define elogind
+ (elogind-package config))
+
+ (define wrapper
+ (program-file "elogind-dbus-shepherd-sync"
+ (with-imported-modules '((gnu services herd))
+ #~(begin
+ (use-modules (gnu services herd)
+ (srfi srfi-1)
+ (ice-9 match))
+
+ (define (elogind-service? service)
+ (memq 'elogind (live-service-provision service)))
+
+ (define max-attempts
+ ;; Number of attempts before assuming elogind failed
+ ;; to start.
+ 20)
+
+ ;; Repeatedly check whether the 'elogind' shepherd
+ ;; service is up and running. (As of Shepherd 0.9.1,
+ ;; we cannot just call the 'start' method and wait for
+ ;; it: it would spawn an additional elogind process.)
+ (let loop ((attempts 0))
+ (define services
+ (current-services))
+
+ (when (>= attempts max-attempts)
+ (format (current-error-port)
+ "elogind shepherd service not started~%")
+ (exit 2))
+
+ (match (find elogind-service? services)
+ (#f
+ (format (current-error-port)
+ "no elogind shepherd service~%")
+ (exit 1))
+ (service
+ (unless (live-service-running service)
+ (sleep 1)
+ (loop (+ attempts 1))))))))))
+
+ (define build
+ (with-imported-modules '((guix build utils))
+ #~(begin
+ (use-modules (guix build utils)
+ (ice-9 match))
+
+ (define service-directory
+ "/share/dbus-1/system-services")
+
+ (mkdir-p (dirname (string-append #$output service-directory)))
+ (copy-recursively (string-append #$elogind service-directory)
+ (string-append #$output service-directory))
+ (symlink (string-append #$elogind "/etc") ;for etc/dbus-1
+ (string-append #$output "/etc"))
+
+ ;; Replace the "Exec=" line of the 'org.freedesktop.login1.service'
+ ;; file with one that refers to WRAPPER instead of elogind.
+ (match (find-files #$output "\\.service$")
+ ((file)
+ (substitute* file
+ (("Exec[[:blank:]]*=.*" _)
+ (string-append "Exec=" #$wrapper "\n"))))))))
+
+ (list (computed-file "elogind-dbus-service-wrapper" build)))
(define (pam-extension-procedure config)
"Return an extension for PAM-ROOT-SERVICE-TYPE that ensures that all the PAM
--
2.36.0