[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
bug#52051: [core-updates-frozen] cannot login ('org.freedesktop.login1'
From: |
Ludovic Courtès |
Subject: |
bug#52051: [core-updates-frozen] cannot login ('org.freedesktop.login1' service times out) |
Date: |
Thu, 09 Dec 2021 23:15:10 +0100 |
User-agent: |
Gnus/5.13 (Gnus v5.13) Emacs/27.2 (gnu/linux) |
Hello!
Maxim Cournoyer <maxim.cournoyer@gmail.com> skribis:
> 374 connect(11, {sa_family=AF_UNIX,
> sun_path="/var/run/dbus/system_bus_socket"}, 34) = 0
[...]
> 374 epoll_wait(5, [{events=EPOLLIN|EPOLLOUT|EPOLLHUP, data={u32=24802800,
> u64=24802800}}], 20, -1) = 1
> 374 sendmsg(11, {msg_name=NULL, msg_namelen=0,
> msg_iov=[{iov_base="l\1\0\1\0\0\0\0\1\0\0\0m\0\0\0\1\1o\0\25\0\0\0/org/freedesktop/DBus\0\0\0\3\1s\0\5\0\0\0Hello\0\0\0\2\1s\0\24\0\0\0org.freedesktop.DBus\0\0\0\0\6\1s\0\24\0\0\0org.freedesktop.DBus\0\0\0\0",
> iov_len=128}], msg_iovlen=1, msg_controllen=0, msg_flags=0},
> MSG_DONTWAIT|MSG_NOSIGNAL) = -1 EPIPE (Broken pipe)
> 374 gettid() = 374
> 374 epoll_ctl(5, EPOLL_CTL_MOD, 11, {events=0, data={u32=24802800,
> u64=24802800}}) = 0
> 374 timerfd_settime(12, TFD_TIMER_ABSTIME, {it_interval={tv_sec=0,
> tv_nsec=0}, it_value={tv_sec=0, tv_nsec=1}}, NULL) = 0
> 374 epoll_wait(5, [{events=EPOLLHUP, data={u32=24802800, u64=24802800}},
> {events=EPOLLIN, data={u32=24764384, u64=24764384}}], 20, -1) = 2
> 374 read(12, "\1\0\0\0\0\0\0\0", 8) = 8
> 374 gettid() = 374
> 374 timerfd_settime(12, TFD_TIMER_ABSTIME, {it_interval={tv_sec=0,
> tv_nsec=0}, it_value={tv_sec=0, tv_nsec=1}}, NULL) = 0
> 374 epoll_wait(5, [{events=EPOLLHUP, data={u32=24802800, u64=24802800}},
> {events=EPOLLIN, data={u32=24764384, u64=24764384}}], 20, -1) = 2
> 374 read(12, "\1\0\0\0\0\0\0\0", 8) = 8
> 374 gettid() = 374
> 374 timerfd_settime(12, TFD_TIMER_ABSTIME, {it_interval={tv_sec=0,
> tv_nsec=0}, it_value={tv_sec=0, tv_nsec=1}}, NULL) = 0
> 374 epoll_wait(5, [{events=EPOLLHUP, data={u32=24802800, u64=24802800}},
> {events=EPOLLIN, data={u32=24764384, u64=24764384}}], 20, -1) = 2
> 374 read(12, "\1\0\0\0\0\0\0\0", 8) = 8
> 374 epoll_ctl(5, EPOLL_CTL_DEL, 11, NULL) = 0
> 374 close(11) = 0
> 374 gettid() = 374
> 374 epoll_wait(5, <unfinished ...>
> 391 <... close resumed>) = 0
> 391 madvise(0x7fd6c83dc000, 8368128, MADV_DONTNEED) = 0
> 391 exit(0) = ?
> 391 +++ exited with 0 +++
> 374 <... epoll_wait resumed>[{events=EPOLLERR, data={u32=24768000,
> u64=24768000}}], 17, -1) = 1
> 374 lseek(7, 0, SEEK_SET) = 0
> 374 read(7, "tty7\n", 63) = 5
As you pointed out on IRC, the initially ‘Hello’ method call above leads
to EPIPE, and we can see that elogind eventually closes its socket to
dbus-daemon *but* keeps doing its thing.
Some interesting things to note…
First, to my surprise, elogind does not use the client library of the
‘dbus’ package:
--8<---------------cut here---------------start------------->8---
$ guix gc --references $(./pre-inst-env guix build elogind)|grep dbus
$ echo $?
1
--8<---------------cut here---------------end--------------->8---
(This is already the case in ‘master’ with v243.7.) Instead, it has its
own implementation of the DBus protocol, in C, from systemd—we can’t
have enough sources of bugs and vulnerabilities.
Anyway, the “Hello” message is sent to the system bus asynchronously in
‘sd-bus.c’:
--8<---------------cut here---------------start------------->8---
static int bus_send_hello(sd_bus *bus) {
_cleanup_(sd_bus_message_unrefp) sd_bus_message *m = NULL;
int r;
assert(bus);
if (!bus->bus_client)
return 0;
r = sd_bus_message_new_method_call(
bus,
&m,
"org.freedesktop.DBus",
"/org/freedesktop/DBus",
"org.freedesktop.DBus",
"Hello");
if (r < 0)
return r;
return sd_bus_call_async(bus, NULL, m, hello_callback, NULL, 0);
}
--8<---------------cut here---------------end--------------->8---
A callback is called when a reply is received or an error arises:
--8<---------------cut here---------------start------------->8---
static int hello_callback(sd_bus_message *reply, void *userdata, sd_bus_error
*error) {
[...]
fail:
/* When Hello() failed, let's propagate this in two ways: first we
return the error immediately here,
* which is the propagated up towards the event loop. Let's also
invalidate the connection, so that
* if the user then calls back into us again we won't wait any longer.
*/
bus_set_state(bus, BUS_CLOSING);
return r;
}
--8<---------------cut here---------------end--------------->8---
It’s not clear from that whether the authors intended for the thing to
keep going in case of failure. In our case it’s not helpful.
But why does dbus-daemon drop the connection in the first place?
To know that, we could change ‘dbus-root-service-type’ to run
dbus-daemon from a ‘--enable-verbose-mode’ build, and with the
‘DBUS_VERBOSE’ environment set to 1.
Looking at ‘dbus-server-socket.c’ it would seem that the only sensible
way this can happen is if dbus-daemon doesn’t yet have a
‘new_connection_function’ set at the time it accepts the incoming
connection:
--8<---------------cut here---------------start------------->8---
static dbus_bool_t
handle_new_client_fd_and_unlock (DBusServer *server,
DBusSocket client_fd)
{
[...]
/* See if someone wants to handle this new connection, self-referencing
* for paranoia.
*/
new_connection_function = server->new_connection_function;
new_connection_data = server->new_connection_data;
_dbus_server_ref_unlocked (server);
SERVER_UNLOCK (server);
if (new_connection_function)
{
(* new_connection_function) (server, connection,
new_connection_data);
}
dbus_server_unref (server);
/* If no one grabbed a reference, the connection will die. */
_dbus_connection_close_if_only_one_ref (connection);
dbus_connection_unref (connection);
--8<---------------cut here---------------end--------------->8---
We know that elogind is started after dbus-daemon has written its PID
file (there’s a Shepherd service dependency). Is there a possibility
that dbus-daemon writes its PID file before it has set
‘new_connection_function’?
It would seem that ‘bus_context_new’ writes the PID file after it has
called ‘dbus_server_set_new_connection_function’ via ‘setup_server’ via
‘process_config_first_time_only’.
So not sure what happens.
That’s it for today. Thoughts?
Ludo’.
- bug#52051: [core-updates-frozen] cannot login ('org.freedesktop.login1' service times out), Maxim Cournoyer, 2021/12/07
- bug#52051: [core-updates-frozen] cannot login ('org.freedesktop.login1' service times out), Maxim Cournoyer, 2021/12/07
- bug#52051: [core-updates-frozen] cannot login ('org.freedesktop.login1' service times out), Ludovic Courtès, 2021/12/08
- bug#52051: [core-updates-frozen] cannot login ('org.freedesktop.login1' service times out), Maxim Cournoyer, 2021/12/09
- bug#52051: [core-updates-frozen] cannot login ('org.freedesktop.login1' service times out),
Ludovic Courtès <=
- bug#52051: [core-updates-frozen] cannot login ('org.freedesktop.login1' service times out), Maxim Cournoyer, 2021/12/10
- bug#52051: [core-updates-frozen] cannot login ('org.freedesktop.login1' service times out), Ludovic Courtès, 2021/12/10
- bug#52051: [core-updates-frozen] cannot login ('org.freedesktop.login1' service times out), Josselin Poiret, 2021/12/16
- bug#52051: [core-updates-frozen] cannot login ('org.freedesktop.login1' service times out), Timothy Sample, 2021/12/21
- bug#52051: [core-updates-frozen] cannot login ('org.freedesktop.login1' service times out), Leo Famulari, 2021/12/21
- bug#52051: [core-updates-frozen] cannot login ('org.freedesktop.login1' service times out), Timothy Sample, 2021/12/21
- bug#52051: [core-updates-frozen] cannot login ('org.freedesktop.login1' service times out), Maxime Devos, 2021/12/21
- bug#52051: [core-updates-frozen] cannot login ('org.freedesktop.login1' service times out), Leo Famulari, 2021/12/21
- bug#52051: [core-updates-frozen] cannot login ('org.freedesktop.login1' service times out), Maxime Devos, 2021/12/21
- bug#52051: [core-updates-frozen] cannot login ('org.freedesktop.login1' service times out), Leo Famulari, 2021/12/21