bug-guix
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

bug#52051: [core-updates-frozen] cannot login ('org.freedesktop.login1'


From: Ludovic Courtès
Subject: bug#52051: [core-updates-frozen] cannot login ('org.freedesktop.login1' service times out)
Date: Thu, 09 Dec 2021 23:15:10 +0100
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/27.2 (gnu/linux)

Hello!

Maxim Cournoyer <maxim.cournoyer@gmail.com> skribis:

> 374   connect(11, {sa_family=AF_UNIX, 
> sun_path="/var/run/dbus/system_bus_socket"}, 34) = 0

[...]

> 374   epoll_wait(5, [{events=EPOLLIN|EPOLLOUT|EPOLLHUP, data={u32=24802800, 
> u64=24802800}}], 20, -1) = 1
> 374   sendmsg(11, {msg_name=NULL, msg_namelen=0, 
> msg_iov=[{iov_base="l\1\0\1\0\0\0\0\1\0\0\0m\0\0\0\1\1o\0\25\0\0\0/org/freedesktop/DBus\0\0\0\3\1s\0\5\0\0\0Hello\0\0\0\2\1s\0\24\0\0\0org.freedesktop.DBus\0\0\0\0\6\1s\0\24\0\0\0org.freedesktop.DBus\0\0\0\0",
>  iov_len=128}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 
> MSG_DONTWAIT|MSG_NOSIGNAL) = -1 EPIPE (Broken pipe)
> 374   gettid()                          = 374
> 374   epoll_ctl(5, EPOLL_CTL_MOD, 11, {events=0, data={u32=24802800, 
> u64=24802800}}) = 0
> 374   timerfd_settime(12, TFD_TIMER_ABSTIME, {it_interval={tv_sec=0, 
> tv_nsec=0}, it_value={tv_sec=0, tv_nsec=1}}, NULL) = 0
> 374   epoll_wait(5, [{events=EPOLLHUP, data={u32=24802800, u64=24802800}}, 
> {events=EPOLLIN, data={u32=24764384, u64=24764384}}], 20, -1) = 2
> 374   read(12, "\1\0\0\0\0\0\0\0", 8)   = 8
> 374   gettid()                          = 374
> 374   timerfd_settime(12, TFD_TIMER_ABSTIME, {it_interval={tv_sec=0, 
> tv_nsec=0}, it_value={tv_sec=0, tv_nsec=1}}, NULL) = 0
> 374   epoll_wait(5, [{events=EPOLLHUP, data={u32=24802800, u64=24802800}}, 
> {events=EPOLLIN, data={u32=24764384, u64=24764384}}], 20, -1) = 2
> 374   read(12, "\1\0\0\0\0\0\0\0", 8)   = 8
> 374   gettid()                          = 374
> 374   timerfd_settime(12, TFD_TIMER_ABSTIME, {it_interval={tv_sec=0, 
> tv_nsec=0}, it_value={tv_sec=0, tv_nsec=1}}, NULL) = 0
> 374   epoll_wait(5, [{events=EPOLLHUP, data={u32=24802800, u64=24802800}}, 
> {events=EPOLLIN, data={u32=24764384, u64=24764384}}], 20, -1) = 2
> 374   read(12, "\1\0\0\0\0\0\0\0", 8)   = 8
> 374   epoll_ctl(5, EPOLL_CTL_DEL, 11, NULL) = 0
> 374   close(11)                         = 0
> 374   gettid()                          = 374
> 374   epoll_wait(5,  <unfinished ...>
> 391   <... close resumed>)              = 0
> 391   madvise(0x7fd6c83dc000, 8368128, MADV_DONTNEED) = 0
> 391   exit(0)                           = ?
> 391   +++ exited with 0 +++
> 374   <... epoll_wait resumed>[{events=EPOLLERR, data={u32=24768000, 
> u64=24768000}}], 17, -1) = 1
> 374   lseek(7, 0, SEEK_SET)             = 0
> 374   read(7, "tty7\n", 63)             = 5

As you pointed out on IRC, the initially ‘Hello’ method call above leads
to EPIPE, and we can see that elogind eventually closes its socket to
dbus-daemon *but* keeps doing its thing.

Some interesting things to note…

First, to my surprise, elogind does not use the client library of the
‘dbus’ package:

--8<---------------cut here---------------start------------->8---
$ guix gc --references $(./pre-inst-env guix build elogind)|grep dbus
$ echo $?
1
--8<---------------cut here---------------end--------------->8---

(This is already the case in ‘master’ with v243.7.)  Instead, it has its
own implementation of the DBus protocol, in C, from systemd—we can’t
have enough sources of bugs and vulnerabilities.

Anyway, the “Hello” message is sent to the system bus asynchronously in
‘sd-bus.c’:

--8<---------------cut here---------------start------------->8---
static int bus_send_hello(sd_bus *bus) {
        _cleanup_(sd_bus_message_unrefp) sd_bus_message *m = NULL;
        int r;

        assert(bus);

        if (!bus->bus_client)
                return 0;

        r = sd_bus_message_new_method_call(
                        bus,
                        &m,
                        "org.freedesktop.DBus",
                        "/org/freedesktop/DBus",
                        "org.freedesktop.DBus",
                        "Hello");
        if (r < 0)
                return r;

        return sd_bus_call_async(bus, NULL, m, hello_callback, NULL, 0);
}
--8<---------------cut here---------------end--------------->8---

A callback is called when a reply is received or an error arises:

--8<---------------cut here---------------start------------->8---
static int hello_callback(sd_bus_message *reply, void *userdata, sd_bus_error 
*error) {

[...]

fail:
        /* When Hello() failed, let's propagate this in two ways: first we 
return the error immediately here,
         * which is the propagated up towards the event loop. Let's also 
invalidate the connection, so that
         * if the user then calls back into us again we won't wait any longer. 
*/

        bus_set_state(bus, BUS_CLOSING);
        return r;
}
--8<---------------cut here---------------end--------------->8---

It’s not clear from that whether the authors intended for the thing to
keep going in case of failure.  In our case it’s not helpful.

But why does dbus-daemon drop the connection in the first place?

To know that, we could change ‘dbus-root-service-type’ to run
dbus-daemon from a ‘--enable-verbose-mode’ build, and with the
‘DBUS_VERBOSE’ environment set to 1.

Looking at ‘dbus-server-socket.c’ it would seem that the only sensible
way this can happen is if dbus-daemon doesn’t yet have a
‘new_connection_function’ set at the time it accepts the incoming
connection:

--8<---------------cut here---------------start------------->8---
static dbus_bool_t
handle_new_client_fd_and_unlock (DBusServer *server,
                                 DBusSocket  client_fd)
{

[...]

  /* See if someone wants to handle this new connection, self-referencing
   * for paranoia.
   */
  new_connection_function = server->new_connection_function;
  new_connection_data = server->new_connection_data;

  _dbus_server_ref_unlocked (server);
  SERVER_UNLOCK (server);

  if (new_connection_function)
    {
      (* new_connection_function) (server, connection,
                                   new_connection_data);
    }
  dbus_server_unref (server);

  /* If no one grabbed a reference, the connection will die. */
  _dbus_connection_close_if_only_one_ref (connection);
  dbus_connection_unref (connection);
--8<---------------cut here---------------end--------------->8---

We know that elogind is started after dbus-daemon has written its PID
file (there’s a Shepherd service dependency).  Is there a possibility
that dbus-daemon writes its PID file before it has set
‘new_connection_function’?

It would seem that ‘bus_context_new’ writes the PID file after it has
called ‘dbus_server_set_new_connection_function’ via ‘setup_server’ via
‘process_config_first_time_only’.

So not sure what happens.

That’s it for today.  Thoughts?

Ludo’.





reply via email to

[Prev in Thread] Current Thread [Next in Thread]