From: Daniel P. Berrangé
Subject: Re: [Qemu-devel] [PATCH RFC] monitor: temporary fix for dead-lock on event recursion
Date: Mon, 30 Jul 2018 14:17:02 +0100
User-agent: Mutt/1.10.0 (2018-05-17)
On Mon, Jul 30, 2018 at 03:09:08PM +0200, Marc-André Lureau wrote:
> Hi
>
> On Mon, Jul 30, 2018 at 3:05 PM, Daniel P. Berrangé <address@hidden> wrote:
> > On Mon, Jul 30, 2018 at 02:57:46PM +0200, Marc-André Lureau wrote:
> >> With a Spice port chardev, it is possible to reenter
> >> monitor_qapi_event_queue() (when the client disconnects for
> >> example). This will dead-lock on monitor_lock.
> >>
> >> Instead, use some TLS variables to check for recursion and queue the
> >> events.
> >
> > I wonder if it would be clearer to just change
> > qapi_event_send_spice_disconnected() so that it uses g_idle_add() to
> > send the QAPI_EVENT_SPICE_DISCONNECTED event, as I don't see a strong
> > reason to send that synchronously. This would avoid the recursion problem.
>
> Yes, I seriously thought about that solution. But in the end, the bug
> is in monitor code: it should be fixed there at some point. And it
> could in theory happen with a different code path than Spice.
The question is what we consider the API contract to be for the chardevs
that are used by the monitor, e.g. is the chardev I/O function permitted to
emit monitor events? Given that chardevs are widely used for lots of
backends, I guess I would probably say chardevs should be allowed to
emit monitor events, which does put the burden on the monitor code to
handle recursion.
I wonder if we could still use g_idle_add() though but in a different
place. The monitor_qapi_event_queue() method already emits events
asynchronously if they are tagged as needing rate limiting. So instead
of directly calling monitor_qapi_event_emit() in the non-rate limited
branch, we could send that call via g_idle_add(), so that all events
are emitted asynchronously. This would avoid monitor_qapi_event_queue()
becoming a re-entrancy problem.
I think g_idle_add() calls are processed in the order in which they
are registered, so event ordering should still be preserved.
> >> Fixes:
> >> (gdb) bt
> >> #0 0x00007fa69e7217fd in __lll_lock_wait () at /lib64/libpthread.so.0
> >> #1 0x00007fa69e71acf4 in pthread_mutex_lock () at /lib64/libpthread.so.0
> >> #2 0x0000563303567619 in qemu_mutex_lock_impl (mutex=0x563303d3e220
> >> <monitor_lock>, file=0x5633036589a8 "/home/elmarco/src/qq/monitor.c",
> >> line=645) at /home/elmarco/src/qq/util/qemu-thread-posix.c:66
> >> #3 0x0000563302fa6c25 in monitor_qapi_event_queue
> >> (event=QAPI_EVENT_SPICE_DISCONNECTED, qdict=0x56330602bde0,
> >> errp=0x7ffc6ab5e728) at /home/elmarco/src/qq/monitor.c:645
> >> #4 0x0000563303549aca in qapi_event_send_spice_disconnected
> >> (server=0x563305afd630, client=0x563305745360, errp=0x563303d8d0f0
> >> <error_abort>) at qapi/qapi-events-ui.c:149
> >> #5 0x00005633033e600f in channel_event (event=3, info=0x5633061b0050) at
> >> /home/elmarco/src/qq/ui/spice-core.c:235
> >> #6 0x00007fa69f6c86bb in reds_handle_channel_event (reds=<optimized
> >> out>, event=3, info=0x5633061b0050) at reds.c:316
> >> #7 0x00007fa69f6b193b in main_dispatcher_self_handle_channel_event
> >> (info=0x5633061b0050, event=3, self=0x563304e088c0) at
> >> main-dispatcher.c:197
> >> #8 0x00007fa69f6b193b in main_dispatcher_channel_event
> >> (self=0x563304e088c0, address@hidden, info=0x5633061b0050) at
> >> main-dispatcher.c:197
> >> #9 0x00007fa69f6d0833 in red_stream_push_channel_event (address@hidden,
> >> address@hidden) at red-stream.c:414
> >> #10 0x00007fa69f6d086b in red_stream_free (s=0x563305ad8f50) at
> >> red-stream.c:388
> >> #11 0x00007fa69f6b7ddc in red_channel_client_finalize
> >> (object=0x563304df2360) at red-channel-client.c:347
> >> #12 0x00007fa6a56b7fb9 in g_object_unref () at /lib64/libgobject-2.0.so.0
> >> #13 0x00007fa69f6ba212 in red_channel_client_push (rcc=0x563304df2360) at
> >> red-channel-client.c:1341
> >> #14 0x00007fa69f68b259 in red_char_device_send_msg_to_client
> >> (client=<optimized out>, msg=0x5633059b6310, dev=0x563304e08bc0) at
> >> char-device.c:305
> >> #15 0x00007fa69f68b259 in red_char_device_send_msg_to_clients
> >> (msg=0x5633059b6310, dev=0x563304e08bc0) at char-device.c:305
> >> #16 0x00007fa69f68b259 in red_char_device_read_from_device
> >> (dev=0x563304e08bc0) at char-device.c:353
> >> #17 0x000056330317d01d in spice_chr_write (chr=0x563304cafe20,
> >> buf=0x563304cc50b0 "{\"timestamp\": {\"seconds\": 1532944763,
> >> \"microseconds\": 326636}, \"event\": \"SHUTDOWN\", \"data\": {\"guest\":
> >> false}}\r\n", len=111) at /home/elmarco/src/qq/chardev/spice.c:199
> >> #18 0x00005633034deee7 in qemu_chr_write_buffer (s=0x563304cafe20,
> >> buf=0x563304cc50b0 "{\"timestamp\": {\"seconds\": 1532944763,
> >> \"microseconds\": 326636}, \"event\": \"SHUTDOWN\", \"data\": {\"guest\":
> >> false}}\r\n", len=111, offset=0x7ffc6ab5ea70, write_all=false) at
> >> /home/elmarco/src/qq/chardev/char.c:112
> >> #19 0x00005633034df054 in qemu_chr_write (s=0x563304cafe20,
> >> buf=0x563304cc50b0 "{\"timestamp\": {\"seconds\": 1532944763,
> >> \"microseconds\": 326636}, \"event\": \"SHUTDOWN\", \"data\": {\"guest\":
> >> false}}\r\n", len=111, write_all=false) at
> >> /home/elmarco/src/qq/chardev/char.c:147
> >> #20 0x00005633034e1e13 in qemu_chr_fe_write (be=0x563304dbb800,
> >> buf=0x563304cc50b0 "{\"timestamp\": {\"seconds\": 1532944763,
> >> \"microseconds\": 326636}, \"event\": \"SHUTDOWN\", \"data\": {\"guest\":
> >> false}}\r\n", len=111) at /home/elmarco/src/qq/chardev/char-fe.c:42
> >> #21 0x0000563302fa6334 in monitor_flush_locked (mon=0x563304dbb800) at
> >> /home/elmarco/src/qq/monitor.c:425
> >> #22 0x0000563302fa6520 in monitor_puts (mon=0x563304dbb800,
> >> str=0x563305de7e9e "") at /home/elmarco/src/qq/monitor.c:468
> >> #23 0x0000563302fa680c in qmp_send_response (mon=0x563304dbb800,
> >> rsp=0x563304df5730) at /home/elmarco/src/qq/monitor.c:517
> >> #24 0x0000563302fa6905 in qmp_queue_response (mon=0x563304dbb800,
> >> rsp=0x563304df5730) at /home/elmarco/src/qq/monitor.c:538
> >> #25 0x0000563302fa6b5b in monitor_qapi_event_emit
> >> (event=QAPI_EVENT_SHUTDOWN, qdict=0x563304df5730) at
> >> /home/elmarco/src/qq/monitor.c:624
> >> #26 0x0000563302fa6c4b in monitor_qapi_event_queue
> >> (event=QAPI_EVENT_SHUTDOWN, qdict=0x563304df5730, errp=0x7ffc6ab5ed00) at
> >> /home/elmarco/src/qq/monitor.c:649
> >> #27 0x0000563303548cce in qapi_event_send_shutdown (guest=false,
> >> errp=0x563303d8d0f0 <error_abort>) at qapi/qapi-events-run-state.c:58
> >> #28 0x000056330313bcd7 in main_loop_should_exit () at
> >> /home/elmarco/src/qq/vl.c:1822
> >> #29 0x000056330313bde3 in main_loop () at /home/elmarco/src/qq/vl.c:1862
> >> #30 0x0000563303143781 in main (argc=3, argv=0x7ffc6ab5f068,
> >> envp=0x7ffc6ab5f088) at /home/elmarco/src/qq/vl.c:4644
> >>
> >> Note that the error report is now moved to the first caller, which may
> >> receive an error for a recursed event. This is probably fine (95% of
> >> callers use &error_abort, the rest pass a NULL errp and ignore it).
> >>
> >> Signed-off-by: Marc-André Lureau <address@hidden>
> >> ---
> >> monitor.c | 51 ++++++++++++++++++++++++++++++++++++++++++++++++++-
> >> 1 file changed, 50 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/monitor.c b/monitor.c
> >> index d8d8211ae4..d580c5a79c 100644
> >> --- a/monitor.c
> >> +++ b/monitor.c
> >> @@ -633,7 +633,7 @@ static void monitor_qapi_event_handler(void *opaque);
> >> * applying any rate limiting if required.
> >> */
> >> static void
> >> -monitor_qapi_event_queue(QAPIEvent event, QDict *qdict, Error **errp)
> >> +monitor_qapi_event_queue_no_recurse(QAPIEvent event, QDict *qdict, Error **errp)
> >> {
> >> MonitorQAPIEventConf *evconf;
> >> MonitorQAPIEventState *evstate;
> >> @@ -688,6 +688,55 @@ monitor_qapi_event_queue(QAPIEvent event, QDict *qdict, Error **errp)
> >> qemu_mutex_unlock(&monitor_lock);
> >> }
> >>
> >> +static void
> >> +monitor_qapi_event_queue(QAPIEvent event, QDict *qdict, Error **errp)
> >> +{
> >> + Error *local_err = NULL;
> >> + /*
> >> + * If the function recurses, monitor_lock will dead-lock.
> >> + * Instead, queue pending events in TLS.
> >> + * TODO: remove this, make it re-enter safe.
> >> + */
> >> + static __thread bool recurse;
> >> + typedef struct MonitorQapiEvent {
> >> + QAPIEvent event;
> >> + QDict *qdict;
> >> + QSIMPLEQ_ENTRY(MonitorQapiEvent) entry;
> >> + } MonitorQapiEvent;
> >> + MonitorQapiEvent *ev;
> >> + static __thread QSIMPLEQ_HEAD(, MonitorQapiEvent) event_queue;
> >> +
> >> + if (!recurse) {
> >> + QSIMPLEQ_INIT(&event_queue);
> >> + }
> >> +
> >> + ev = g_new(MonitorQapiEvent, 1);
> >> + ev->qdict = qobject_ref(qdict);
> >> + ev->event = event;
> >> + QSIMPLEQ_INSERT_TAIL(&event_queue, ev, entry);
> >> + if (recurse) {
> >> + return;
> >> + }
> >> +
> >> + recurse = true;
> >> +
> >> + while ((ev = QSIMPLEQ_FIRST(&event_queue)) != NULL) {
> >> + QSIMPLEQ_REMOVE_HEAD(&event_queue, entry);
> >> + if (!local_err) {
> >> + monitor_qapi_event_queue_no_recurse(ev->event, ev->qdict,
> >> + &local_err);
> >> + }
> >> + qobject_unref(ev->qdict);
> >> + g_free(ev);
> >> + }
> >> +
> >> + recurse = false;
> >> +
> >> + if (local_err) {
> >> + error_propagate(errp, local_err);
> >> + }
> >> +}
> >> +
> >> /*
> >> * This function runs evconf->rate ns after sending a throttled
> >> * event.
> >> --
> >> 2.18.0.232.gb7bd9486b0
> >>
> >>
> >
> > Regards,
> > Daniel
Regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|