Re: [PATCH v2] monitor/qmp: fix race on CHR_EVENT_CLOSED without OOB
From: Markus Armbruster
Subject: Re: [PATCH v2] monitor/qmp: fix race on CHR_EVENT_CLOSED without OOB
Date: Thu, 08 Apr 2021 14:49:02 +0200
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/27.1 (gnu/linux)
Kevin Wolf <kwolf@redhat.com> writes:
> On 08.04.2021 at 11:21, Markus Armbruster wrote:
>> Kevin Wolf <kwolf@redhat.com> writes:
>>
>> > On 22.03.2021 at 16:40, Stefan Reiter wrote:
>> >> The QMP dispatcher coroutine holds the qmp_queue_lock over a yield
>> >> point, where it expects to be rescheduled from the main context. If a
>> >> CHR_EVENT_CLOSED event is received just then, it can race and block the
>> >> main thread on the mutex in monitor_qmp_cleanup_queue_and_resume.
>> >>
>> >> monitor_resume does not need to be called from main context, so we can
>> >> call it immediately after popping a request from the queue, which allows
>> >> us to drop the qmp_queue_lock mutex before yielding.
>> >>
>> >> Suggested-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
>> >> Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
>> >> ---
>> >> v2:
>> >> * different approach: move everything that needs the qmp_queue_lock
>> >>   mutex before the yield point, instead of moving the event handling
>> >>   to a different context
>> >
>> > The interesting new case here seems to be that new requests could be
>> > queued and the dispatcher coroutine could be kicked before yielding.
>> > This is safe because &qmp_dispatcher_co_busy is accessed with atomics
>> > on both sides.
>> >
>> > The important part is just that the first (conditional) yield stays
>> > first, so that the aio_co_wake() in handle_qmp_command() won't reenter
>> > the coroutine while it is expecting to be reentered from somewhere else.
>> > This is still the case after the patch.
>> >
>> > Reviewed-by: Kevin Wolf <kwolf@redhat.com>
>>
>> Thanks for saving me from an ugly review headache.
>>
>> Should this go into 6.0?
>
> This is something that the responsible maintainer needs to decide.
Yes, and that's me. I'm soliciting opinions.
> If it helps you with the decision, and if I understand correctly, it is
> a regression from 5.1, but was already broken in 5.2.
It helps.
Even more helpful would be a risk assessment: what's the risk of
applying this patch now vs. delaying it?
If I understand Stefan correctly, Proxmox observed VM hangs. How
frequent are these hangs? Did they result in data corruption?
How confident do we feel about the fix?