qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] racing between pause_all_vcpus() and qemu_cpu_stop()


From: Alex Bennée
Subject: Re: [Qemu-devel] racing between pause_all_vcpus() and qemu_cpu_stop()
Date: Mon, 01 Oct 2018 19:12:38 +0100
User-agent: mu4e 1.1.0; emacs 26.1.50

Peter Maydell <address@hidden> writes:

> I've been investigating a race condition where sometimes when my
> guest writes to a device register which triggers a
> qemu_system_reset_request(), it doesn't actually cause a clean reset,
> but instead the guest CPU continues to execute instructions.
> I managed to repro it under 'rr', which let me walk through enough
> of what was going on to determine the following:
>
> When a guest CPU thread calls qemu_system_reset_request(), this
> results in a call to qemu_cpu_stop(current_cpu, true), to
> make the CPU come back out to the main loop. We also set the
> reset_requested flag, to get the IO thread to actually do the
> reset.
>
> The main loop thread runs main_loop_should_exit(). If there is a
> pending reset, it calls pause_all_vcpus(), with the intention
> that this quiesces all the guest CPUs before it starts messing
> with reset actions.
>
> pause_all_vcpus() just waits for every cpu to have cpu->stopped set.
> However, if the running cpu has just called qemu_cpu_stop() on
> itself then it will have set cpu->stopped true but not actually
> made it out to the main loop yet. (In the case I'm looking at,
> what happens is that as soon as the CPU thread unlocks the
> iothread mutex in io_writex() after the device write, the
> main thread runs and does all the reset operations.)
>
> The reset code in the iothread then proceeds to start calling
> various reset functions while the CPU thread is still inside
> the exec loop, running generated code and so on. This doesn't
> seem like what ought to happen. In particular it includes
> calling cpu_common_reset(), which clears all kinds of flags
> relevant to the still-executing CPU...

I would have thought the reset code should be scheduled via safe async
work to run in the vCPU context. Why should the main loop get involved
at all here?

>
> Any suggestions for how we should fix this?
>
> thanks
> -- PMM


--
Alex Bennée



reply via email to

[Prev in Thread] Current Thread [Next in Thread]