Re: [Qemu-devel] exec: Safe work in quiescent state

From: Alex Bennée
Subject: Re: [Qemu-devel] exec: Safe work in quiescent state
Date: Wed, 15 Jun 2016 15:56:26 +0100
User-agent: mu4e 0.9.17; emacs

Sergey Fedorov <address@hidden> writes:

> On 10/06/16 00:51, Sergey Fedorov wrote:
>> For certain kinds of tasks we might need a quiescent state to perform an
>> operation safely. Quiescent state means no CPU thread executing, and
>> probably BQL held as well. The tasks could include:
> Considering different attempts to implement similar functionality, I've
> got the following summary.
>
> Fred's original async_run_safe_work_on_cpu() [1]:
> - resembles async_run_on_cpu();
> - introduces a per-CPU safe work queue, a per-CPU flag to prevent the
> CPU from executing code, and a global counter of pending jobs;
> - implements rather complicated scheduling of jobs relying on both the
> per-CPU flag and the global counter;
> - may not be entirely safe when draining work queues if multiple CPUs
> have scheduled safe work;
> - does not support user-mode emulation.

Just some quick comments for context:

> Alex's reiteration of Fred's approach [2]:
> - maintains a single global safe work queue;

Having separate queues can lead to problems with draining queues, as only
one queue gets drained at a time and some threads exit more frequently than
others.

> - uses GArray rather than linked list to implement the work queue;

This was to minimise g_malloc calls when creating jobs and when working
through the list. An awful lot of jobs just need the CPU id and a single
parameter, which is why I made that the simple case.
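
To illustrate the idea, here is a minimal sketch of an array-backed queue whose entries hold the CPU id and one parameter by value, so the common case needs no per-job allocation. It is not the code from [2]: it uses plain realloc() instead of GArray, omits all locking, and every name in it (SafeWorkItem, safe_work_push, safe_work_drain, add_cpu_id) is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical job signature: the simple case is a CPU id plus one parameter. */
typedef void (*safe_work_fn)(int cpu_id, void *opaque);

typedef struct {
    safe_work_fn fn;
    int cpu_id;
    void *opaque;
} SafeWorkItem;

typedef struct {
    SafeWorkItem *items;  /* contiguous storage, reused across drains */
    size_t len;           /* number of queued jobs */
    size_t cap;           /* allocated slots */
} SafeWorkQueue;

/* Append a job by value. Storage grows geometrically, so most pushes do
 * not allocate at all. Returns 0 on success, -1 if realloc() fails. */
static int safe_work_push(SafeWorkQueue *q, safe_work_fn fn,
                          int cpu_id, void *opaque)
{
    assert(q != NULL && fn != NULL);
    if (q->len == q->cap) {
        size_t new_cap = q->cap ? q->cap * 2 : 8;
        SafeWorkItem *p = realloc(q->items, new_cap * sizeof(*p));
        if (p == NULL) {
            return -1;
        }
        q->items = p;
        q->cap = new_cap;
    }
    q->items[q->len++] = (SafeWorkItem){ fn, cpu_id, opaque };
    return 0;
}

/* Run every queued job in order, then reset the length while keeping the
 * storage. Assumes jobs do not enqueue further work while draining.
 * Returns the number of jobs that were run. */
static size_t safe_work_drain(SafeWorkQueue *q)
{
    size_t n = q->len;
    for (size_t i = 0; i < n; i++) {
        q->items[i].fn(q->items[i].cpu_id, q->items[i].opaque);
    }
    q->len = 0;
    return n;
}

/* Example job for illustration: adds the CPU id to an int accumulator. */
static void add_cpu_id(int cpu_id, void *opaque)
{
    *(int *)opaque += cpu_id;
}
```

A zero-initialised SafeWorkQueue is a valid empty queue, and walking the jobs is a linear pass over one array rather than a pointer chase.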

> - introduces a global counter of CPUs which have entered their execution
> loop;
> - uses the last CPU to exit its execution loop to drain the safe work
> queue;

I suspect you can still race with other deferred work as those tasks are
being done outside the exec loop. This should be fixable though.
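
For reference, the "last CPU out drains the queue" detection can be sketched with a single atomic counter, as below. This is only an illustration using C11 atomics with hypothetical names (cpus_in_exec_loop, exec_loop_enter, exec_loop_exit), not the code from [2], and it does nothing about work that runs outside the exec loop, which is the race mentioned above.

```c
#include <assert.h>
#include <stdatomic.h>

/* Number of CPUs currently inside their execution loop (hypothetical name). */
static atomic_int cpus_in_exec_loop;

/* Called by a CPU thread just before it enters its execution loop. */
static void exec_loop_enter(void)
{
    atomic_fetch_add(&cpus_in_exec_loop, 1);
}

/* Called by a CPU thread right after it leaves its execution loop.
 * Returns 1 if the caller was the last CPU out, in which case it is the
 * one responsible for draining the safe work queue; returns 0 otherwise. */
static int exec_loop_exit(void)
{
    /* atomic_fetch_sub returns the value held before the decrement. */
    int before = atomic_fetch_sub(&cpus_in_exec_loop, 1);
    assert(before > 0);
    return before == 1;
}
```

Because atomic_fetch_sub returns the previous value, exactly one exiting thread observes 1 and takes on the drain.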

> - still does not support user-mode emulation.

There is no particular reason it couldn't. However, it would mean
updating the linux-user cpu_exec loop, which most likely needs a good
clean-up and re-factoring to avoid making the change to each $ARCH loop.

> Alvise's async_wait_run_on_cpu() [3]:
> - uses the same queue as async_run_on_cpu();
> - the CPU that requested the job is recorded in qemu_work_item;
> - each CPU has a counter of such jobs it has requested;
> - the counter is decremented upon job completion;
> - only the target CPU is forced to exit the execution loop, i.e. the job
> is not run in quiescent state;
> - does not support user-mode emulation.
>
> Emilio's cpu_tcg_sched_work() [4]:
> - exploits tb_lock() to force CPUs to exit their execution loop;
> - requires 'tb_lock' to be held when scheduling a job;
> - allows each CPU to schedule only a single job;
> - handles scheduled work right in cpu_exec();
> - exploits synchronize_rcu() to wait for other CPUs to exit their
> execution loop;
> - implements a complicated synchronization scheme;
> - should support both system and user-mode emulation.
>
> As for the requirements for a common safe work mechanism, each use case
> has its own considerations.
>
> Translation buffer flush just requires that no CPU is executing
> generated code during the operation.
>
> Cross-CPU TLB flush basically requires that no CPU is performing a TLB
> lookup or modification. Some architectures might require the TLB flush
> to be complete before the requesting CPU can continue execution; others
> might allow it to be delayed until some "synchronization point". In the
> case of ARM, one such synchronization point is the DMB instruction. We
> might allow the operation to be performed asynchronously and continue
> execution, but we'd need to end the TB and synchronize on each DMB
> instruction. That doesn't seem very efficient. So a simple approach of
> forcing the operation to complete before executing anything else would
> probably make sense in both cases. Slow-path LL/SC emulation also
> requires the cross-CPU TLB flush to be complete before it can finish
> emulating an LL instruction.
>
> Exclusive operation emulation in user-mode basically requires that no
> other CPU is executing generated code. However, I hope that both system
> and user-mode would use some common implementation of exclusive
> instruction emulation.
>
> It was pointed out that special care must be taken to avoid deadlocks
> [5, 6]. A simple and reliable approach might be to have all CPUs,
> including the requesting CPU, exit their execution loops and then serve
> all the pending requests.
>
> Distilling the requirements, a safe work mechanism should:
> - support both system and user-mode emulation;
> - allow scheduling an asynchronous operation to be performed outside the
> CPU execution loop;
> - guarantee that all CPUs are out of the execution loop before the
> operation can begin;
> - guarantee that no CPU enters the execution loop before all the
> scheduled operations are complete.
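
The last three requirements in the list above could be sketched roughly as follows. This is a toy model with a pthread mutex and condition variable, not a proposal for the actual implementation: all names (SafeWork, cpu_loop_enter, cpu_loop_exit, safe_work_schedule, count_op) are hypothetical, the fixed-size array is only for brevity, and the step that would actually kick running CPUs out of their loops is left as a comment.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

typedef void (*safe_op_fn)(void *opaque);

#define MAX_SAFE_OPS 16

typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t idle;   /* signalled once all pending operations ran */
    int running;           /* CPUs currently inside the execution loop */
    int npending;          /* scheduled operations not yet run */
    struct { safe_op_fn fn; void *opaque; } ops[MAX_SAFE_OPS];
} SafeWork;

static void safe_work_init(SafeWork *sw)
{
    pthread_mutex_init(&sw->lock, NULL);
    pthread_cond_init(&sw->idle, NULL);
    sw->running = 0;
    sw->npending = 0;
}

/* Caller holds the lock and running == 0, so no CPU is executing.
 * Operations run under the lock and must not call back into this API. */
static void run_pending_locked(SafeWork *sw)
{
    for (int i = 0; i < sw->npending; i++) {
        sw->ops[i].fn(sw->ops[i].opaque);
    }
    sw->npending = 0;
    pthread_cond_broadcast(&sw->idle);
}

/* No CPU enters the loop while scheduled operations are outstanding. */
static void cpu_loop_enter(SafeWork *sw)
{
    pthread_mutex_lock(&sw->lock);
    while (sw->npending > 0) {
        pthread_cond_wait(&sw->idle, &sw->lock);
    }
    sw->running++;
    pthread_mutex_unlock(&sw->lock);
}

/* The last CPU to leave the loop runs everything that was scheduled. */
static void cpu_loop_exit(SafeWork *sw)
{
    pthread_mutex_lock(&sw->lock);
    assert(sw->running > 0);
    if (--sw->running == 0) {
        run_pending_locked(sw);
    }
    pthread_mutex_unlock(&sw->lock);
}

/* Schedule an operation; returns 0 on success, -1 if the table is full. */
static int safe_work_schedule(SafeWork *sw, safe_op_fn fn, void *opaque)
{
    int ret = -1;
    pthread_mutex_lock(&sw->lock);
    if (sw->npending < MAX_SAFE_OPS) {
        sw->ops[sw->npending].fn = fn;
        sw->ops[sw->npending].opaque = opaque;
        sw->npending++;
        /* A real implementation would kick all running CPUs here. */
        if (sw->running == 0) {
            run_pending_locked(sw);  /* nobody is running: safe right now */
        }
        ret = 0;
    }
    pthread_mutex_unlock(&sw->lock);
    return ret;
}

/* Example operation for illustration: counts how many times it ran. */
static void count_op(void *opaque)
{
    ++*(int *)opaque;
}
```

In this model the requesting CPU schedules from inside its loop and then exits like everyone else, which is the "exit all CPUs including the requester" approach to avoiding deadlocks. Supporting both system and user-mode emulation is not addressed by the sketch.
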
>
> If that sounds like a sane approach, I'll come up with a more specific
> solution to discuss. The solution could be merged into v2.7 along with
> safe translation buffer flush in user-mode as an actual use case. Safe
> cross-CPU TLB flush would become a part of MTTCG work. Comments,
> suggestions, arguments etc. are welcome!
>
> [1] http://thread.gmane.org/gmane.comp.emulators.qemu/355323/focus=355632
> [2] http://thread.gmane.org/gmane.comp.emulators.qemu/407030/focus=407039
> [3] http://thread.gmane.org/gmane.comp.emulators.qemu/413978/focus=413982
> [4] http://thread.gmane.org/gmane.comp.emulators.qemu/356765/focus=356789
> [5] http://thread.gmane.org/gmane.comp.emulators.qemu/397295/focus=397301
> [6] http://thread.gmane.org/gmane.comp.emulators.qemu/413978/focus=417231
>
> Kind regards,
> Sergey

Alex Bennée
