[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Qemu-devel] [PATCH v5 11/13] cpu-exec-common: Introduce async_safe_
From: |
Alex Bennée |
Subject: |
Re: [Qemu-devel] [PATCH v5 11/13] cpu-exec-common: Introduce async_safe_run_on_cpu() |
Date: |
Wed, 03 Aug 2016 22:02:04 +0100 |
User-agent: |
mu4e 0.9.17; emacs 25.1.2 |
Emilio G. Cota <address@hidden> writes:
> On Tue, Aug 02, 2016 at 18:27:42 +0100, Alex Bennée wrote:
>> From: Sergey Fedorov <address@hidden>
>>
>> This patch is based on the ideas found in work of KONRAD Frederic [1],
>> Alex Bennée [2], and Alvise Rigo [3].
>>
>> This mechanism allows to perform an operation safely in a quiescent
>> state. Quiescent state means: (1) no vCPU is running and (2) BQL in
>> system-mode or 'exclusive_lock' in user-mode emulation is held while
>> performing the operation. This functionality is required e.g. for
>> performing translation buffer flush safely in multi-threaded user-mode
>> emulation.
>>
>> The existing CPU work queue is used to schedule such safe operations. A
>> new 'safe' flag is added into struct qemu_work_item to designate the
>> special requirements of the safe work. An operation in a quiescent sate
>
> s/sate/state/
>
> (snip)
>> index a233f01..6d5da15 100644
>> --- a/cpu-exec-common.c
>> +++ b/cpu-exec-common.c
>> @@ -25,6 +25,7 @@
>>
>> bool exit_request;
>> CPUState *tcg_current_cpu;
>> +int tcg_pending_threads;
>>
>> /* exit the current TB, but without causing any exception to be raised */
>> void cpu_loop_exit_noexc(CPUState *cpu)
>> @@ -79,6 +80,35 @@ void cpu_loop_exit_restore(CPUState *cpu, uintptr_t pc)
>> }
>>
>> QemuCond qemu_work_cond;
>> +QemuCond qemu_safe_work_cond;
>> +QemuCond qemu_exclusive_cond;
>> +
>> +static int safe_work_pending;
>> +
>> +#ifdef CONFIG_USER_ONLY
>> +#define can_wait_for_safe() (1)
>> +#else
>> +/*
>> + * We never sleep in SoftMMU emulation because we would deadlock as
>> + * all vCPUs are in the same thread. This will change for MTTCG
>> + * however.
>> + */
>> +#define can_wait_for_safe() (0)
>> +#endif
>> +
>> +void wait_safe_cpu_work(void)
>> +{
>> + while (can_wait_for_safe() && atomic_mb_read(&safe_work_pending) > 0) {
>
> The atomic here is puzzling, see below.
>
>> + /*
>> + * If there is pending safe work and no pending threads we
>> + * need to signal another thread to start its work.
>> + */
>> + if (tcg_pending_threads == 0) {
>> + qemu_cond_signal(&qemu_exclusive_cond);
>> + }
>> + qemu_cond_wait(&qemu_safe_work_cond, qemu_get_cpu_work_mutex());
>> + }
>> +}
>>
>> static void queue_work_on_cpu(CPUState *cpu, struct qemu_work_item *wi)
>> {
>> @@ -91,9 +121,18 @@ static void queue_work_on_cpu(CPUState *cpu, struct
>> qemu_work_item *wi)
>> cpu->queued_work_last = wi;
>> wi->next = NULL;
>> wi->done = false;
>> + if (wi->safe) {
>> + atomic_inc(&safe_work_pending);
>> + }
>
> This doesn't seem right. Operating on the condvar's shared 'state' variable
> should always be done with the condvar's mutex held. Otherwise, there's
> no guarantee that sleepers will always see a consistent state when they're
> woken up, which can easily lead to deadlock.
How so? Surely the barriers around the atomic accesses and the implicit
barriers of the mutexes ensure it is?
>
> I suspect this is what caused the deadlock you saw in the last iteration
> of the series.
>
> An additional requirement is the fact that new CPUs can come anytime in
> user-mode (imagine we're flushing the TB while a new pthread was just
> spawned). This is easily triggered by greatly reducing the size of the
> translation buffer, and spawning dozens of threads.
I don't suppose you have a test case written up for this already?
My kvm-unit-tests are fairly extensive for SoftMMU mode but for
user-mode I was only using pigz with the TB buffer scaled down.
Obviously I need to expand the user-mode testing.
> This patch, as it
> stands, won't catch the new threads coming in, because at the time
> "safe work" was assigned, the new threads might not be seen by
> CPU_FOREACH (btw, the CPU list should be converted to RCU, but a
> ppc machine might be affected, see [1])
That sounds like a separate patch to RCUify.
>
> A possible fix is to sched safe work after exiting the CPU loop, i.e.
> with qemu_get_cpu_work_mutex held. I tried this on v4 of this patchset
> and doesn't scale very well on 64 cores (too much contention
> on tb_lock), although at least it doesn't deadlock.
Where exactly? Surely tb_lock contention shouldn't be a problem as it is
only held for generation and patching now?
> An alternative is to have a separate lock for safe work, and check for
> safe work once there are no other locks held; a good place to do this is
> at the beginning of cpu_loop_exec. This scales better, and I'd argue
> it's simpler. In fact, I posted a patch that does this about a year
> ago (!):
> https://lists.nongnu.org/archive/html/qemu-devel/2015-08/msg02576.html
I'll have another look at this. One thing I prefer about this series is
it keeps all the work mechanisms together. I think that's worth striving
for if we can.
> Paolo didn't like condvars, but now I see them coming up again. I guess
> he still won't like the synchronize_rcu() call in there, and I don't like
> it either, but I don't think that's an essential part of that patch.
>
> Thanks,
>
> Emilio
>
> [1] https://lists.nongnu.org/archive/html/qemu-devel/2015-08/msg02581.html
--
Alex Bennée
- [Qemu-devel] [PATCH v5 12/13] tcg: Make tb_flush() thread safe, (continued)
- [Qemu-devel] [PATCH v5 12/13] tcg: Make tb_flush() thread safe, Alex Bennée, 2016/08/02
- [Qemu-devel] [PATCH v5 10/13] bsd-user: Support CPU work queue, Alex Bennée, 2016/08/02
- [Qemu-devel] [PATCH v5 13/13] cpu-exec: replace cpu->queued_work with GArray, Alex Bennée, 2016/08/02
- [Qemu-devel] [PATCH v5 09/13] linux-user: Support CPU work queue, Alex Bennée, 2016/08/02
- [Qemu-devel] [PATCH v5 11/13] cpu-exec-common: Introduce async_safe_run_on_cpu(), Alex Bennée, 2016/08/02
- Re: [Qemu-devel] [PATCH v5 11/13] cpu-exec-common: Introduce async_safe_run_on_cpu(), Paolo Bonzini, 2016/08/27
- Re: [Qemu-devel] [PATCH v5 11/13] cpu-exec-common: Introduce async_safe_run_on_cpu(), Paolo Bonzini, 2016/08/29
- Re: [Qemu-devel] [PATCH v5 11/13] cpu-exec-common: Introduce async_safe_run_on_cpu(), Alex Bennée, 2016/08/31