Re: [PATCH] accel/tcg: fix race in cpu_exec_step_atomic (bug 1863025)
From: Yifan Lu
Subject: Re: [PATCH] accel/tcg: fix race in cpu_exec_step_atomic (bug 1863025)
Date: Fri, 14 Feb 2020 16:01:17 -0800
What race are you thinking of in my patch? The obvious race I can
think of is benign:
Case 1:
A: does TB flush
B: read tb_flush_count
A: increment tb_flush_count
A: end_exclusive
B: tb_lookup__cpu_state/tb_gen_code
B: start_exclusive
B: read tb_flush_count again (increment seen)
B: retries
Case 2:
B: read tb_flush_count
A: does TB flush
A: increment tb_flush_count
A: end_exclusive
B: tb_lookup__cpu_state/tb_gen_code
B: start_exclusive
B: read tb_flush_count again (increment seen)
B: retries
Case 3:
A: does TB flush
A: increment tb_flush_count
A: end_exclusive
B: read tb_flush_count
B: tb_lookup__cpu_state/tb_gen_code
B: start_exclusive
B: read tb_flush_count again (no increment seen)
B: proceeds
Case 1 is the expected case. In Case 2, we thought the TB was stale when
it wasn't, so we fetch it again with tb_lookup__cpu_state at minimal
extra cost.
Case 3 looks dangerous because B's first read of tb_flush_count already
reflects the increment, so the second read sees no change and no retry
happens. But if that is the situation, thread A is at the end of
do_tb_flush: the lookup tables have already been cleared and the TCG
context has already been reset, so it is safe for thread B to call
tb_lookup__cpu_state or tb_gen_code.
Yifan
On Fri, Feb 14, 2020 at 3:31 PM Richard Henderson
<address@hidden> wrote:
>
> On 2/14/20 6:49 AM, Alex Bennée wrote:
> > The bug describes a race whereby cpu_exec_step_atomic can acquire a TB
> > which is invalidated by a tb_flush before we execute it. This doesn't
> > affect the other cpu_exec modes as a tb_flush by its nature can only
> > occur on a quiescent system. The race was described as:
> >
> > B2. tcg_cpu_exec => cpu_exec => tb_find => tb_gen_code
> > B3. tcg_tb_alloc obtains a new TB
> >
> > C3. TB obtained with tb_lookup__cpu_state or tb_gen_code
> > (same TB as B2)
> >
> > A3. start_exclusive critical section entered
> > A4. do_tb_flush is called, TB memory freed/re-allocated
> > A5. end_exclusive exits critical section
> >
> > B2. tcg_cpu_exec => cpu_exec => tb_find => tb_gen_code
> > B3. tcg_tb_alloc reallocates TB from B2
> >
> > C4. start_exclusive critical section entered
> > C5. cpu_tb_exec executes the TB code that was freed in A4
> >
> > The simplest fix is to widen the exclusive period to include the TB
> > lookup. As a result we can drop the complication of checking we are in
> > the exclusive region before we end it.
>
> I'm not 100% keen on having the tb_gen_code within the exclusive region. It
> implies a much larger delay on (at least) the first execution of the atomic
> operation.
>
> But I suppose until recently we had a global lock around code generation, and
> this is only slightly worse. Plus, it has the advantage of being dead simple,
> and without the races vs tb_ctx.tb_flush_count that exist in Yifan's patch.
>
> Applied to tcg-next.
>
>
> r~