Re: [PATCH] accel/tcg: fix race in cpu_exec_step_atomic (bug 1863025)
From: Yifan Lu
Subject: Re: [PATCH] accel/tcg: fix race in cpu_exec_step_atomic (bug 1863025)
Date: Fri, 14 Feb 2020 16:01:17 -0800
What race are you thinking of in my patch? The obvious race I can
think of is benign:
Case 1:
A: does TB flush
B: read tb_flush_count
A: increment tb_flush_count
A: end_exclusive
B: tb_lookup__cpu_state/tb_gen_code
B: start_exclusive
B: read tb_flush_count again (increment seen)
B: retries
Case 2:
B: read tb_flush_count
A: does TB flush
A: increment tb_flush_count
A: end_exclusive
B: tb_lookup__cpu_state/tb_gen_code
B: start_exclusive
B: read tb_flush_count again (increment seen)
B: retries
Case 3:
A: does TB flush
A: increment tb_flush_count
A: end_exclusive
B: read tb_flush_count
B: tb_lookup__cpu_state/tb_gen_code
B: start_exclusive
B: read tb_flush_count again (no increment seen)
B: proceeds
Case 1 is the expected case. In Case 2, we thought the TB was stale when
it wasn't, so we fetch it again with tb_lookup__cpu_state at minimal
extra cost.
Case 3 looks dangerous because B's first read of tb_flush_count already
reflects the increment, so the second read sees no change and no retry
happens. But if that is the situation, thread A is at the end of
do_tb_flush: the lookup tables have already been cleared and the TCG
context has already been reset, so it is safe for thread B to call
tb_lookup__cpu_state or tb_gen_code.
Yifan
On Fri, Feb 14, 2020 at 3:31 PM Richard Henderson
<address@hidden> wrote:
>
> On 2/14/20 6:49 AM, Alex Bennée wrote:
> > The bug describes a race whereby cpu_exec_step_atomic can acquire a TB
> > which is invalidated by a tb_flush before we execute it. This doesn't
> > affect the other cpu_exec modes as a tb_flush by its nature can only
> > occur on a quiescent system. The race was described as:
> >
> > B2. tcg_cpu_exec => cpu_exec => tb_find => tb_gen_code
> > B3. tcg_tb_alloc obtains a new TB
> >
> > C3. TB obtained with tb_lookup__cpu_state or tb_gen_code
> > (same TB as B2)
> >
> > A3. start_exclusive critical section entered
> > A4. do_tb_flush is called, TB memory freed/re-allocated
> > A5. end_exclusive exits critical section
> >
> > B2. tcg_cpu_exec => cpu_exec => tb_find => tb_gen_code
> > B3. tcg_tb_alloc reallocates TB from B2
> >
> > C4. start_exclusive critical section entered
> > C5. cpu_tb_exec executes the TB code that was freed in A4
> >
> > The simplest fix is to widen the exclusive period to include the TB
> > lookup. As a result we can drop the complication of checking we are in
> > the exclusive region before we end it.
>
> I'm not 100% keen on having the tb_gen_code within the exclusive region. It
> implies a much larger delay on (at least) the first execution of the atomic
> operation.
>
> But I suppose until recently we had a global lock around code generation, and
> this is only slightly worse. Plus, it has the advantage of being dead simple,
> and without the races vs tb_ctx.tb_flush_count that exist in Yifan's patch.
>
> Applied to tcg-next.
>
>
> r~