Re: [Qemu-devel] [RFC v2 PATCH 01/13] Introduce TCGOpcode for memory bar

From: Richard Henderson
Subject: Re: [Qemu-devel] [RFC v2 PATCH 01/13] Introduce TCGOpcode for memory barrier
Date: Thu, 2 Jun 2016 14:18:04 -0700
On 06/02/2016 01:38 PM, Sergey Fedorov wrote:
> On 02/06/16 23:36, Richard Henderson wrote:
>> On 06/02/2016 09:30 AM, Sergey Fedorov wrote:
>>> I think we need to extend TCG load/store instruction attributes to
>>> provide information about guest ordering requirements and leave this TCG
>>> operation only for explicit barrier instruction translation.
>>
>> I do not agree.  I think separate barriers are much cleaner and easier
>> to manage and reason with.
>
> How are we going to emulate strongly-ordered guests on weakly-ordered
> hosts then? I think if every load/store operation must specify which
> ordering it implies then this task would be quite simple.

Hum. That does seem helpful-ish. But I'm not certain how helpful it is to complicate the helper functions even further.

What if we have tcg_canonicalize_memop (or some such) split off the barriers into separate opcodes? E.g.

MO_BAR_LD_B = 32        // prevent earlier loads from crossing current op
MO_BAR_ST_B = 64        // prevent earlier stores from crossing current op
MO_BAR_LD_A = 128       // prevent later loads from crossing current op
MO_BAR_ST_A = 256       // prevent later stores from crossing current op

// Match Sparc MEMBAR as the most flexible host.
TCG_BAR_LD_LD = 1       // #LoadLoad barrier
TCG_BAR_ST_LD = 2       // #StoreLoad barrier
TCG_BAR_LD_ST = 4       // #LoadStore barrier
TCG_BAR_ST_ST = 8       // #StoreStore barrier
TCG_BAR_SYNC  = 64      // SEQ_CST barrier


  tcg_gen_qemu_ld_i32(x, y, i, m | MO_BAR_LD_B | MO_BAR_ST_A)

would be emitted as

  mb            TCG_BAR_LD_LD
  qemu_ld_i32   x, y, i, m
  mb            TCG_BAR_LD_ST
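
As a rough illustration of that split, here is a minimal sketch in C. It uses the flag values proposed above; the SplitMemOp struct and the split_memop_barriers() function are hypothetical names invented for this example and are not existing TCG code. The mapping assumes TCG_BAR_X_Y orders earlier X accesses against later Y accesses, as with Sparc MEMBAR.

```c
#include <assert.h>

/* Memop barrier flags, with the values proposed in this mail. */
enum {
    MO_BAR_LD_B = 32,   /* earlier loads may not cross the current op */
    MO_BAR_ST_B = 64,   /* earlier stores may not cross the current op */
    MO_BAR_LD_A = 128,  /* later loads may not cross the current op */
    MO_BAR_ST_A = 256,  /* later stores may not cross the current op */
    MO_BAR_MASK = MO_BAR_LD_B | MO_BAR_ST_B | MO_BAR_LD_A | MO_BAR_ST_A,
};

/* Barrier opcode arguments, matching the Sparc MEMBAR mmask bits. */
enum {
    TCG_BAR_LD_LD = 1,
    TCG_BAR_ST_LD = 2,
    TCG_BAR_LD_ST = 4,
    TCG_BAR_ST_ST = 8,
};

/* Hypothetical result type: barrier masks around a plain memop. */
typedef struct {
    unsigned before;  /* TCG_BAR_* mask to emit ahead of the access, or 0 */
    unsigned memop;   /* memop with all MO_BAR_* bits removed */
    unsigned after;   /* TCG_BAR_* mask to emit after the access, or 0 */
} SplitMemOp;

/* Hypothetical helper: peel the MO_BAR_* bits off one load or store. */
static SplitMemOp split_memop_barriers(unsigned memop, int is_store)
{
    SplitMemOp s = { 0, memop & ~(unsigned)MO_BAR_MASK, 0 };

    if (memop & MO_BAR_LD_B) {
        s.before |= is_store ? TCG_BAR_LD_ST : TCG_BAR_LD_LD;
    }
    if (memop & MO_BAR_ST_B) {
        s.before |= is_store ? TCG_BAR_ST_ST : TCG_BAR_ST_LD;
    }
    if (memop & MO_BAR_LD_A) {
        s.after |= is_store ? TCG_BAR_ST_LD : TCG_BAR_LD_LD;
    }
    if (memop & MO_BAR_ST_A) {
        s.after |= is_store ? TCG_BAR_ST_ST : TCG_BAR_LD_ST;
    }

    /* The access itself must no longer carry any barrier request. */
    assert((s.memop & MO_BAR_MASK) == 0);
    return s;
}
```

For the load above, split_memop_barriers(m | MO_BAR_LD_B | MO_BAR_ST_A, 0) gives before = TCG_BAR_LD_LD and after = TCG_BAR_LD_ST, i.e. the three-op sequence shown. A zero mask means no mb op is needed on that side.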

We can then add an optimization pass which folds barriers with no memory operations in between, so that duplicates are eliminated.
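
A minimal sketch of such a folding pass, assuming a flat array of ops rather than the real TCG op list. The Op and OpKind types and fold_barriers() are hypothetical, made up for illustration only. Barriers are OR-ed into the most recent barrier as long as no load or store has been seen since.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified op representation for this sketch only. */
typedef enum { OP_MB, OP_LD, OP_ST, OP_OTHER } OpKind;

typedef struct {
    OpKind kind;
    unsigned bar;  /* TCG_BAR_* style mask; only meaningful for OP_MB */
} Op;

/*
 * Fold barriers that have no memory operation between them.
 * Works in place and returns the new number of ops.
 */
static size_t fold_barriers(Op *ops, size_t n)
{
    size_t out = 0;
    Op *pending = NULL;  /* last kept barrier with no load/store after it */

    assert(ops != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        Op op = ops[i];

        if (op.kind == OP_MB) {
            if (pending) {
                /* Merge into the earlier barrier and drop this one. */
                pending->bar |= op.bar;
                continue;
            }
            ops[out] = op;
            pending = &ops[out++];
        } else {
            if (op.kind == OP_LD || op.kind == OP_ST) {
                /* A memory access ends the current run of barriers. */
                pending = NULL;
            }
            ops[out++] = op;
        }
    }
    return out;
}
```

With the split scheme, two back-to-back loads that each request LD_B | ST_A would produce an mb LD_ST immediately followed by an mb LD_LD; this pass would leave a single mb carrying both bits. Merging across non-memory ops is assumed safe here because a barrier only constrains loads and stores.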


