qemu-devel

Re: [PATCH] x86: Implement Linear Address Masking support


From: Richard Henderson
Subject: Re: [PATCH] x86: Implement Linear Address Masking support
Date: Thu, 7 Apr 2022 07:28:54 -0700
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Thunderbird/91.7.0

On 4/7/22 06:18, Kirill A. Shutemov wrote:
>> The new hook is incorrect, in that it doesn't apply to addresses along
>> the tlb fast path.

> I'm not sure what you mean by that. The tlb_hit() mechanics work: we strip
> the tag bits before the tlb lookup.
>
> Could you elaborate?

The fast path does not clear the bits, so you enter the slow path before you get to clearing the bits. You've lost most of the advantage of the tlb already.
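
To make that concrete, here is a minimal stand-in for the compare that the generated fast path performs. The names, constants and example addresses are all made up; this is not the accel/tcg code, just the shape of it:

/* Minimal model of the softmmu TLB fast-path compare.  All names and
 * constants are illustrative. */

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

typedef uint64_t target_ulong;

#define TARGET_PAGE_BITS  12
#define TARGET_PAGE_MASK  (~(target_ulong)((1u << TARGET_PAGE_BITS) - 1))

/* The generated fast path does (roughly) this and nothing more: no
 * masking of tag bits happens before the compare. */
static bool tlb_hit(target_ulong tlb_addr, target_ulong addr)
{
    return (addr & TARGET_PAGE_MASK) == (tlb_addr & TARGET_PAGE_MASK);
}

int main(void)
{
    /* Comparator recorded when the entry was filled (untagged form). */
    target_ulong cached = 0x00007f1200345000ull;

    /* Untagged access to the same page: fast path hits. */
    printf("%d\n", tlb_hit(cached, 0x00007f1200345a10ull));   /* 1 */

    /* Same page, but with metadata in the high bits: the compare fails
     * and we take the slow path (tlb_fill and friends). */
    printf("%d\n", tlb_hit(cached, 0x7e007f1200345a10ull));   /* 0 */

    return 0;
}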

> To be honest I don't fully understand how TBI emulation works.

In get_phys_addr_lpae:

        addrsize = 64 - 8 * param.tbi;
...
        target_ulong top_bits = sextract64(address, inputsize,
                                           addrsize - inputsize);
        if (-top_bits != param.select) {
            /* The gap between the two regions is a Translation fault */
            fault_type = ARMFault_Translation;
            goto do_fault;
        }

which does not include TBI bits in the validation of the sign-extended address.
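
A worked example, since the interaction is subtle: with param.tbi == 1 we get addrsize == 56, so for a 48-bit input size sextract64() only looks at bits 55:48 and the tag byte in 63:56 never reaches the check. A throwaway rework of the same logic (sextract64 as in QEMU's bitops.h; the address is made up):

#include <stdint.h>
#include <stdio.h>

/* Same helper as QEMU's include/qemu/bitops.h. */
static int64_t sextract64(uint64_t value, int start, int length)
{
    return ((int64_t)(value << (64 - length - start))) >> (64 - length);
}

/* The canonicality check from get_phys_addr_lpae, pulled out on its own. */
static int top_bits_ok(uint64_t address, int inputsize, int tbi, int select)
{
    int addrsize = 64 - 8 * tbi;
    int64_t top_bits = sextract64(address, inputsize, addrsize - inputsize);
    return -top_bits == select;
}

int main(void)
{
    /* 48-bit VA, low half (select == 0), arbitrary tag 0x5a in bits 63:56. */
    uint64_t tagged = 0x5a00123456789abcULL;

    printf("tbi=0: %s\n", top_bits_ok(tagged, 48, 0, 0) ? "ok" : "fault");
    printf("tbi=1: %s\n", top_bits_ok(tagged, 48, 1, 0) ? "ok" : "fault");
    return 0;
}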

> Consider store_helper(). I failed to find where the tag bits get stripped
> before getting there for !CONFIG_USER_ONLY. clean_data_tbi() only covers
> the user-only case.
>
> And if we get there with tags, I don't see how we will ever reach the fast
> path: tlb_hit() should never return true there if any bit in the top byte
> is set, since the cached tlb_addr has them stripped.
>
> tlb_fill() will handle it correctly, but it is wasteful to go through a
> pagewalk on every tagged pointer dereference.

We won't do a pagewalk for every tagged pointer dereference. Pagewalks only happen for dereferences with differing tags, once those exceed the capacity of the victim cache (CPU_VTLB_SIZE). And one tag will get to use the fast path, e.g. on the store following a load.
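
To spell out the model I have in mind (a deliberately simplified stand-in, not the cputlb.c code; the sizes and names are invented, with VTLB_SIZE standing in for CPU_VTLB_SIZE): the index comes from the low page-number bits, so every tag variant of a page lands on the same direct-mapped slot, the variants evict one another into the victim table, and a walk is only needed for a variant found in neither place:

#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t target_ulong;

#define PAGE_BITS   12
#define PAGE_MASK   (~(target_ulong)((1u << PAGE_BITS) - 1))
#define TLB_ENTRIES 256            /* direct-mapped main table (illustrative) */
#define VTLB_SIZE   8              /* stands in for CPU_VTLB_SIZE */

typedef struct {
    target_ulong cmp;              /* page comparator, tag bits included */
    bool valid;
} TLBEntry;

static TLBEntry tlb[TLB_ENTRIES];
static TLBEntry vtlb[VTLB_SIZE];

/* Probe order: direct-mapped entry, then the victim table, then give up
 * and (not modelled here) call tlb_fill() for a page walk. */
static bool lookup(target_ulong addr)
{
    target_ulong page = addr & PAGE_MASK;
    size_t idx = (addr >> PAGE_BITS) & (TLB_ENTRIES - 1);

    if (tlb[idx].valid && tlb[idx].cmp == page) {
        return true;                            /* fast path */
    }
    for (size_t i = 0; i < VTLB_SIZE; i++) {
        if (vtlb[i].valid && vtlb[i].cmp == page) {
            /* Swap back into the main table so the next access using
             * this particular tag hits the fast path again. */
            TLBEntry tmp = tlb[idx];
            tlb[idx] = vtlb[i];
            vtlb[i] = tmp;
            return true;
        }
    }
    return false;                               /* slow path: tlb_fill() */
}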

I've just now had a browse through the Intel docs, and I see that you're not performing the required modified canonicality check. While a properly tagged address will have the tag removed in CR2 during a page fault, an improperly tagged address (one whose bit 63 does not match bit 47 or bit 56) should have the original address reported in CR2.
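
For what it's worth, this is how I read that requirement. The sketch is pieced together from the paragraph above rather than from the SDM pseudocode; lam_bit is 47 or 56 depending on the active LAM mode, and every name in it is invented:

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

static bool bit(uint64_t v, int n)
{
    return (v >> n) & 1;
}

/*
 * Modified canonicality check as described above.  On success *xlat is
 * the address with the metadata bits replaced by sign extension, i.e.
 * the value that should show up in CR2 if the walk later faults.  On
 * failure the caller must report the original, untouched address.
 */
static bool lam_canonical(uint64_t addr, int lam_bit, uint64_t *xlat)
{
    if (bit(addr, 63) != bit(addr, lam_bit)) {
        return false;
    }
    /* Bit 63 already matches the sign bit, so plain sign extension from
     * lam_bit strips the metadata. */
    *xlat = (uint64_t)(((int64_t)(addr << (63 - lam_bit))) >> (63 - lam_bit));
    return true;
}

int main(void)
{
    uint64_t cr2;
    uint64_t proper   = 0x7e00123456789abcULL;  /* bit 63 == bit 47 == 0 */
    uint64_t improper = 0xfe00123456789abcULL;  /* bit 63 == 1, bit 47 == 0 */

    if (lam_canonical(proper, 47, &cr2)) {
        printf("proper:   translate/report %#llx\n", (unsigned long long)cr2);
    }
    if (!lam_canonical(improper, 47, &cr2)) {
        printf("improper: report original %#llx\n", (unsigned long long)improper);
    }
    return 0;
}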

I could imagine a hook that could aid the victim cache in ignoring the tag, so that we need to go through tlb_fill fewer times. But I wouldn't want to include that in the base version of this feature, and I'd want to take more than a moment over the design so that it could be used by ARM and RISC-V as well.


r~


