Re: [qemu-s390x] [Qemu-devel] [PATCH v1 2/4] s390x/tcg: Introduce probe

qemu-s390x

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [qemu-s390x] [Qemu-devel] [PATCH v1 2/4] s390x/tcg: Introduce probe_

From:	David Hildenbrand
Subject:	Re: [qemu-s390x] [Qemu-devel] [PATCH v1 2/4] s390x/tcg: Introduce probe_read_access()
Date:	Mon, 26 Aug 2019 11:31:32 +0200
User-agent:	Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.8.0

On 22.08.19 09:01, David Hildenbrand wrote:
> On 22.08.19 00:31, Richard Henderson wrote:
>> On 8/21/19 2:33 PM, David Hildenbrand wrote:
>>>> NOTDIRTY cannot fault at all.  The associated rcu critical section is ugly
>>>> enough to make me not want to do anything except continue to go through the
>>>> regular MMIO path.
>>>>
>>>> In any case, so long as we eliminate *access* faults by probing the page 
>>>> table,
>>>> then falling back to the byte-by-byte loop is, AFAICS, sufficient to 
>>>> implement
>>>> the instructions correctly.
>>>
>>> "In any case, so long as we eliminate *access* faults by probing the
>>> page table" - that's what I'm doing in this patch (and even more correct
>>> in the prototype patch I shared), no? (besides the watchpoint madness below)
>>
>> Correct.
>>
>> My main objection to your current patch is that you perform the access checks
>> within MVC, and then do some more tlb lookups in fast_memmove.
> 
> Right, however I think splitting up both steps is nicer (especially if
> we mix and match memmove and memset in MVCLE later). Maybe combining a
> tuned-up probe_write() with something similar (but different to)
> access_prepare().
> 
> General changes:
> 
> 1. Check watchpoints in probe_write()
> 
> 2. Make probe_write() bail out if it would cross pages() - caller has to
> assure it falls into a single page (TARGET_PAGE_SIZE ?).
> 
> 3. Return a pointer from probe_write()
> -> If NULL is returned, reads/writes have to go via memops, e.g., single
> byte access
> 
> 4. Make probe_write() return addresses of TLB entries that are already
> invalid again (-> LAP)
> 
> 
> What I propose for s390x:
> 
> 1. access_prepare(): Collect the pointers using probe_write() - maximum
> is two pages. If we get NULL, there is no fault but we have to fallback
> to read/write using memops.
> 
> 2. access_memset() / access_set() / access_move() ... pass the access
> information we collected. Prototype I shared can be heavily simplified -
> don't fall back to paddrs but instead do single-byte access, don't allow
> flexible array of pages, etc.
> 
> 3. First convert MVC to the new access mechanism, later fix + convert
> the others.
> 
>>
>> I think that fast_memmove is where the access checks should live.  That 
>> allows
>> incremental improvement to combine access checks + host address lookup, which
>> cannot currently be done in one step with existing interfaces.
>>
>> I guess you would still want access checks within MVC for the case in which 
>> you
>> must fall back to byte-by-byte because of destructive overlap.
> 
> That's why I think introducing a new set of helpers makes sense. Once
> all users of fast_memmove() etc are gone, we can then throw them away.
> 
>>
>>> "falling back to the byte-by-byte loop is, AFAICS, sufficient"
>>>
>>> I don't think this is sufficient. E.g., LAP protected pages
>>> (PAGE_WRITE_INV which immediately requires a new MMU walk on the next
>>> access) will trigger a new MMU walk on every byte access (that's why I
>>> chose to pre-translate in my prototype).
>>
>> LAP protected pages is exactly why probe_write should return the host 
>> address,
>> so that we can do the access check + host address lookup in one step.
>>
> 
> Yes, this should also be done in tlb_vaddr_to_host() - but as we'll be
> converting to a probe_write() thingy, it might not be required to handle
> the special "TLB_INVALID_MASK set after tlb_fill()" case.
> 
>> But in the meantime...
>>
>>> If another CPU modified the
>>> page tables in between, we could suddenly get a fault - although we
>>> checked early. What am I missing?
>>
>> You're concerned with a bare write to the page table by cpu B, while cpu A is
>> executing, and before cpu B issues the cross-cpu tlb flush?
> 
> Not bare writes, these are invalid. Take IPTE for example. Set the
> invalid bit, then do a sync CPU flush.
> 
> If we re-translate a vaddr after probe_write() - which would currently
> happen in case of LAP - then we could suddenly run into the invalid bit,
> triggering a fault. The sync CPU flush still wasn't processed, but we
> see the "invalid" bit early because we decided to re-translate after
> already modifying memory. This re-translation really has to be avoided.

FWIW, real HW has a dedicated lock for that as far as I remember, the
IPTE lock. While holding the IPTE lock, no modifications of page tables
are possible.

So e.g., IPTE/IDTE/CSP take the lock while working one the page tables.
MMU walkers in return take the lock while translating vaddrs.
Especially, in KVM guest access code we hold the same HW lock to make
sure no IPTE/IDTE can modify the page tables while we walk them.
Something like that (reader/writer lock) would also be possible in our MMU.

-- 

Thanks,

David / dhildenb

[Prev in Thread]

Current Thread

[Next in Thread]

Re: [qemu-s390x] [Qemu-devel] [PATCH v1 2/4] s390x/tcg: Introduce probe_read_access(), (continued)
- [qemu-s390x] [PATCH v1 3/4] s390x/tcg: MOVE (MVC): Increment the length once, David Hildenbrand, 2019/08/21
  - Re: [qemu-s390x] [Qemu-devel] [PATCH v1 3/4] s390x/tcg: MOVE (MVC): Increment the length once, Richard Henderson, 2019/08/21
- [qemu-s390x] [PATCH v1 4/4] s390x/tcg: MOVE (MVC): Fault-safe handling, David Hildenbrand, 2019/08/21

Prev by Date: [qemu-s390x] [PATCH v2 7/7] s390x/tcg: Pass a size to probe_write() in do_csst()
Next by Date: Re: [qemu-s390x] [Qemu-devel] [PATCH v2 3/7] tcg: Factor out CONFIG_USER_ONLY probe_write() from s390x code
Previous by thread: Re: [qemu-s390x] [Qemu-devel] [PATCH v1 2/4] s390x/tcg: Introduce probe_read_access()
Next by thread: Re: [qemu-s390x] [Qemu-devel] [PATCH v1 2/4] s390x/tcg: Introduce probe_read_access()
Index(es):
- Date
- Thread