From: Christoffer Dall
Subject: Re: [Qemu-devel] [RFC/RFT PATCH v2 3/3] arm/arm64: KVM: implement 'uncached' mem coherency
Date: Fri, 15 May 2015 08:02:59 -0700
User-agent: Mutt/1.5.21 (2010-09-15)
On Thu, May 14, 2015 at 03:32:13PM +0200, Andrew Jones wrote:
> On Thu, May 14, 2015 at 12:55:49PM +0200, Christoffer Dall wrote:
> > On Wed, May 13, 2015 at 01:31:54PM +0200, Andrew Jones wrote:
> > > When stage-1 and stage-2 memory attributes are combined with respect
> > > to caching policy, non-cacheable types take precedence. If a guest
> > > maps a region as device memory, while KVM userspace emulates that
> > > device using normal, cacheable memory, then we lose coherency. With
> > > KVM_MEM_UNCACHED, KVM userspace can now hint to KVM which memory
> > > regions are likely to be problematic. With this patch, as pages
> > > of these types of regions are faulted into the guest, not only do
> > > we flush the page's dcache, but we also change userspace's
> > > mapping to NC in order to maintain coherency.
> > >
> > > What if the guest doesn't do what we expect? While we can't
> > > force a guest to use cacheable memory, we can take advantage of
> > > the non-cacheable precedence, and force it to use non-cacheable.
> > > So, this patch also introduces PAGE_S2_NORMAL_NC, and uses it on
> > > KVM_MEM_UNCACHED regions to force them to NC.
> > >
> > > We now have both guest and userspace on the same page (pun intended)
> >
> > I'd like to revisit the overall approach here. Is doing non-cached
> > accesses in both the guest and host really the right thing to do here?
>
> I think so, but all ideas/approaches are still on the table. This is
> still an RFC.
>
> >
> > The semantics of the device becomes that it is cache coherent (because
> > QEMU is), and I think Marc argued that Linux/UEFI should simply be
> > adapted to handle whatever emulated devices we have as coherent. I also
> > remember someone arguing that would be wrong (Peter?).
>
> I'm not really for quirking all devices in all guest types (AAVMF, Linux,
> other bootloaders, other OSes). Windows is unlikely to apply any quirks.
>
Well, my point was that if we're emulating a platform with coherent I/O
memory for PCI devices, then the guest should be able to treat it as
such; but, as Paolo explained, it must always be safe for a guest to
assume non-coherence, so that doesn't work.
> >
> > Finally, does this address all cache coherency issues with emulated
> > devices? Some VOS guys had seen things still not working with this
> > approach, unsure why... I'd like to avoid us merging this only to merge
> > a more complete solution in a few weeks which reverts this solution...
>
> I'm not sure (this is still an RFT too :-) We definitely would need to
> scatter some more memory_region_set_uncached() calls around QEMU first.
>
It would be good if you could sync with the VOS guys and make sure your
patch set addresses their issues with the appropriate
memory_region_set_uncached() added to QEMU, and if it does not, some
vague idea why that falls outside of the scope of this patch set. After
all, adding a USB controller to a VM is not that esoteric a use case,
is it?
> >
> > More comments/questions below:
> >
> > >
> > > Signed-off-by: Andrew Jones <address@hidden>
> > > ---
> > > arch/arm/include/asm/kvm_mmu.h | 5 ++++-
> > > arch/arm/include/asm/pgtable-3level.h | 1 +
> > > arch/arm/include/asm/pgtable.h | 1 +
> > > arch/arm/kvm/mmu.c | 37 +++++++++++++++++++++++------------
> > > arch/arm64/include/asm/kvm_mmu.h | 5 ++++-
> > > arch/arm64/include/asm/memory.h | 1 +
> > > arch/arm64/include/asm/pgtable.h | 1 +
> > > 7 files changed, 36 insertions(+), 15 deletions(-)
> > >
> > > diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
> > > index 405aa18833073..e8034a80b12e5 100644
> > > --- a/arch/arm/include/asm/kvm_mmu.h
> > > +++ b/arch/arm/include/asm/kvm_mmu.h
> > > @@ -214,8 +214,11 @@ static inline void __coherent_cache_guest_page(struct kvm_vcpu *vcpu, pfn_t pfn,
> > > while (size) {
> > > void *va = kmap_atomic_pfn(pfn);
> > >
> > > - if (need_flush)
> > > + if (need_flush) {
> > > kvm_flush_dcache_to_poc(va, PAGE_SIZE);
> > > + if (ipa_uncached)
> > > + set_memory_nc((unsigned long)va, 1);
> >
> > nit: consider moving this outside the need_flush
> >
> > > + }
> > >
> > > if (icache_is_pipt())
> > > __cpuc_coherent_user_range((unsigned long)va,
> > > diff --git a/arch/arm/include/asm/pgtable-3level.h b/arch/arm/include/asm/pgtable-3level.h
> > > index a745a2a53853c..39b3f7a40e663 100644
> > > --- a/arch/arm/include/asm/pgtable-3level.h
> > > +++ b/arch/arm/include/asm/pgtable-3level.h
> > > @@ -121,6 +121,7 @@
> > > * 2nd stage PTE definitions for LPAE.
> > > */
> > > #define L_PTE_S2_MT_UNCACHED (_AT(pteval_t, 0x0) << 2) /* strongly ordered */
> > > +#define L_PTE_S2_MT_NORMAL_NC (_AT(pteval_t, 0x5) << 2) /* normal non-cacheable */
> > > #define L_PTE_S2_MT_WRITETHROUGH (_AT(pteval_t, 0xa) << 2) /* normal inner write-through */
> > > #define L_PTE_S2_MT_WRITEBACK (_AT(pteval_t, 0xf) << 2) /* normal inner write-back */
> > > #define L_PTE_S2_MT_DEV_SHARED (_AT(pteval_t, 0x1) << 2) /* device */
> > > diff --git a/arch/arm/include/asm/pgtable.h b/arch/arm/include/asm/pgtable.h
> > > index f40354198bad4..ae13ca8b0a23d 100644
> > > --- a/arch/arm/include/asm/pgtable.h
> > > +++ b/arch/arm/include/asm/pgtable.h
> > > @@ -100,6 +100,7 @@ extern pgprot_t pgprot_s2_device;
> > > #define PAGE_HYP _MOD_PROT(pgprot_kernel, L_PTE_HYP)
> > > #define PAGE_HYP_DEVICE _MOD_PROT(pgprot_hyp_device, L_PTE_HYP)
> > > #define PAGE_S2 _MOD_PROT(pgprot_s2, L_PTE_S2_RDONLY)
> > > +#define PAGE_S2_NORMAL_NC __pgprot((pgprot_val(PAGE_S2) & ~L_PTE_S2_MT_MASK) | L_PTE_S2_MT_NORMAL_NC)
> > > #define PAGE_S2_DEVICE _MOD_PROT(pgprot_s2_device, L_PTE_S2_RDONLY)
> > >
> > > #define __PAGE_NONE __pgprot(_L_PTE_DEFAULT | L_PTE_RDONLY | L_PTE_XN | L_PTE_NONE)
> > > diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
> > > index bc1665acd73e7..6b3bd8061bd2a 100644
> > > --- a/arch/arm/kvm/mmu.c
> > > +++ b/arch/arm/kvm/mmu.c
> > > @@ -1220,7 +1220,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> > > struct vm_area_struct *vma;
> > > pfn_t pfn;
> > > pgprot_t mem_type = PAGE_S2;
> > > - bool fault_ipa_uncached;
> > > + bool fault_ipa_uncached = false;
> > > bool logging_active = memslot_is_logging(memslot);
> > > unsigned long flags = 0;
> > >
> > > @@ -1300,6 +1300,11 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> > > writable = false;
> > > }
> > >
> > > + if (memslot->flags & KVM_MEM_UNCACHED) {
> > > + mem_type = PAGE_S2_NORMAL_NC;
> > > + fault_ipa_uncached = true;
> > > + }
> > > +
> > > spin_lock(&kvm->mmu_lock);
> > > if (mmu_notifier_retry(kvm, mmu_seq))
> > > goto out_unlock;
> > > @@ -1307,8 +1312,6 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> > > if (!hugetlb && !force_pte)
> > > hugetlb = transparent_hugepage_adjust(&pfn, &fault_ipa);
> > >
> > > - fault_ipa_uncached = memslot->flags & KVM_MEM_UNCACHED;
> > > -
> > > if (hugetlb) {
> > > pmd_t new_pmd = pfn_pmd(pfn, mem_type);
> > > new_pmd = pmd_mkhuge(new_pmd);
> > > @@ -1462,6 +1465,7 @@ static int handle_hva_to_gpa(struct kvm *kvm,
> > > unsigned long start,
> > > unsigned long end,
> > > int (*handler)(struct kvm *kvm,
> > > + struct kvm_memory_slot *slot,
> > > gpa_t gpa, void *data),
> > > void *data)
> > > {
> > > @@ -1491,14 +1495,15 @@ static int handle_hva_to_gpa(struct kvm *kvm,
> > >
> > > for (; gfn < gfn_end; ++gfn) {
> > > gpa_t gpa = gfn << PAGE_SHIFT;
> > > - ret |= handler(kvm, gpa, data);
> > > + ret |= handler(kvm, memslot, gpa, data);
> > > }
> > > }
> > >
> > > return ret;
> > > }
> > >
> > > -static int kvm_unmap_hva_handler(struct kvm *kvm, gpa_t gpa, void *data)
> > > +static int kvm_unmap_hva_handler(struct kvm *kvm, struct kvm_memory_slot *slot,
> > > + gpa_t gpa, void *data)
> >
> > Maybe we should consider a pointer to a struct with the relevant data to
> > pass around to the handler by now, which would allow us to get rid of
> > the void * cast as well. Not sure if it's worth it.
> >
> > > {
> > > unmap_stage2_range(kvm, gpa, PAGE_SIZE);
> > > return 0;
> > > @@ -1527,9 +1532,15 @@ int kvm_unmap_hva_range(struct kvm *kvm,
> > > return 0;
> > > }
> > >
> > > -static int kvm_set_spte_handler(struct kvm *kvm, gpa_t gpa, void *data)
> > > +static int kvm_set_spte_handler(struct kvm *kvm, struct kvm_memory_slot *slot,
> > > + gpa_t gpa, void *data)
> > > {
> > > - pte_t *pte = (pte_t *)data;
> > > + pte_t pte = *((pte_t *)data);
> > > +
> > > + if (slot->flags & KVM_MEM_UNCACHED)
> > > + pte = pfn_pte(pte_pfn(pte), PAGE_S2_NORMAL_NC);
> > > + else
> > > + pte = pfn_pte(pte_pfn(pte), PAGE_S2);
> > >
> > > /*
> > > * We can always call stage2_set_pte with KVM_S2PTE_FLAG_LOGGING_ACTIVE
> > > @@ -1538,7 +1549,7 @@ static int kvm_set_spte_handler(struct kvm *kvm, gpa_t gpa, void *data)
> > > * therefore stage2_set_pte() never needs to clear out a huge PMD
> > > * through this calling path.
> > > */
> > > - stage2_set_pte(kvm, NULL, gpa, pte, 0);
> > > + stage2_set_pte(kvm, NULL, gpa, &pte, 0);
> >
> > this is making me feel like we should have a separate patch that changes
> > stage2_set_pte from taking a pointer to just taking a value for the new
> > pte.
> >
> > > return 0;
> > > }
> > >
> > > @@ -1546,17 +1557,16 @@ static int kvm_set_spte_handler(struct kvm *kvm, gpa_t gpa, void *data)
> > > void kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte)
> > > {
> > > unsigned long end = hva + PAGE_SIZE;
> > > - pte_t stage2_pte;
> > >
> > > if (!kvm->arch.pgd)
> > > return;
> > >
> > > trace_kvm_set_spte_hva(hva);
> > > - stage2_pte = pfn_pte(pte_pfn(pte), PAGE_S2);
> > > - handle_hva_to_gpa(kvm, hva, end, &kvm_set_spte_handler, &stage2_pte);
> > > + handle_hva_to_gpa(kvm, hva, end, &kvm_set_spte_handler, &pte);
> >
> > hooking in here will make sure you catch changes to the page used for
> > the mapping, but wouldn't that also mean that the userspace mapping
> > would have been change, and where are you updating this?
> >
> > Also, is this called if the userspace mapping is zapped without doing
> > anything about the underlying page? (how do we then catch when the
> > userspace pte is populated again, and is this even possible?)
>
> I was hoping that I only needed to worry about getting the S2 attributes
> right here, and then, since the page will need to be refaulted into
> the guest anyway, that the userspace part would get taken care of at
> that point (in user_mem_abort).
user_mem_abort handles stage-2 page table faults, which have (almost)
nothing to do with the user space page tables.
I think it's entirely possible to have a stage-2 mapping to a page,
which is no longer mapped to userspace at all. Or do we pin the
userspace PTEs (not just keep a reference on the physical page) during
fault handling?
> But, to be honest, I forgot to dig into
> this deep enough to know if my hope will work or not.
>
You really need to work this out so you feel confident with the overall
scheme here, then I can try to see if I can break it under review, but I
think the author of this patch must know how it is *supposed* to work ;)
> >
> > > }
> > >
> > > -static int kvm_age_hva_handler(struct kvm *kvm, gpa_t gpa, void *data)
> > > +static int kvm_age_hva_handler(struct kvm *kvm, struct kvm_memory_slot *slot,
> > > + gpa_t gpa, void *data)
> > > {
> > > pmd_t *pmd;
> > > pte_t *pte;
> > > @@ -1586,7 +1596,8 @@ static int kvm_age_hva_handler(struct kvm *kvm, gpa_t gpa, void *data)
> > > return 0;
> > > }
> > >
> > > -static int kvm_test_age_hva_handler(struct kvm *kvm, gpa_t gpa, void *data)
> > > +static int kvm_test_age_hva_handler(struct kvm *kvm, struct kvm_memory_slot *slot,
> > > + gpa_t gpa, void *data)
> > > {
> > > pmd_t *pmd;
> > > pte_t *pte;
> > > diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
> > > index 61505676d0853..af5f0f0eccef9 100644
> > > --- a/arch/arm64/include/asm/kvm_mmu.h
> > > +++ b/arch/arm64/include/asm/kvm_mmu.h
> > > @@ -236,8 +236,11 @@ static inline void __coherent_cache_guest_page(struct kvm_vcpu *vcpu, pfn_t pfn,
> > > {
> > > void *va = page_address(pfn_to_page(pfn));
> > >
> > > - if (!vcpu_has_cache_enabled(vcpu) || ipa_uncached)
> > > + if (!vcpu_has_cache_enabled(vcpu) || ipa_uncached) {
> > > kvm_flush_dcache_to_poc(va, size);
> > > + if (ipa_uncached)
> > > + set_memory_nc((unsigned long)va, size/PAGE_SIZE);
> >
> > are you not setting the kernel mapping of the page to non-cached here,
> > which doesn't affect your userspace mappings at all?
>
> Oh crap. I shouldn't have tried to use change_memory_common... I
completely overlooked the fact that I'm now using the wrong mm...
>
and the wrong va...
> >
> > (this would explain why things still break with this series).
>
> yeah, I wonder why it works so well?
>
Luck, or that it slowed things down and reordered operations enough to
make the problem less likely, would be my guess.
Thanks,
-Christoffer