qemu-devel

Re: [PATCH v3] intel-iommu: Document iova_tree


From: Jason Wang
Subject: Re: [PATCH v3] intel-iommu: Document iova_tree
Date: Mon, 9 Jan 2023 17:08:59 +0800

On Wed, Jan 4, 2023 at 11:14 PM Peter Xu <peterx@redhat.com> wrote:
>
> On Wed, Jan 04, 2023 at 12:15:20PM +0800, Jason Wang wrote:
> > On Wed, Jan 4, 2023 at 1:30 AM Peter Xu <peterx@redhat.com> wrote:
> > >
> > > On Mon, Dec 26, 2022 at 12:09:52PM +0800, Jason Wang wrote:
> > > > On Sat, Dec 24, 2022 at 12:26 AM Peter Xu <peterx@redhat.com> wrote:
> > > > >
> > > > > On Fri, Dec 23, 2022 at 03:48:01PM +0800, Jason Wang wrote:
> > > > > > On Wed, Dec 7, 2022 at 6:13 AM Peter Xu <peterx@redhat.com> wrote:
> > > > > > >
> > > > > > > It seems not super clear on when iova_tree is used, and why.  Add a rich
> > > > > > > comment above iova_tree to track why we needed the iova_tree, and when we
> > > > > > > need it.
> > > > > > >
> > > > > > > Also comment for the map/unmap messages, on how they're used and
> > > > > > > implications (e.g. unmap can be larger than the mapped ranges).
> > > > > > >
> > > > > > > Suggested-by: Jason Wang <jasowang@redhat.com>
> > > > > > > Signed-off-by: Peter Xu <peterx@redhat.com>
> > > > > > > ---
> > > > > > > v3:
> > > > > > > - Adjust according to Eric's comment
> > > > > > > ---
> > > > > > >  include/exec/memory.h         | 28 ++++++++++++++++++++++++++
> > > > > > >  include/hw/i386/intel_iommu.h | 38 
> > > > > > > ++++++++++++++++++++++++++++++++++-
> > > > > > >  2 files changed, 65 insertions(+), 1 deletion(-)
> > > > > > >
> > > > > > > diff --git a/include/exec/memory.h b/include/exec/memory.h
> > > > > > > index 91f8a2395a..269ecb873b 100644
> > > > > > > --- a/include/exec/memory.h
> > > > > > > +++ b/include/exec/memory.h
> > > > > > > @@ -129,6 +129,34 @@ struct IOMMUTLBEntry {
> > > > > > >  /*
> > > > > > >   * Bitmap for different IOMMUNotifier capabilities. Each 
> > > > > > > notifier can
> > > > > > >   * register with one or multiple IOMMU Notifier capability 
> > > > > > > bit(s).
> > > > > > > + *
> > > > > > > + * Normally there're two use cases for the notifiers:
> > > > > > > + *
> > > > > > > + *   (1) When the device needs accurate synchronizations of the 
> > > > > > > vIOMMU page
> > > > > > > + *       tables, it needs to register with both MAP|UNMAP 
> > > > > > > notifies (which
> > > > > > > + *       is defined as IOMMU_NOTIFIER_IOTLB_EVENTS below).
> > > > > > > + *
> > > > > > > + *       Regarding to accurate synchronization, it's when the 
> > > > > > > notified
> > > > > > > + *       device maintains a shadow page table and must be 
> > > > > > > notified on each
> > > > > > > + *       guest MAP (page table entry creation) and UNMAP 
> > > > > > > (invalidation)
> > > > > > > + *       events (e.g. VFIO). Both notifications must be accurate 
> > > > > > > so that
> > > > > > > + *       the shadow page table is fully in sync with the guest 
> > > > > > > view.
> > > > > > > + *
> > > > > > > + *   (2) When the device doesn't need accurate synchronizations 
> > > > > > > of the
> > > > > > > + *       vIOMMU page tables, it needs to register only with 
> > > > > > > UNMAP or
> > > > > > > + *       DEVIOTLB_UNMAP notifies.
> > > > > > > + *
> > > > > > > + *       It's when the device maintains a cache of IOMMU 
> > > > > > > translations
> > > > > > > + *       (IOTLB) and is able to fill that cache by requesting 
> > > > > > > translations
> > > > > > > + *       from the vIOMMU through a protocol similar to ATS 
> > > > > > > (Address
> > > > > > > + *       Translation Service).
> > > > > > > + *
> > > > > > > + *       Note that in this mode the vIOMMU will not maintain a 
> > > > > > > shadowed
> > > > > > > + *       page table for the address space, and the UNMAP 
> > > > > > > messages can be
> > > > > > > + *       actually larger than the real invalidations (just like 
> > > > > > > how the
> > > > > > > + *       Linux IOMMU driver normally works, where an 
> > > > > > > invalidation can be
> > > > > > > + *       enlarged as long as it still covers the target range).  
> > > > > > > The IOMMU
> > > > > >
> > > > > > Just spotted this when testing your fix for DSI:
> > > > > >
> > > > > >         assert(entry->iova >= notifier->start && entry_end <= notifier->end);
> > > > > >
> > > > > > Do we need to remove this (but it seems a partial revert of
> > > > > > 03c7140c1a0336af3d4fca768de791b9c0e2b128)?
> > > > >
> > > > > Replied in the other thread.
> > > > >
> > > > > I assume this documentation patch is still correct, am I right?  It's
> > > > > talking about the possibility of enlarged invalidation range sent 
> > > > > from the
> > > > > guest and vIOMMU.  That should still not be bigger than the registered
> > > > > range in iommu notifiers (even if bigger than the actual unmapped 
> > > > > range).
> > > >
> > > > Adding Eugenio.
> > > >
> > > > So I think we need to evaluate the possible side effects on all the
> > > > current map notifiers, for example vfio_iommu_map_notify().
> > > >
> > > > And in another thread, if we crop the size, it basically means the
> > > > notifier itself will still assume the range is valid, which is not
> > > > what is documented in this patch.
> > > >
> > > > What's more interesting, I see smmu had:
> > > >
> > > > /* Unmap the whole notifier's range */
> > > > static void smmu_unmap_notifier_range(IOMMUNotifier *n)
> > > > {
> > > >     IOMMUTLBEvent event;
> > > >
> > > >     event.type = IOMMU_NOTIFIER_UNMAP;
> > > >     event.entry.target_as = &address_space_memory;
> > > >     event.entry.iova = n->start;
> > > >     event.entry.perm = IOMMU_NONE;
> > > >     event.entry.addr_mask = n->end - n->start;
> > > >
> > > >     memory_region_notify_iommu_one(n, &event);
> > > > }
> > > >
> > > > So it looks to me it's safer to do something similar for vtd first.
> > >
> > > Jason, could you elaborate more on this one?
> >
> > I meant it's safer to have a vtd version:
> >
> > > > static void vtd_unmap_notifier_range(IOMMUNotifier *n)
> > > > {
> > > >     IOMMUTLBEvent event;
> > > >
> > > >     event.type = IOMMU_NOTIFIER_UNMAP;
> > > >     event.entry.target_as = &address_space_memory;
> > > >     event.entry.iova = n->start;
> > > >     event.entry.perm = IOMMU_NONE;
> > > >     event.entry.addr_mask = n->end - n->start;
> > > >
> > > >     memory_region_notify_iommu_one(n, &event);
> > > > }
> >
> > Or move it to the memory.c.
>
> I see.
>
> If we always do the crop in memory_region_notify_iommu_one(), it'll have a
> similar effect to having the above helper, am I right?

Right, but we may end up cropping twice. I don't have an actual
preference, but I think we need to be consistent here.

Either:

1) crop in the memory core and remove the per-IOMMU cropping like
smmu_unmap_notifier_range()

or

2) don't crop in the memory core but move smmu_unmap_notifier_range to
the core (still a kind of implicit crop, since the function is called
without an explicit range)

2) seems safer but I can go with 1 if you insist.
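To make option 1 concrete, here is a minimal, self-contained sketch of the
cropping step. This is illustrative only: the structs are simplified
stand-ins, not QEMU's actual IOMMUNotifier/IOMMUTLBEntry, and it assumes
iova + addr_mask does not wrap; real code would also have to keep
addr_mask consistent with the notifier's alignment expectations.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-ins for QEMU's structures, just enough to show
 * cropping an event to a notifier's registered window. */
typedef struct {
    uint64_t start;
    uint64_t end;        /* inclusive */
} IOMMUNotifier;

typedef struct {
    uint64_t iova;
    uint64_t addr_mask;  /* length - 1: range is [iova, iova + addr_mask] */
} IOMMUTLBEntry;

/* Crop the entry to the notifier's [start, end] window.
 * Returns 0 when the ranges are disjoint (nothing to notify). */
static int crop_event_to_notifier(const IOMMUNotifier *n, IOMMUTLBEntry *e)
{
    uint64_t lo = e->iova;
    uint64_t hi = e->iova + e->addr_mask;   /* inclusive end */

    if (hi < n->start || lo > n->end) {
        return 0;   /* disjoint: skip instead of asserting */
    }
    if (lo < n->start) {
        lo = n->start;
    }
    if (hi > n->end) {
        hi = n->end;
    }
    e->iova = lo;
    e->addr_mask = hi - lo;
    return 1;
}
```

With something like this in the core, a notifier whose window doesn't
intersect the event is simply skipped rather than tripping the range
assertion.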

>
> I checked again on the VFIO code path in kernel, it (at least type1v2)
> doesn't allow unmapping of partial mapped range, but it looks always fine
> to have unmap covering not-mapped spaces.

That's my understanding as well.
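As a toy illustration of that semantic (purely illustrative C, not VFIO or
QEMU code): an unmap request may span far more than what was actually
mapped, and the correct behavior is simply to drop whatever mapped entries
fall inside the range, ignoring the never-mapped space.

```c
#include <assert.h>
#include <stdint.h>

/* Toy IOTLB: a handful of page-sized mappings keyed by IOVA.  Real VFIO
 * type1 tracks ranges in an rbtree; this array is just for illustration. */
#define TOY_PAGES 8
#define TOY_PAGE_SIZE 0x1000ULL

typedef struct {
    uint64_t iova[TOY_PAGES];
    int mapped[TOY_PAGES];
    int n;
} ToyIotlb;

static void toy_map(ToyIotlb *t, uint64_t iova)
{
    t->iova[t->n] = iova;
    t->mapped[t->n] = 1;
    t->n++;
}

/* Drop every mapped page fully contained in [start, end].  The range may
 * be arbitrarily larger than what was mapped; covering never-mapped
 * space is harmless, mirroring the unmap semantics discussed above. */
static int toy_unmap_range(ToyIotlb *t, uint64_t start, uint64_t end)
{
    int dropped = 0;
    for (int i = 0; i < t->n; i++) {
        if (t->mapped[i] && t->iova[i] >= start &&
            t->iova[i] + TOY_PAGE_SIZE - 1 <= end) {
            t->mapped[i] = 0;
            dropped++;
        }
    }
    return dropped;
}
```

Passing start = 0, end = UINT64_MAX then behaves like an "unmap all",
analogous to the size = U64_MAX case in the kernel snippet below.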

>
> One more thing I noticed is there's a new flag introduced in 2021 for vfio
> to unmap the whole address space (VFIO_DMA_UNMAP_FLAG_ALL).  In the future
> we can leverage this when we want to do DSI more efficiently, but not
> immediately necessary - I think that needs a new IOMMU notifier API hook.
>

Yes.

> And if you see the impl of that new flag (in vfio_dma_do_unmap) it also
> shows that a larger range of unmap is fine to vfio, because for unmap_all
> it's the same as specifying the size to be max:
>
>         if (unmap_all) {
>                 if (iova || size)
>                         goto unlock;
>                 size = U64_MAX;
>         }
>
> >
> > >
> > > Meanwhile, I don't immediately see what's the side effect you mentioned 
> > > for
> > > vfio map events.
> >
> > I don't see one, but it looks safer. Do you know the reason why SMMU
> > doesn't simply do a [0, ULONG_MAX] unmap notify? (Maybe Eric knows)
>
> Same here..
>
> >
> > > I thought any map event should always be in the notifier
> > > range anyway because map event only comes in page sizes and generated by
> > > vt-d page walkers (not guest driver, which is IIUC the only place where 
> > > the
> > > range of invalidation can be enlarged).  So I don't expect any functional
> > > change to map events if we decide to crop the ranges unconditionally.
> >
> > If we crop the ranges, the above description:
> >
> > """
> > and the UNMAP messages can be actually larger than the real invalidations.
> > """
> >
> > doesn't apply anymore.
>
> It depends on how to define the "real invalidations".  There are two places
> that can enlarge an invalidation; here I wanted to reference the case where
> e.g. a PSI is enlarged to a DSI.  Even if that's the driver's behavior, I
> wanted to make sure the qemu iommu notifiers are aware of the fact that an
> unmap can be bigger than what was previously mapped.

Ok, I think the confusion came from "real invalidations". There's no way
for the device to know about the real invalidation, since the driver can
enlarge it at will. If that's true, would it be better to say that the
UNMAP messages can cover ranges that are not mapped?

Thanks

>
> Thanks,
>
> >
> > Thanks
> >
> > >  Did I miss anything?
> > >
> > > Thanks,
> > >
> > > >
> > > > Btw, I forgot the reason why we need to crop the size in the case of
> > > > device IOTLB. Eugenio, do you know?
> > > >
> > > > Thanks
> > > >
> > > > >
> > > > > Thanks,
> > > > >
> > > > > --
> > > > > Peter Xu
> > > > >
> > > >
> > >
> > > --
> > > Peter Xu
> > >
> >
>
> --
> Peter Xu
>



