qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH 08/10] intel-iommu: maintain per-device iova ran


From: Tian, Kevin
Subject: Re: [Qemu-devel] [PATCH 08/10] intel-iommu: maintain per-device iova ranges
Date: Fri, 27 Apr 2018 07:44:07 +0000

> From: Peter Xu [mailto:address@hidden
> Sent: Friday, April 27, 2018 3:28 PM
> 
> On Fri, Apr 27, 2018 at 07:02:14AM +0000, Tian, Kevin wrote:
> > > From: Jason Wang [mailto:address@hidden
> > > Sent: Friday, April 27, 2018 2:08 PM
> > >
> > > On 2018年04月25日 12:51, Peter Xu wrote:
> > > > For each VTDAddressSpace, now we maintain what IOVA ranges we
> have
> > > > mapped and what we have not.  With that information, now we only
> > > send
> > > > MAP or UNMAP when necessary.  Say, we don't send MAP notifies if
> we
> > > know
> > > > we have already mapped the range, meanwhile we don't send
> UNMAP
> > > notifies
> > > > if we know we never mapped the range at all.
> > > >
> > > > Signed-off-by: Peter Xu <address@hidden>
> > > > ---
> > > >   include/hw/i386/intel_iommu.h |  2 ++
> > > >   hw/i386/intel_iommu.c         | 28 ++++++++++++++++++++++++++++
> > > >   hw/i386/trace-events          |  2 ++
> > > >   3 files changed, 32 insertions(+)
> > > >
> > > > diff --git a/include/hw/i386/intel_iommu.h
> > > b/include/hw/i386/intel_iommu.h
> > > > index 486e205e79..09a2e94404 100644
> > > > --- a/include/hw/i386/intel_iommu.h
> > > > +++ b/include/hw/i386/intel_iommu.h
> > > > @@ -27,6 +27,7 @@
> > > >   #include "hw/i386/ioapic.h"
> > > >   #include "hw/pci/msi.h"
> > > >   #include "hw/sysbus.h"
> > > > +#include "qemu/interval-tree.h"
> > > >
> > > >   #define TYPE_INTEL_IOMMU_DEVICE "intel-iommu"
> > > >   #define INTEL_IOMMU_DEVICE(obj) \
> > > > @@ -95,6 +96,7 @@ struct VTDAddressSpace {
> > > >       QLIST_ENTRY(VTDAddressSpace) next;
> > > >       /* Superset of notifier flags that this address space has */
> > > >       IOMMUNotifierFlag notifier_flags;
> > > > +    ITTree *iova_tree;          /* Traces mapped IOVA ranges */
> > > >   };
> > > >
> > > >   struct VTDBus {
> > > > diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
> > > > index a19c18b8d4..8f396a5d13 100644
> > > > --- a/hw/i386/intel_iommu.c
> > > > +++ b/hw/i386/intel_iommu.c
> > > > @@ -768,12 +768,37 @@ typedef struct {
> > > >   static int vtd_page_walk_one(IOMMUTLBEntry *entry, int level,
> > > >                                vtd_page_walk_info *info)
> > > >   {
> > > > +    VTDAddressSpace *as = info->as;
> > > >       vtd_page_walk_hook hook_fn = info->hook_fn;
> > > >       void *private = info->private;
> > > > +    ITRange *mapped = it_tree_find(as->iova_tree, entry->iova,
> > > > +                                   entry->iova + entry->addr_mask);
> > > >
> > > >       assert(hook_fn);
> > > > +
> > > > +    /* Update local IOVA mapped ranges */
> > > > +    if (entry->perm) {
> > > > +        if (mapped) {
> > > > +            /* Skip since we have already mapped this range */
> > > > +            trace_vtd_page_walk_one_skip_map(entry->iova, entry-
> > > >addr_mask,
> > > > +                                             mapped->start, 
> > > > mapped->end);
> > > > +            return 0;
> > > > +        }
> > > > +        it_tree_insert(as->iova_tree, entry->iova,
> > > > +                       entry->iova + entry->addr_mask);
> > >
> > > I was consider a case e.g:
> > >
> > > 1) map A (iova) to B (pa)
> > > 2) invalidate A
> > > 3) map A (iova) to C (pa)
> > > 4) invalidate A
> > >
> > > In this case, we will probably miss a walk here. But I'm not sure it was
> > > allowed by the spec (though I think so).
> > >
> 
> Hi, Kevin,
> 
> Thanks for joining the discussion.
> 
> >
> > I thought it was wrong in a glimpse, but then changed my mind after
> > another thinking. As long as device driver can quiescent the device
> > to not access A (iova) within above window, then above sequence
> > has no problem since any stale mappings (A->B) added before step 4)
> > won't be used and then will get flushed after step 4). Regarding to
> > that actually the 1st invalidation can be skipped:
> >
> > 1) map A (iova) to B (pa)
> > 2) driver programs device to use A
> > 3) driver programs device to not use A
> > 4) map A (iova) to C (pa)
> >     A->B may be still valid in IOTLB
> > 5) invalidate A
> > 6) driver programs device to use A
> 
> Note that IMHO this is a bit different from Jason's example, and it'll
> be fine.  Current code should work well with this scenario since the
> emulation code will not aware of the map A until step (5).  Then we'll
> have the correct mapping.

you are right. we still need the extra PSI otherwise the 1st mapping
is problematic for use. So back to Jason's example.

> 
> While for Jason's example it's exactly the extra PSI that might cause
> stale mappings (though again I think it's still problematic...).

problematic in software side (e.g. that way IOMMU core relies on
device drivers which conflict with the value of using IOMMU) but
it is OK from hardware p.o.v. btw the extra PSI itself doesn't cause
stale mapping. Instead it is device activity after that PSI may cause it.

> 
> Actually I think I can just fix up the code even if Jason's case
> happens by unmapping that first then remap:
> 
> diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
> index 31e9b52452..2a9584f9d8 100644
> --- a/hw/i386/intel_iommu.c
> +++ b/hw/i386/intel_iommu.c
> @@ -778,13 +778,21 @@ static int vtd_page_walk_one(IOMMUTLBEntry
> *entry, int level,
>      /* Update local IOVA mapped ranges */
>      if (entry->perm) {
>          if (mapped) {
> -            /* Skip since we have already mapped this range */
> -            trace_vtd_page_walk_one_skip_map(entry->iova, entry-
> >addr_mask,
> -                                             mapped->start, mapped->end);
> -            return 0;
> +            int ret;
> +            /* Cache the write permission */
> +            IOMMUAccessFlags flags = entry->perm;
> +
> +            /* UNMAP the old first then remap.  No need to touch IOVA tree */
> +            entry->perm = IOMMU_NONE;
> +            ret = hook_fn(entry, private);
> +            if (ret) {
> +                return ret;
> +            }
> +            entry->perm = flags;
> +        } else {
> +            it_tree_insert(as->iova_tree, entry->iova,
> +                           entry->iova + entry->addr_mask);
>          }
> -        it_tree_insert(as->iova_tree, entry->iova,
> -                       entry->iova + entry->addr_mask);
>      } else {
>          if (!mapped) {
>              /* Skip since we didn't map this range at all */
> 
> If we really think it necessary, I can squash this in, though this is
> a bit ugly.  But I just want to confirm whether this would be anything
> we want...
> 

I didn’t look into your actual code yet. If others think above
change is OK then it's definitely good as we conform to hardware
behavior here. Otherwise if there is a way to detect such unusual 
usage and then adopt some action (say kill the VM), it's also fine 
since user knows he runs a bad OS which is not supported by 
Qemu. It's just not good if such situation is not handled, which 
leads to some undefined behavior which nobody knows the reason 
w/o hard debug into. :-)

Thanks
Kevin

reply via email to

[Prev in Thread] Current Thread [Next in Thread]