qemu-devel
From: Peter Xu
Subject: Re: [Qemu-devel] [PATCH v3 00/12] intel-iommu: nested vIOMMU, cleanups, bug fixes
Date: Fri, 18 May 2018 14:30:04 +0800
User-agent: Mutt/1.9.5 (2018-04-13)

On Fri, May 18, 2018 at 12:08:01AM +0300, Michael S. Tsirkin wrote:
> On Thu, May 17, 2018 at 04:59:15PM +0800, Peter Xu wrote:
> > (Hello, Jintack, feel free to test this branch again against your scp
> >  error case when you have free time.)
> > 
> > I rewrote some of the patches in V3.  Major changes:
> > 
> > - Dropped the mergeable interval tree and introduced an IOVA tree
> >   instead, which is even simpler.
> > 
> > - Fixed the scp error issue that Jintack reported.  Please see the
> >   patches for detailed information.  That's the major reason for
> >   rewriting a few of the patches.  Using replay for domain flushes was
> >   possibly incorrect in the past.  The thing is that IOMMU replay has a
> >   "definition" that "we should only send MAP when a new page is
> >   detected", while for shadow page syncing we actually need more than
> >   that (pages that were unmapped or modified must be handled too).  So
> >   in this version I started to use a new vtd_sync_shadow_page_table()
> >   helper to do the page sync.
> > 
> > - Some other refinements after the refactoring.
> > 
> > I'll add unit tests for the IOVA tree after this series is merged, to
> > make sure we won't need to switch to yet another tree implementation...
> > 
> > The element size in the new IOVA tree should be around
> > sizeof(GTreeNode) + sizeof(IOMMUTLBEntry) ~= (5*8 + 4*8) = 72 bytes.  So
> > the worst-case usage ratio would be 72/4K ~= 2%, which still seems
> > acceptable (it means an 8G L2 guest will use about 8G*2% ~= 160MB of
> > metadata to maintain the mapping in QEMU).
> > 
> > I explicitly tested with scp this time, copying a 1G file more than 10
> > times in each of the following cases:
> > 
> > - L1 guest, with vIOMMU and with assigned device
> > - L2 guest, without vIOMMU and with assigned device
> > - L2 guest, with vIOMMU (so 3-layer nested IOMMU) and with assigned device
> > 
> > Please review.  Thanks,
> > 
> > (Below is the old content from the previous cover letter)
> > 
> > ==========================
> > 
> > v2:
> > - fix patchew code style warnings
> > - interval tree: postpone malloc when inserting; simplify node removal
> >   a bit where appropriate [Jason]
> > - fix up comment and commit message for iommu lock patch [Kevin]
> > - protect context cache too using the iommu lock [Kevin, Jason]
> > - add an extensive comment in patch 8 to explain the modify-PTE problem
> >   [Jason, Kevin]
> > 
> > Online repo:
> > 
> >   https://github.com/xzpeter/qemu/tree/fix-vtd-dma
> > 
> > This series fixes several major problems in the current code:
> > 
> > - Issue 1: when receiving very large PSI UNMAP invalidations, the
> >   current code may skip the notification, although we should always
> >   send it.
> > 
> > - Issue 2: the IOTLB is not thread-safe, while block dataplane can
> >   access and update it in parallel.
> > 
> > - Issue 3: for devices that registered only UNMAP notifiers (for
> >   example, vhost), we don't need to walk the page table for PSIs; we
> >   can deliver the notification directly.
> > 
> > - Issue 4: an unsafe window for MAP-notified devices like vfio-pci
> >   (and, in the future, vDPA as well).  The problem is that currently,
> >   for domain invalidations, we do the following to keep the shadow page
> >   tables correctly synced:
> > 
> >   1. unmap the whole address space
> >   2. replay the whole address space, map existing pages
> > 
> >   However, between steps 1 and 2 there is a small window (it can be
> >   as long as 3ms) during which the shadow page table is either invalid
> >   or incomplete (since we're rebuilding it).  That's a fatal error,
> >   since devices never know that this is happening and can still DMA
> >   to memory during the window.
> > 
> > Patch 1 fixes issue 1.  I put it first since it's picked from an old
> > post.
> > 
> > Patch 2 is a cleanup to remove useless IntelIOMMUNotifierNode struct.
> > 
> > Patch 3 fixes issue 2.
> > 
> > Patch 4 fixes issue 3.
> > 
> > Patches 5-9 fix issue 4.  Here a very simple interval tree is
> > implemented based on GTree.  It differs from a general interval tree
> > in that it does not allow the user to pass in private data (e.g.,
> > translated addresses).  The benefit is that we can then merge
> > adjacent intervals, so hopefully we won't consume much memory even
> > when there are many mappings (which happens with nested virt, where
> > the whole L2 guest RAM range is mapped and can be GBs in size).
> > 
> > Patch 10 is another big cleanup that only works on top of patch 9.
> 
> I think patch numbers are wrong somehow.

Yes, this is the old cover letter, so the numbers do not match.

> 
> Given you also want to tweak one comment, could
> you please repost with this fix, and also
> in commit log for each patch
> - Cc stable
> - for security patches mention as much, if possible
>   add data about the issue and its severity

The rest I can try to do.  Thanks,

-- 
Peter Xu


