From: Michael S. Tsirkin
Subject: Re: [Qemu-devel] [PATCH v3 00/12] intel-iommu: nested vIOMMU, cleanups, bug fixes
Date: Fri, 18 May 2018 00:08:01 +0300

On Thu, May 17, 2018 at 04:59:15PM +0800, Peter Xu wrote:
> (Hello, Jintack, feel free to test this branch again against your scp
> error case when you have free time)
>
> I rewrote some of the patches in V3. Major changes:
>
> - Dropped the mergeable interval tree and introduced an IOVA tree
> instead, which is even simpler.
>
> - Fix the scp error issue that Jintack reported. Please see the patches
> for detailed information. That's the major reason for rewriting a few
> of the patches. Using replay for domain flushes, as we did in the
> past, was possibly incorrect. The thing is that IOMMU replay has a
> "definition" that "we should only send MAP when a new page is detected",
> while for shadow page syncing we actually need something other than
> that. So in this version I started to use a new
> vtd_sync_shadow_page_table() helper to do the page sync.
>
> - Some other refinements after the refactoring.
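
The distinction drawn in the second item above (replay only sends MAP for newly detected pages, while a shadow page sync needs more than that) can be sketched as follows. This is a minimal illustration and not the code from the series: the guest and shadow views are modeled as small boolean arrays, and `sync_shadow`, `struct sync_stats` and `NPAGES` are hypothetical names chosen for the example. The point is that a sync has to emit UNMAP for pages the guest dropped as well as MAP for pages it added.

```c
#include <stdbool.h>
#include <stddef.h>

#define NPAGES 8  /* hypothetical: size of the toy address space, in pages */

struct sync_stats {
    int maps;    /* MAP notifications that would be sent */
    int unmaps;  /* UNMAP notifications that would be sent */
};

/*
 * Bring the shadow view in line with the guest view, page by page.
 * shadow[i] / guest[i] are true when page i is mapped in that view.
 * Pages that are mapped in both views are left untouched.
 */
static struct sync_stats sync_shadow(bool shadow[NPAGES],
                                     const bool guest[NPAGES])
{
    struct sync_stats st = { 0, 0 };

    for (size_t i = 0; i < NPAGES; i++) {
        if (guest[i] && !shadow[i]) {
            shadow[i] = true;   /* new page: MAP */
            st.maps++;
        } else if (!guest[i] && shadow[i]) {
            shadow[i] = false;  /* stale page: UNMAP */
            st.unmaps++;
        }
    }
    return st;
}
```

A MAP-only replay over the same input would perform the first branch but never the second, which leaves stale pages in the shadow view.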
>
> I'll add a unit test for the IOVA tree after this series is merged to
> make sure we won't switch to another new tree implementation...
>
> The element size in the new IOVA tree should be around
> sizeof(GTreeNode + IOMMUTLBEntry) ~= (5*8+4*8) = 72 bytes. So the
> worst case usage ratio would be 72/4K=2%, which still seems acceptable
> (it means 8G L2 guest will use 8G*2%=160MB as metadata to maintain the
> mapping in QEMU).
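
The estimate above can be reproduced with a few lines of arithmetic. The sketch below uses only the figures quoted in that paragraph (72 bytes per entry, one entry per 4K page in the worst case); the function and macro names are made up for illustration. The exact ratio is about 1.76%, which the text rounds up to 2%.

```c
#include <stdint.h>

#define ENTRY_BYTES 72u     /* 5*8 + 4*8, as estimated above */
#define PAGE_BYTES  4096u   /* worst case: one tree entry per 4K page */

/* Worst-case metadata size when every 4K page needs its own tree entry. */
static uint64_t iova_tree_worst_case_bytes(uint64_t guest_ram_bytes)
{
    return guest_ram_bytes / PAGE_BYTES * ENTRY_BYTES;
}
```

For an 8 GiB guest this gives 2097152 entries * 72 bytes = 150994944 bytes, i.e. 144 MiB, which is in line with the rounded 160MB figure.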
>
> I did an explicit test with scp this time, copying a 1G file more than
> 10 times in each of the following cases:
>
> - L1 guest, with vIOMMU and with assigned device
> - L2 guest, without vIOMMU and with assigned device
> - L2 guest, with vIOMMU (so 3-layer nested IOMMU) and with assigned device
>
> Please review. Thanks,
>
> (Below is the old content from the previous cover letter)
>
> ==========================
>
> v2:
> - fix patchew code style warnings
> - interval tree: postpone malloc when inserting; simplify node removal
> a bit where appropriate [Jason]
> - fix up comment and commit message for iommu lock patch [Kevin]
> - protect context cache too using the iommu lock [Kevin, Jason]
> - add a long comment in patch 8 to explain the modify-PTE problem
> [Jason, Kevin]
>
> Online repo:
>
> https://github.com/xzpeter/qemu/tree/fix-vtd-dma
>
> This series fixes several major problems that current code has:
>
> - Issue 1: when getting very big PSI UNMAP invalidations, the current
> code is buggy in that we might skip the notification while actually
> we should always send that notification.
>
> - Issue 2: IOTLB is not thread safe, while block dataplane can be
> accessing and updating it in parallel.
>
> - Issue 3: For devices registered with UNMAP-only notifiers, we
> don't really need to do page walking for PSIs; we can directly
> deliver the notification down. For example, vhost.
>
> - Issue 4: unsafe window for MAP notified devices like vfio-pci (and
> in the future, vDPA as well). The problem is that, now for domain
> invalidations we do this to make sure the shadow page tables are
> correctly synced:
>
> 1. unmap the whole address space
> 2. replay the whole address space, map existing pages
>
> However, between steps 1 and 2 there will be a very tiny window (it can
> be as big as 3ms) in which the shadow page table is either invalid or
> incomplete (since we're rebuilding it). That's a fatal error since
> devices never know that it is happening and it's still possible for
> them to DMA to memory.
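
Issue 3 in the list above comes down to a check on what kind of events a notifier registered for. The following is a minimal sketch of that decision only; the flag names and `psi_needs_page_walk` are hypothetical and are not the identifiers used in the series.

```c
#include <stdbool.h>

enum notifier_flags {            /* hypothetical flag names */
    NOTIFIER_FLAG_UNMAP = 1 << 0,
    NOTIFIER_FLAG_MAP   = 1 << 1,
};

/*
 * Decide whether a page-selective invalidation (PSI) needs a page walk.
 * A notifier that never asked for MAP events only has to drop the
 * invalidated range, so the range can be forwarded as a single UNMAP
 * without walking the page table.
 */
static bool psi_needs_page_walk(int flags)
{
    return (flags & NOTIFIER_FLAG_MAP) != 0;
}
```

With this check, an UNMAP-only consumer such as vhost takes the direct path, while a MAP consumer such as vfio-pci still gets the walk.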
>
> Patch 1 fixes issue 1. I put it first since it's picked from
> an old post.
>
> Patch 2 is a cleanup to remove useless IntelIOMMUNotifierNode struct.
>
> Patch 3 fixes issue 2.
>
> Patch 4 fixes issue 3.
>
> Patches 5-9 fix issue 4. Here a very simple interval tree is
> implemented based on GTree. It differs from a general interval tree
> in that it does not allow the user to pass in private data (e.g.,
> translated addresses). However, that benefits us in that we can then
> merge adjacent interval leaves so that hopefully we won't consume much
> memory even if there are a lot of mappings (that happens for nested virt -
> when mapping the whole L2 guest RAM range, it can be at least in GBs).
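
The merging of adjacent interval leaves described above can be sketched like this. It is an illustration only: it works on a sorted array rather than a GTree, and `struct iova_range` and `merge_adjacent_ranges` are hypothetical names. Merging is only safe here because the ranges carry no private data (such as translated addresses) that would differ between neighbours.

```c
#include <stddef.h>
#include <stdint.h>

struct iova_range {          /* hypothetical type: inclusive [start, end] */
    uint64_t start;
    uint64_t end;
};

/*
 * Merge adjacent or overlapping ranges in place.
 * 'r' must be sorted by start address; returns the new element count.
 */
static size_t merge_adjacent_ranges(struct iova_range *r, size_t n)
{
    if (n == 0) {
        return 0;
    }

    size_t out = 0;
    for (size_t i = 1; i < n; i++) {
        /*
         * Mergeable when the next range starts no later than one past
         * the end of the current one. The first test avoids overflow
         * of end + 1.
         */
        if (r[out].end == UINT64_MAX || r[i].start <= r[out].end + 1) {
            if (r[i].end > r[out].end) {
                r[out].end = r[i].end;
            }
        } else {
            r[++out] = r[i];
        }
    }
    return out + 1;
}
```

Three contiguous 4K mappings followed by a gap collapse to two entries, which is why a large, mostly contiguous L2 RAM mapping needs little metadata under this scheme.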
>
> Patch 10 is another big cleanup that can only work after patch 9.

I think the patch numbers are wrong somehow.
Given you also want to tweak one comment, could
you please repost with this fix, and also,
in the commit log for each patch:
- Cc stable
- for security patches, mention as much and, if possible,
  add data about the issue and its severity

> Tests:
>
> - device assignments to L1, even L2 guests. With this series applied
> (and the kernel IOMMU patches: https://lkml.org/lkml/2018/4/18/5),
> we can even nest vIOMMU now, e.g., we can specify vIOMMU in L2 guest
> with assigned devices and things will work. We couldn't before.
>
> - vhost smoke test for regression.
>
> Please review. Thanks,
>
> Peter Xu (12):
> intel-iommu: send PSI always even if across PDEs
> intel-iommu: remove IntelIOMMUNotifierNode
> intel-iommu: add iommu lock
> intel-iommu: only do page walk for MAP notifiers
> intel-iommu: introduce vtd_page_walk_info
> intel-iommu: pass in address space when page walk
> intel-iommu: trace domain id during page walk
> util: implement simple iova tree
> intel-iommu: maintain per-device iova ranges
> intel-iommu: simplify page walk logic
> intel-iommu: new vtd_sync_shadow_page_table_range
> intel-iommu: new sync_shadow_page_table
>
> include/hw/i386/intel_iommu.h | 19 +-
> include/qemu/iova-tree.h | 134 ++++++++++++
> hw/i386/intel_iommu.c | 381 +++++++++++++++++++++++++---------
> util/iova-tree.c | 114 ++++++++++
> MAINTAINERS | 6 +
> hw/i386/trace-events | 5 +-
> util/Makefile.objs | 1 +
> 7 files changed, 556 insertions(+), 104 deletions(-)
> create mode 100644 include/qemu/iova-tree.h
> create mode 100644 util/iova-tree.c
>
> --
> 2.17.0