From: Igor Mammedov
Subject: Re: [PATCH RFC 1/6] i386/pc: Account IOVA reserved ranges above 4G boundary
Date: Mon, 28 Jun 2021 17:21:50 +0200
On Mon, 28 Jun 2021 14:43:48 +0100
Joao Martins <joao.m.martins@oracle.com> wrote:
> On 6/28/21 2:25 PM, Igor Mammedov wrote:
> > On Wed, 23 Jun 2021 14:07:29 +0100
> > Joao Martins <joao.m.martins@oracle.com> wrote:
> >
> >> On 6/23/21 1:09 PM, Igor Mammedov wrote:
> >>> On Wed, 23 Jun 2021 10:51:59 +0100
> >>> Joao Martins <joao.m.martins@oracle.com> wrote:
> >>>
> >>>> On 6/23/21 10:03 AM, Igor Mammedov wrote:
> >>>>> On Tue, 22 Jun 2021 16:49:00 +0100
> >>>>> Joao Martins <joao.m.martins@oracle.com> wrote:
> >>>>>
> >>>>>> It is assumed that the whole GPA space is available to be
> >>>>>> DMA addressable, within a given address space limit. Since
> >>>>>> Linux v5.4 that is no longer true: VFIO validates whether
> >>>>>> the selected IOVA is indeed valid, i.e. not reserved by the
> >>>>>> IOMMU on behalf of specific devices or defined by the platform.
> >>>>>>
> >>>>>> AMD systems with an IOMMU are examples of such platforms and,
> >>>>>> in particular, may export only these ranges as allowed:
> >>>>>>
> >>>>>> 0000000000000000 - 00000000fedfffff (0 .. 3.982G)
> >>>>>> 00000000fef00000 - 000000fcffffffff (3.983G .. 1011.9G)
> >>>>>> 0000010000000000 - ffffffffffffffff (1Tb .. 16Eb)
> >>>>>>
> >>>>>> We already account for the 4G hole, but if the guest is big
> >>>>>> enough we will fail to allocate >1010G of RAM, given the ~12G
> >>>>>> hole at the 1Tb boundary reserved for HyperTransport.
> >>>>>>
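The figures above can be checked with a few lines of C. This is a sketch only: `struct iova_range`, `gap_after()` and `max_ram_before_1t_hole()` are hypothetical helpers, not QEMU or VFIO code, and the 2G of RAM mapped below 4G is an assumption taken from the `ram-above-4g` alias offset (0x80000000) in the mtree output below.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define GiB (1ULL << 30)

/* Hypothetical type: one allowed IOVA range, bounds inclusive. */
struct iova_range {
    uint64_t start;
    uint64_t end;
};

/* The allowed ranges listed above for an AMD host with an IOMMU. */
static const struct iova_range amd_allowed[] = {
    { 0x0000000000000000ULL, 0x00000000fedfffffULL },
    { 0x00000000fef00000ULL, 0x000000fcffffffffULL },
    { 0x0000010000000000ULL, 0xffffffffffffffffULL },
};

/* Bytes reserved between allowed range i and allowed range i + 1. */
static uint64_t gap_after(const struct iova_range *r, size_t n, size_t i)
{
    assert(i + 1 < n);
    return r[i + 1].start - (r[i].end + 1);
}

/*
 * Largest guest RAM that fits without touching the 1Tb hole:
 * 'lowmem' bytes below 4G (assumed) plus everything from 4G up to
 * the end of allowed range 1.
 */
static uint64_t max_ram_before_1t_hole(const struct iova_range *r,
                                       uint64_t lowmem)
{
    return lowmem + (r[1].end + 1 - 4 * GiB);
}
```

With 2G below 4G, `max_ram_before_1t_hole(amd_allowed, 2 * GiB)` gives 2G + (1012G - 4G) = 1010G, which matches the >1010G figure, and `gap_after(amd_allowed, 3, 1)` comes out at exactly 12G.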
> >>>>>> When creating the region above 4G, take into account which
> >>>>>> IOVAs are allowed by defining the known allowed ranges
> >>>>>> and searching for the next free IOVA range. When we find an
> >>>>>> invalid IOVA range, we mark it as reserved and proceed to the
> >>>>>> next allowed IOVA region.
> >>>>>>
> >>>>>> After accounting for the 1Tb hole on AMD hosts, mtree should
> >>>>>> look like:
> >>>>>>
> >>>>>> 0000000100000000-000000fcffffffff (prio 0, i/o):
> >>>>>> alias ram-above-4g @pc.ram 0000000080000000-000000fc7fffffff
> >>>>>> 0000010000000000-000001037fffffff (prio 0, i/o):
> >>>>>> alias ram-above-1t @pc.ram 000000fc80000000-000000ffffffffff
> >>>>>>
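The allocation described above can be sketched in C as a walk over the allowed ranges. This is a hedged illustration, not the patch's code: the types, `layout_above_4g()` and its parameters are hypothetical, the ranges are assumed sorted and non-overlapping, and 2G of RAM below 4G is assumed (matching the 0x80000000 alias offset in the mtree output).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define GiB (1ULL << 30)

/* Hypothetical types: an allowed IOVA range (inclusive bounds) and
 * one chunk of RAM mapped into the guest. */
struct iova_range {
    uint64_t start;
    uint64_t end;
};

struct ram_chunk {
    uint64_t gpa;        /* guest physical start address */
    uint64_t size;       /* bytes */
    uint64_t ram_offset; /* offset into the backing RAM block (pc.ram) */
};

static const struct iova_range amd_allowed[] = {
    { 0x0000000000000000ULL, 0x00000000fedfffffULL },
    { 0x00000000fef00000ULL, 0x000000fcffffffffULL },
    { 0x0000010000000000ULL, 0xffffffffffffffffULL },
};

/*
 * Place 'above_4g' bytes of RAM starting at 4G, one chunk per allowed
 * range, jumping over anything not listed as allowed. The first
 * 'lowmem' bytes of RAM are assumed to be mapped below 4G, so offsets
 * start there. Returns the number of chunks, or 0 if the RAM does not
 * fit in the allowed ranges or in 'max_out' chunks.
 */
static size_t layout_above_4g(const struct iova_range *allowed,
                              size_t n_allowed, uint64_t lowmem,
                              uint64_t above_4g, struct ram_chunk *out,
                              size_t max_out)
{
    uint64_t gpa = 4 * GiB;
    uint64_t offset = lowmem;
    size_t n = 0;

    assert(lowmem <= 4 * GiB);
    for (size_t i = 0; i < n_allowed && above_4g > 0; i++) {
        const struct iova_range *r = &allowed[i];

        if (r->end < gpa) {
            continue;             /* range lies entirely below the cursor */
        }
        if (r->start > gpa) {
            gpa = r->start;       /* skip the reserved gap */
        }
        /* gpa >= 4G here, so end - gpa + 1 cannot overflow. */
        uint64_t avail = r->end - gpa + 1;
        uint64_t chunk = above_4g < avail ? above_4g : avail;

        if (n == max_out) {
            return 0;
        }
        out[n++] = (struct ram_chunk){ gpa, chunk, offset };
        gpa += chunk;
        offset += chunk;
        above_4g -= chunk;
    }
    return above_4g == 0 ? n : 0;
}
```

Calling it with `lowmem = 2 * GiB` and `above_4g = 1022 * GiB` (a 1Tb guest) gives one 1008G chunk at 4G with RAM offset 2G and one 14G chunk at 1Tb with RAM offset 1010G, the same split as `ram-above-4g` and `ram-above-1t` in the mtree output quoted above.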
> >>>>>
> >>>>> You are talking here about GPA, which is a guest-specific thing,
> >>>>> and then somehow it becomes tied to the host. For bystanders it's
> >>>>> not clear from the above commit message how the two are related.
> >>>>> I'd add an explicit explanation of how the AMD host is related to GPAs
> >>>>> and clarify where you are talking about the guest/host side.
> >>>>>
> >>>> OK, makes sense.
> >>>>
> >>>> Perhaps using IOVA makes it easier to understand. I said GPA because
> >>>> there's a 1:1 mapping between GPA and IOVA (if you're not using
> >>>> vIOMMU).
> >>>
> >>> IOVA may be too broad a term; maybe explain it in terms of GPA and HPA
> >>> and why it matters on each side (host/guest).
> >>>
> >>
> >> I used the term IOVA specifically because it applies to both host IOVA and
> >> guest IOVA (the same rules apply, as this is not special-cased for VMs). So,
> >> regardless of whether we have guest-mode page tables or just host
> >> IOMMU page tables, this address range should be reserved and not used.
> >
> > IOVA doesn't make it any clearer; on the contrary, it's more confusing.
> >
> > And does the host's HPA matter at all? (If the host's firmware isn't broken,
> > it should never use or advertise the 1Tb hole.)
> > So we are probably talking only about GPA here.
> >
> In the case of this series, yes, it's only GPA that we care about.
>
> Perhaps I misunderstood your earlier comment about how HPAs were
> affected, so I was trying to state the problem in a guest/host
> agnostic manner by using IOVA, given this is all related to IOMMU
> reserved ranges.
> I'll stick to GPA to avoid any confusion -- as that's what matters for this
> series.
Even better would be to add a reference here to the spec that says so.
>
> >>>>> also what about usecases:
> >>>>> * start QEMU with an Intel CPU model on an AMD host with Intel's IOMMU
> >>>>
> >>>> In principle it would be less likely to occur, but you would still need
> >>>> to mark the same range as reserved. The limitation is on DMA occurring
> >>>> on those IOVAs (host or guest) coinciding with that range, so you would
> >>>> want to inform the guest that at least those should be avoided.
> >>>>
> >>>>> * start QEMU with an AMD CPU model and AMD's IOMMU on an Intel host
> >>>>
> >>>> Here you would probably only mark the range, solely to honor how
> >>>> hardware is usually represented. But really, on Intel, nothing stops
> >>>> you from exposing the aforementioned range as RAM.
> >>>>
> >>>>> * start QEMU in TCG mode on an AMD host (mostly from the qtest point of view)
> >>>>>
> >>>> This one is tricky. Because you can hotplug a VFIO device later on,
> >>>> I opted for always marking the reserved range. If you don't use VFIO
> >>>> you're fine, but otherwise you would still need the range reserved.
> >>>> But I am not sure how qtest is used today for testing huge guests.
> >>> I do not know if there are VFIO tests in qtest (probably not, since that
> >>> could require a host configured for that), but we can add a test
> >>> for this memory quirk (assuming phys-bits won't get in the way).
> >>>
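The invariant such a test would check is that no guest RAM region overlaps the reserved window just below 1Tb. A minimal sketch in plain C, assuming the hole bounds implied by the AMD allowed ranges quoted in the commit message; `overlaps_ht_hole()` and the `HT_HOLE_*` constants are hypothetical and do not use QEMU's qtest API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Reserved HyperTransport hole below 1Tb, inclusive bounds (assumed
 * from the gap between the 2nd and 3rd allowed AMD ranges). */
#define HT_HOLE_FIRST 0x000000fd00000000ULL
#define HT_HOLE_LAST  0x000000ffffffffffULL

/* Does the guest range [gpa, gpa + size) touch the hole? */
static bool overlaps_ht_hole(uint64_t gpa, uint64_t size)
{
    if (size == 0) {
        return false;
    }
    assert(size - 1 <= UINT64_MAX - gpa);   /* range must not wrap */
    uint64_t last = gpa + size - 1;
    return gpa <= HT_HOLE_LAST && last >= HT_HOLE_FIRST;
}
```

A test could feed each RAM region's start and size (for example, the `ram-above-4g` and `ram-above-1t` regions from the mtree output in the commit message) through this check and expect it to return false.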
> >>
> >> Joao
> >>
> >
>