From: Joao Martins
Subject: Re: [PATCH RFC 1/6] i386/pc: Account IOVA reserved ranges above 4G boundary
Date: Wed, 23 Jun 2021 10:51:59 +0100

On 6/23/21 10:03 AM, Igor Mammedov wrote:
> On Tue, 22 Jun 2021 16:49:00 +0100
> Joao Martins <joao.m.martins@oracle.com> wrote:
> 
>> It is assumed that the whole GPA space is available to be
>> DMA-addressable, within a given address space limit. Since
>> kernel v5.4 that is no longer true, and VFIO will validate
>> whether the selected IOVA is indeed valid, i.e. not reserved
>> by the IOMMU on behalf of some specific devices or the platform.
>>
>> AMD systems with an IOMMU are examples of such platforms and
>> particularly may export only these ranges as allowed:
>>
>>      0000000000000000 - 00000000fedfffff (0      .. 3.982G)
>>      00000000fef00000 - 000000fcffffffff (3.983G .. 1011.9G)
>>      0000010000000000 - ffffffffffffffff (1Tb    .. 16Pb)
>>
>> We already account for the 4G hole; however, if the guest is
>> big enough we will fail to allocate more than 1010G, given the
>> ~12G hole at the 1Tb boundary, reserved for HyperTransport.
>>
>> When creating the region above 4G, take into account which
>> IOVAs are allowed by defining the known allowed ranges
>> and searching for the next free IOVA range. When finding an
>> invalid IOVA, we mark it as reserved and proceed to the
>> next allowed IOVA region.
>>
>> After accounting for the 1Tb hole on AMD hosts, mtree should
>> look like:
>>
>> 0000000100000000-000000fcffffffff (prio 0, i/o):
>>      alias ram-above-4g @pc.ram 0000000080000000-000000fc7fffffff
>> 0000010000000000-000001037fffffff (prio 0, i/o):
>>      alias ram-above-1t @pc.ram 000000fc80000000-000000ffffffffff
> 
> You are talking here about GPA, which is a guest-specific thing,
> and then somehow it becomes tied to the host. For bystanders it's
> not clear from the above commit message how the two are related.
> I'd add an explicit explanation of how an AMD host relates to GPAs,
> and clarify where you are talking about the guest/host side.
> 
OK, makes sense.

Perhaps using IOVA makes it easier to understand. I said GPA because
there's a 1:1 mapping between GPA and IOVA (if you're not using a vIOMMU).

> also what about usecases:
>  * start QEMU with Intel cpu model on AMD host with intel's iommu

In principle it would be less likely to occur. But you would still need
to mark the same range as reserved. The limitation is on DMA (host or
guest) occurring on IOVAs coinciding with that range, so you would
want to inform the guest that at least those should be avoided.

>  * start QEMU with AMD cpu model and AMD's iommu on Intel host

Here you would probably only mark the range, solely to honor how hardware
is usually represented. But really, on Intel, nothing stops you from
exposing the aforementioned range as RAM.

>  * start QEMU in TCG mode on AMD host (mostly form qtest point ot view)
> 
This one is tricky. Because you can hotplug a VFIO device later on,
I opted for always marking the range as reserved. If you don't use VFIO
you're good, but otherwise you would still need the reservation. I am
not sure, though, how qtest is used today for testing huge guests.


