From: Alex Williamson
Subject: Re: [Qemu-devel] [RFC PATCH qemu v2 1/5] vfio: Switch from TARGET_PAGE_MASK to qemu_real_host_page_mask
Date: Tue, 14 Jul 2015 16:28:13 -0600

On Tue, 2015-07-14 at 16:58 +1000, Alexey Kardashevskiy wrote:
> On 07/14/2015 05:13 AM, Alex Williamson wrote:
> > On Tue, 2015-07-14 at 00:56 +1000, Alexey Kardashevskiy wrote:
> >> Commits 4e51361d7 "cpu-all: complete "real" host page size API" and
> >> f7ceed190 "vfio: cpu: Use "real" page size API" started switching from
> >> TARGET_PAGE_MASK (hardcoded as 4K) to the real host page size.
> >>
> >> This patch finishes the transition by:
> >> - %s/TARGET_PAGE_MASK/qemu_real_host_page_mask/
> >> - %s/TARGET_PAGE_ALIGN/REAL_HOST_PAGE_ALIGN/
> >> - removing the bitfield lengths for offsets in VFIOQuirk::data, as
> >> qemu_real_host_page_mask is a variable rather than a constant macro
> >
> > Can we assume that none of the changes to quirks have actually been
> > tested?
> 
> No, why? :) I tried it on one of the NVIDIA cards I have here -
> VGA compatible controller: NVIDIA Corporation GM107GL [Quadro K2200] (rev a2)
> The driver was from NVIDIA (not nouveau) and the test was "acos" (some
> basic CUDA test).

That's only one of a handful or more quirks.  The VGA-related quirks are
all for backdoors in the bootstrap process and MMIO access to config
space, things that I would not expect to see on power.  So power
probably isn't a useful host to test these things.

> > I don't really support them being bundled in here since they
> > really aren't related to what you're doing.
> 
> This makes sense, I'll move them to a separate patch and add a note on how
> it helps on a 64k-pages host.
> 
> > For DMA we generally want
> > to be host IOMMU page aligned,
> 
> Do all known IOMMUs use a constant page size? An IOMMU memory region does
> not have an IOMMU page size/mask; I wanted to add one there but am not sure
> whether it is generic enough.

AMD supports nearly any power-of-2 size (>=4k); Intel supports 4k plus,
optionally, 2M and 1G.  The vfio type1 iommu driver looks for physically
contiguous ranges to give to the hardware iommu driver, which makes use
of whatever optimal page size it can.  Therefore, we really don't care
what the hardware page size is beyond the assumption that it supports
4k.  When hugepages are used by the VM, we expect the iommu will
automatically make use of them if supported.  A non-VM vfio userspace
driver might care a little bit more about the supported page sizes, I
imagine.
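
For such a non-VM userspace driver, the supported sizes can be queried
from the container.  Here's a minimal sketch against the vfio uAPI
(illustrative only, not part of this patch; it assumes container_fd is
an open /dev/vfio/vfio fd with an IOMMU type already set):

#include <stdio.h>
#include <sys/ioctl.h>
#include <linux/vfio.h>

/* Print the bitmap of IOVA page sizes the container's IOMMU supports;
 * each set bit is a supported size, e.g. bit 12 => 4k. */
static int print_iommu_pgsizes(int container_fd)
{
    struct vfio_iommu_type1_info info = { .argsz = sizeof(info) };

    if (ioctl(container_fd, VFIO_IOMMU_GET_INFO, &info)) {
        perror("VFIO_IOMMU_GET_INFO");
        return -1;
    }
    if (info.flags & VFIO_IOMMU_INFO_PGSIZES) {
        printf("supported IOVA page sizes: 0x%llx\n",
               (unsigned long long)info.iova_pgsizes);
    }
    return 0;
}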

> > which we can generally assume is the same
> > as host page aligned,
> 
> They are almost never the same on sPAPR for 32bit windows...
> 
> > but quirks are simply I/O regions, so I think they
> > ought to continue to be target page aligned.
> 
> Without s/TARGET_PAGE_MASK/qemu_real_host_page_mask/, 
> &vfio_nvidia_88000_quirk fails+exits in kvm_set_phys_mem() as the size of 
> section is 0x88000. It still works with x-mmap=false (or TCG, I suppose) 
> though.

Think about what this is doing to the guest.  There's a 4k window
(because PCIe extended config space is 4k) at offset 0x88000 of the MMIO
BAR that allows access to PCI config space of the device.  With 4k pages
this all aligns quite nicely and only config space accesses are trapped.
With 64k pages, you're trapping everything from 0x80000 to 0x8ffff.  We
have no idea what else might live in that space and what kind of
performance impact it'll cause to the operation of the device.  If you
don't know that you need it and can't meet the mapping criteria, don't
enable it.  BTW, this quirk is for GeForce, not Quadro.  The region
seems not to be used by Quadro drivers and we can't programmatically tell
the difference between Quadro and GeForce hardware, so we leave it
enabled for either on x86.
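
To make the arithmetic concrete, here's a small standalone snippet (for
illustration only; the 0x88000 offset is the quirk's window offset, the
rest is hypothetical) showing what gets trapped under each page mask:

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    uint64_t address_match = 0x88000;   /* quirk window offset in the BAR */
    /* 4k and 64k page masks */
    uint64_t page_masks[] = { ~(uint64_t)0xfff, ~(uint64_t)0xffff };

    for (int i = 0; i < 2; i++) {
        uint64_t base = address_match & page_masks[i];  /* trap start */
        uint64_t size = ~page_masks[i] + 1;             /* page size  */
        printf("page size 0x%05llx: trapping 0x%05llx..0x%05llx\n",
               (unsigned long long)size, (unsigned long long)base,
               (unsigned long long)(base + size - 1));
    }
    return 0;
}

With 4k pages this prints a trap range of 0x88000..0x88fff (just the
config window); with 64k pages it becomes 0x80000..0x8ffff.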

Maybe these really should be real_host_page_size, but I can't really
picture how an underlying 64k page size host gets imposed on a guest
that's only aware of a 4k page size.  For instance, what prevents a 4k
guest from mapping PCI BARs with 4k alignment?  Multiple BARs for
different devices could then fit within a single 64k host page.  MMU
mappings would seem to have similar issues.  These quirks need to be on
the granularity at which we take a target page fault, which I imagine
is the same as the real_host_page_size.  I'd expect that unless you're
going to support consumer graphics or crappy Realtek NICs, none of
these quirks are relevant to you.

> >> This keeps using TARGET_PAGE_MASK for IOMMU regions though, as it is
> >> the minimum page size which IOMMU regions may use, and at the moment
> >> memory regions do not carry their actual page size.
> >>
> >> Signed-off-by: Alexey Kardashevskiy <address@hidden>
> >> ---
> >>
> >> In reality DMA windows are always a lot bigger than a single 4K page
> >> and aligned to 32/64MB; maybe we should just use qemu_real_host_page_mask
> >> here?
> >
> > I don't understand what this is asking either.  While the bulk of memory
> > is going to be mapped in larger chunks, we do occasionally see 4k
> > mappings on x86, particularly in some of the legacy low memory areas.
> 
> 
> The question was not about individual mappings - these are handled by an
> iommu memory region notifier; here we are dealing with DMA windows, which
> are always megabytes in size, but nothing really prevents/prohibits a guest
> from requesting a 4K _window_ (with a single TCE entry). Whether we want to
> support such small windows or not - that was the question.

But you're using the same memory listener that does deal with 4k
mappings.  What optimization would you hope to achieve by assuming only
larger mappings that compensates for the lack of generality?  Thanks,

Alex

> >> ---
> >>   hw/vfio/common.c | 26 ++++++++++++++++----------
> >>   hw/vfio/pci.c    | 30 +++++++++++++++---------------
> >>   2 files changed, 31 insertions(+), 25 deletions(-)
> >>
> >> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> >> index 85ee9b0..d115ec9 100644
> >> --- a/hw/vfio/common.c
> >> +++ b/hw/vfio/common.c
> >> @@ -321,6 +321,9 @@ static void vfio_listener_region_add(MemoryListener *listener,
> >>       Int128 llend;
> >>       void *vaddr;
> >>       int ret;
> >> +    const bool is_iommu = memory_region_is_iommu(section->mr);
> >> +    const hwaddr page_mask =
> >> +        is_iommu ? TARGET_PAGE_MASK : qemu_real_host_page_mask;
> >>
> >>       if (vfio_listener_skipped_section(section)) {
> >>           trace_vfio_listener_region_add_skip(
> >> @@ -330,16 +333,16 @@ static void vfio_listener_region_add(MemoryListener *listener,
> >>           return;
> >>       }
> >>
> >> -    if (unlikely((section->offset_within_address_space & ~TARGET_PAGE_MASK) !=
> >> -                 (section->offset_within_region & ~TARGET_PAGE_MASK))) {
> >> +    if (unlikely((section->offset_within_address_space & ~page_mask) !=
> >> +                 (section->offset_within_region & ~page_mask))) {
> >>           error_report("%s received unaligned region", __func__);
> >>           return;
> >>       }
> >>
> >> -    iova = TARGET_PAGE_ALIGN(section->offset_within_address_space);
> >> +    iova = ROUND_UP(section->offset_within_address_space, ~page_mask + 1);
> >>       llend = int128_make64(section->offset_within_address_space);
> >>       llend = int128_add(llend, section->size);
> >> -    llend = int128_and(llend, int128_exts64(TARGET_PAGE_MASK));
> >> +    llend = int128_and(llend, int128_exts64(page_mask));
> >>
> >>       if (int128_ge(int128_make64(iova), llend)) {
> >>           return;
> >> @@ -347,7 +350,7 @@ static void vfio_listener_region_add(MemoryListener *listener,
> >>
> >>       memory_region_ref(section->mr);
> >>
> >> -    if (memory_region_is_iommu(section->mr)) {
> >> +    if (is_iommu) {
> >>           VFIOGuestIOMMU *giommu;
> >>
> >>           trace_vfio_listener_region_add_iommu(iova,
> >> @@ -423,6 +426,9 @@ static void vfio_listener_region_del(MemoryListener *listener,
> >>                                               iommu_data.type1.listener);
> >>       hwaddr iova, end;
> >>       int ret;
> >> +    const bool is_iommu = memory_region_is_iommu(section->mr);
> >> +    const hwaddr page_mask =
> >> +        is_iommu ? TARGET_PAGE_MASK : qemu_real_host_page_mask;
> >>
> >>       if (vfio_listener_skipped_section(section)) {
> >>           trace_vfio_listener_region_del_skip(
> >> @@ -432,13 +438,13 @@ static void vfio_listener_region_del(MemoryListener *listener,
> >>           return;
> >>       }
> >>
> >> -    if (unlikely((section->offset_within_address_space & ~TARGET_PAGE_MASK) !=
> >> -                 (section->offset_within_region & ~TARGET_PAGE_MASK))) {
> >> +    if (unlikely((section->offset_within_address_space & ~page_mask) !=
> >> +                 (section->offset_within_region & ~page_mask))) {
> >>           error_report("%s received unaligned region", __func__);
> >>           return;
> >>       }
> >>
> >> -    if (memory_region_is_iommu(section->mr)) {
> >> +    if (is_iommu) {
> >>           VFIOGuestIOMMU *giommu;
> >>
> >>           QLIST_FOREACH(giommu, &container->giommu_list, giommu_next) {
> >> @@ -459,9 +465,9 @@ static void vfio_listener_region_del(MemoryListener *listener,
> >>            */
> >>       }
> >>
> >> -    iova = TARGET_PAGE_ALIGN(section->offset_within_address_space);
> >> +    iova = REAL_HOST_PAGE_ALIGN(section->offset_within_address_space);
> >>       end = (section->offset_within_address_space + int128_get64(section->size)) &
> >> -          TARGET_PAGE_MASK;
> >> +          page_mask;
> >>
> >>       if (iova >= end) {
> >>           return;
> >> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> >> index 2ed877f..7694afe 100644
> >> --- a/hw/vfio/pci.c
> >> +++ b/hw/vfio/pci.c
> >> @@ -50,16 +50,16 @@ typedef struct VFIOQuirk {
> >>       struct VFIOPCIDevice *vdev;
> >>       QLIST_ENTRY(VFIOQuirk) next;
> >>       struct {
> >> -        uint32_t base_offset:TARGET_PAGE_BITS;
> >> -        uint32_t address_offset:TARGET_PAGE_BITS;
> >> +        uint32_t base_offset;
> >> +        uint32_t address_offset;
> >>           uint32_t address_size:3;
> >>           uint32_t bar:3;
> >>
> >>           uint32_t address_match;
> >>           uint32_t address_mask;
> >>
> >> -        uint32_t address_val:TARGET_PAGE_BITS;
> >> -        uint32_t data_offset:TARGET_PAGE_BITS;
> >> +        uint32_t address_val;
> >> +        uint32_t data_offset;
> >>           uint32_t data_size:3;
> >>
> >>           uint8_t flags;
> >> @@ -1319,8 +1319,8 @@ static uint64_t vfio_generic_quirk_read(void *opaque,
> >>   {
> >>       VFIOQuirk *quirk = opaque;
> >>       VFIOPCIDevice *vdev = quirk->vdev;
> >> -    hwaddr base = quirk->data.address_match & TARGET_PAGE_MASK;
> >> -    hwaddr offset = quirk->data.address_match & ~TARGET_PAGE_MASK;
> >> +    hwaddr base = quirk->data.address_match & qemu_real_host_page_mask;
> >> +    hwaddr offset = quirk->data.address_match & ~qemu_real_host_page_mask;
> >>       uint64_t data;
> >>
> >>       if (vfio_flags_enabled(quirk->data.flags, quirk->data.read_flags) &&
> >> @@ -1349,8 +1349,8 @@ static void vfio_generic_quirk_write(void *opaque, hwaddr addr,
> >>   {
> >>       VFIOQuirk *quirk = opaque;
> >>       VFIOPCIDevice *vdev = quirk->vdev;
> >> -    hwaddr base = quirk->data.address_match & TARGET_PAGE_MASK;
> >> -    hwaddr offset = quirk->data.address_match & ~TARGET_PAGE_MASK;
> >> +    hwaddr base = quirk->data.address_match & qemu_real_host_page_mask;
> >> +    hwaddr offset = quirk->data.address_match & ~qemu_real_host_page_mask;
> >>
> >>       if (vfio_flags_enabled(quirk->data.flags, quirk->data.write_flags) &&
> >>           ranges_overlap(addr, size, offset, quirk->data.address_mask + 1)) {
> >> @@ -1650,9 +1650,9 @@ static void vfio_probe_ati_bar2_4000_quirk(VFIOPCIDevice *vdev, int nr)
> >>
> >>       memory_region_init_io(&quirk->mem, OBJECT(vdev), &vfio_generic_quirk, quirk,
> >>                             "vfio-ati-bar2-4000-quirk",
> >> -                          TARGET_PAGE_ALIGN(quirk->data.address_mask + 1));
> >> +                          REAL_HOST_PAGE_ALIGN(quirk->data.address_mask + 1));
> >>       memory_region_add_subregion_overlap(&vdev->bars[nr].region.mem,
> >> -                          quirk->data.address_match & TARGET_PAGE_MASK,
> >> +                          quirk->data.address_match & qemu_real_host_page_mask,
> >>                             &quirk->mem, 1);
> >>
> >>       QLIST_INSERT_HEAD(&vdev->bars[nr].quirks, quirk, next);
> >> @@ -1888,7 +1888,7 @@ static void vfio_nvidia_88000_quirk_write(void *opaque, hwaddr addr,
> >>       VFIOQuirk *quirk = opaque;
> >>       VFIOPCIDevice *vdev = quirk->vdev;
> >>       PCIDevice *pdev = &vdev->pdev;
> >> -    hwaddr base = quirk->data.address_match & TARGET_PAGE_MASK;
> >> +    hwaddr base = quirk->data.address_match & qemu_real_host_page_mask;
> >>
> >>       vfio_generic_quirk_write(opaque, addr, data, size);
> >>
> >> @@ -1943,9 +1943,9 @@ static void vfio_probe_nvidia_bar0_88000_quirk(VFIOPCIDevice *vdev, int nr)
> >>
> >>       memory_region_init_io(&quirk->mem, OBJECT(vdev), &vfio_nvidia_88000_quirk,
> >>                             quirk, "vfio-nvidia-bar0-88000-quirk",
> >> -                          TARGET_PAGE_ALIGN(quirk->data.address_mask + 1));
> >> +                          REAL_HOST_PAGE_ALIGN(quirk->data.address_mask + 1));
> >>       memory_region_add_subregion_overlap(&vdev->bars[nr].region.mem,
> >> -                          quirk->data.address_match & TARGET_PAGE_MASK,
> >> +                          quirk->data.address_match & qemu_real_host_page_mask,
> >>                             &quirk->mem, 1);
> >>
> >>       QLIST_INSERT_HEAD(&vdev->bars[nr].quirks, quirk, next);
> >> @@ -1980,9 +1980,9 @@ static void vfio_probe_nvidia_bar0_1800_quirk(VFIOPCIDevice *vdev, int nr)
> >>
> >>       memory_region_init_io(&quirk->mem, OBJECT(vdev), &vfio_generic_quirk, quirk,
> >>                             "vfio-nvidia-bar0-1800-quirk",
> >> -                          TARGET_PAGE_ALIGN(quirk->data.address_mask + 1));
> >> +                          REAL_HOST_PAGE_ALIGN(quirk->data.address_mask + 1));
> >>       memory_region_add_subregion_overlap(&vdev->bars[nr].region.mem,
> >> -                          quirk->data.address_match & TARGET_PAGE_MASK,
> >> +                          quirk->data.address_match & qemu_real_host_page_mask,
> >>                             &quirk->mem, 1);
> >>
> >>       QLIST_INSERT_HEAD(&vdev->bars[nr].quirks, quirk, next);