
Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device


From: Bharat Bhushan
Subject: Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device
Date: Fri, 7 Jul 2017 11:36:54 +0000

Hi Eric,

> >>>> -----Original Message-----
> >>>> From: Jean-Philippe Brucker [mailto:address@hidden
> >>>> Sent: Thursday, July 06, 2017 3:33 PM
> >>>> To: Bharat Bhushan <address@hidden>; Auger Eric
> >>>> <address@hidden>; address@hidden;
> >>>> address@hidden; address@hidden;
> >> address@hidden;
> >>>> address@hidden; address@hidden
> >>>> Cc: address@hidden; address@hidden; address@hidden;
> >>>> address@hidden; address@hidden; address@hidden;
> >>>> address@hidden; address@hidden
> >>>> Subject: Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device
> >>>>
> >>>> On 05/07/17 09:49, Bharat Bhushan wrote:
> >>>>>>> Also when setting up an msi-route with kvm_irqchip_add_msi_route()
> >>>>>>> we needed to provide the translated address. According to my
> >>>>>>> understanding this is required because the kernel does not go
> >>>>>>> through viommu translation when generating the interrupt, no?
> >>>>>>
> >>>>>> yes this is needed when KVM MSI routes are set up, ie. along
> >>>>>> with GICv3 ITS. With GICv2M, qemu direct gsi mapping is used and
> >>>>>> this is not needed.
> >>>>>>
> >>>>>> So I do not understand your previous sentence saying "MSI
> >>>>>> interrupts works without any change".
> >>>>>
> >>>>> I have almost completed vfio integration with virtio-iommu and am
> >>>>> now testing the changes by assigning an e1000 device to the VM.
> >>>>> For this I have changed the virtio-iommu driver to use
> >>>>> IOMMU_RESV_MSI rather than sw-msi, and this does not need changes
> >>>>> in vfio_get_addr() and kvm_irqchip_add_msi_route().
> >>>>
> >>>> I understand you're reserving region 0x08000000-0x08100000 as
> >>>> IOMMU_RESV_MSI instead of IOMMU_RESV_SW_MSI? I think this only
> >>>> works because Qemu places the vgic in that area as well (in
> >>>> hw/arm/virt.c).
> >>>> It's not a coincidence if the addresses are the same, because Eric
> >>>> chose them for the Linux SMMU drivers and I copied them.
> >>>>
> >>>> We can't rely on that behavior, though, it will break MSIs in
> >>>> emulated devices. And if Qemu happens to move the MSI doorbell in
> >>>> future machine revisions, then it would also break VFIO.
> >>>
> >>> Yes, make sense to me
> >>>
> >>>>
> >>>> Just for my own understanding -- what happens, I think, is that in
> >>>> Linux iova_reserve_iommu_regions initially reserves the
> >>>> guest-physical doorbell 0x08000000-0x08100000. Then much later,
> >>>> when the device driver requests an MSI, the irqchip driver calls
> >>>> iommu_dma_map_msi_msg with the guest- physical gicv2m address
> >>>> 0x08020000. The function finds the right page in msi_page_list,
> >>>> which was added by cookie_init_hw_msi_region, therefore bypassing
> >>>> the viommu, and the GPA gets written in the MSI-X table.
> >>>
> >>> This means that if tomorrow qemu changes the virt machine address
> >>> map and the vgic-its (ITS translation register) address range no
> >>> longer falls in the msi_page_list, then a new iova will be
> >>> allocated and mapped in the iommu. So it will no longer be identity
> >>> mapped and will fail to work with the new qemu?
> >>>
> >> Yes that's correct.
> >>>>
> >>>> If an emulated device such as virtio-net-pci were to generate an
> >>>> MSI, then Qemu would attempt to access the doorbell written by
> >>>> Linux into the MSI-X table, 0x08020000, and fault because that
> >>>> address wasn't mapped in the viommu.
> >>>>
> >>>> So for VFIO, you either need to translate the MSI-X entry using the
> >>>> viommu, or just assume that the vaddr corresponds to the only MSI
> >>>> doorbell accessible by this device (because how can we be certain
> >>>> that the guest already mapped the doorbell before writing the
> >>>> entry?)
> >>>>
> >>>> For ARM machines it's probably best to stick with
> >>>> IOMMU_RESV_SW_MSI.
> >>>> However, a nice way to use IOMMU_RESV_MSI would be for the virtio-
> >>>> iommu device to advertise identity-mapped/reserved regions, and
> >>>> bypass translation on these regions. Then the driver could reserve
> >>>> those with IOMMU_RESV_MSI.
> >>>
> >>> Correct me if I did not understand you correctly: today the
> >>> iommu-driver decides the msi-reserved region. What if we change
> >>> this so that the virtio-iommu device provides the reserved msi
> >>> region according to the emulated machine (virt/intel)? The
> >>> virtio-iommu driver would then use the address advertised by the
> >>> virtio-iommu device as IOMMU_RESV_MSI. In this case the
> >>> msi-page-list will always have the reserved region for MSI.
> >>> On the qemu side, for emulated devices we will let virtio-iommu
> >>> return the same address as the translated address, as it falls in
> >>> the MSI-reserved page already known to it.
> >>
> >> I think what you're proposing here corresponds to the 1st approach
> >> that was followed for PCIe passthrough/MSI on ARM, ie. the userspace
> >> was providing the reserved region base address & size.
> >> This was ruled out, and now this region is arbitrarily set by the
> >> smmu-driver. At the moment this means this region cannot contain
> >> guest RAM.
> >
> > In the rejected proposal, user-space used to choose a reserved
> > region and provide it to host Linux, and host Linux would use that
> > MSI mapping. Just for my understanding: if tomorrow QEMU changes its
> > address space then it may not work without changing the SMMU
> > msi-reserved-iova in the host driver, right? For example, if the
> > emulated machine has RAM at this address then it will not work?
> Yes that's correct. Note MSI reserved regions are now exposed to
> userspace through /sys/kernel/iommu_groups/<>/reserved_regions
> >
> > In this proposal, QEMU reserves an iova-range for the guest (not the
> > host) and the guest kernel will use this as an untranslated msi-iova
> > (IOMMU_RESV_MSI). This does not change the host interface, and the
> > host will continue to use its reserved mapping for actual interrupt
> > generation, no?
> But then userspace needs to provide the IOMMU_RESV_MSI range to the
> guest kernel, right? What would be the proposed mechanism?

Just an opinion: we could define a feature bit
(VIRTIO_IOMMU_F_RES_MSI_RANGE) and provide this info via a command
(VIRTIO_IOMMU_T_MSI_RANGE). The guest iommu-driver would issue this
command during initialization and store the returned value, which would
simply replace the MSI_IOVA_BASE and MSI_IOVA_LENGTH #defines. The rest
of the virtio-iommu driver would remain the same.

> Looks weird to me to have different MSI handling on host and guest.
> Also I still don't get how you handle the case where virtio-net-pci
> emits accesses to the MSI doorbell while the latter is not mapped.

I will look at the virtio-net-pci code in detail. I was wondering
whether we can handle this region differently, or do a 1:1 translation
in virtio_iommu_translate() for this region. But my knowledge is limited
around this piece of code. Is this possible, or is it ugly?

Thanks
-Bharat

> 
> Thanks
> 
> Eric
> >
> > Thanks
> > -Bharat
> >
> >>
> >>>
> >>>
> >>>> For x86 we will need such a system, with an added IRQ remapping
> >>>> feature.
> >>>
> >>> I do not understand x86 MSI interrupt generation, but if the above
> >>> understanding is correct, then why do we need IRQ remapping for x86?
> >> To me x86 IR simply corresponds to the ITS MSI controller modality
> >> on ARM. So just as you still need a vITS along with virtio-iommu on
> >> ARM, you need vIR along with virtio-iommu on Intel. Does that make
> >> sense?
> >>
> >> So in any case we need to make sure the guest uses a vITS or vIR to
> >> make sure MSIs are correctly isolated.
> >>
> >>
> >>> Will the x86 machine emulated in QEMU provide a big address range
> >>> for MSIs, and when actually generating an MSI does it need some
> >>> extra processing (IRQ-remapping processing) before actually
> >>> generating the write transaction for the MSI interrupt?
> >> My understanding is that on x86 the MSI window is fixed and matches
> >> [FEE0_0000h – FEF0_0000h]. MSIs are conveyed on a separate address
> >> space from usual DMA accesses. And yes, they end up in IR if
> >> supported in the HW.
> >>
> >> Thanks
> >>
> >> Eric
> >>>
> >>> Thanks
> >>> -Bharat
> >>>
> >>>>
> >>>> Thanks,
> >>>> Jean
