Re: [Qemu-devel] VFIO and scheduled SR-IOV cards


From: Alex Williamson
Subject: Re: [Qemu-devel] VFIO and scheduled SR-IOV cards
Date: Mon, 03 Jun 2013 12:57:45 -0600

On Mon, 2013-06-03 at 14:34 -0400, Don Dutile wrote:
> On 06/03/2013 02:02 PM, Alex Williamson wrote:
> > On Mon, 2013-06-03 at 18:33 +0200, Benoît Canet wrote:
> >> Hello,
> >>
> >> I plan to write a PF driver for an SR-IOV card and make the VFs work with
> >> QEMU's VFIO passthrough, so I am asking the following design question
> >> before trying to write and push code.
> >>
> >> After SR-IOV is enabled on this hardware, only one VF can be active at a
> >> given time.
> >
> > Is this actually an SR-IOV device or are you trying to write a driver
> > that emulates SR-IOV for a PF?
> >
> >> The PF host kernel driver acts as a scheduler.
> >> It switches every few milliseconds which VF is the currently active
> >> function while disabling the other VFs.
> >>
> that's time-sharing of hw, which sw doesn't see ... so, ok.
> 
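For illustration only, a minimal sketch of such a time-slicing loop, built
on a kernel delayed workqueue; the my_vf_enable()/my_vf_disable() hooks and
the VF count / slice length constants are hypothetical placeholders, and the
unmap/block interaction with VFIO discussed below is deliberately omitted:

/*
 * Sketch of a PF-driver scheduler that rotates the active VF every few
 * milliseconds.  my_vf_enable()/my_vf_disable(), MY_NUM_VFS and MY_SLICE_MS
 * are placeholders, not part of any real driver.
 */
#include <linux/module.h>
#include <linux/workqueue.h>
#include <linux/jiffies.h>

#define MY_NUM_VFS   4
#define MY_SLICE_MS  5

static struct delayed_work my_sched_work;
static unsigned int my_active_vf;

static void my_vf_enable(unsigned int vf)  { /* program the PF to activate vf */ }
static void my_vf_disable(unsigned int vf) { /* quiesce vf before switching */ }

static void my_sched_fn(struct work_struct *work)
{
	my_vf_disable(my_active_vf);
	my_active_vf = (my_active_vf + 1) % MY_NUM_VFS;
	my_vf_enable(my_active_vf);

	/* re-arm for the next time slice */
	schedule_delayed_work(&my_sched_work, msecs_to_jiffies(MY_SLICE_MS));
}

static int __init my_sched_init(void)
{
	INIT_DELAYED_WORK(&my_sched_work, my_sched_fn);
	schedule_delayed_work(&my_sched_work, msecs_to_jiffies(MY_SLICE_MS));
	return 0;
}

static void __exit my_sched_exit(void)
{
	cancel_delayed_work_sync(&my_sched_work);
}

module_init(my_sched_init);
module_exit(my_sched_exit);
MODULE_LICENSE("GPL");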
> >> One consequence of how the hardware works is that the MMR regions of the
> >> switched-off VFs must be unmapped and their I/O accesses should block
> >> until the VF is switched on again.
> >
> This violates the spec, and does impact sw -- how can one assign such a VF
> to a guest when it does not work independently of the other VFs?
> 
> > MMR = Memory Mapped Register?
> >
> > This seems contradictory to the SR-IOV spec, which states:
> >
> >          Each VF contains a non-shared set of physical resources required
> >          to deliver Function-specific services, e.g., resources such as
> >          work queues, data buffers, etc.  These resources can be directly
> >          accessed by an SI without requiring VI or SR-PCIM intervention.
> >
> > Furthermore, each VF should have a separate requester ID.  What's being
> > suggested here seems like maybe that's not the case.  If true, it would
> I didn't read it that way above.  I read it as the PCIe end is timeshared
> btwn VFs (& PFs?) .... with some VFs disappearing (from a driver perspective)
> as if the device was hot unplugged w/o notification.  That will probably
> cause read timeouts & SMEs, bringing down most enterprise-level systems.

Perhaps I'm reading too much into it, but using the same requester ID
would seem like justification for why the device needs to be unmapped.
Otherwise we could just stop QEMU and leave the mappings alone if we
just want to make sure access to the device is blocked while the device
is swapped out.  Not the best overall throughput algorithm, but maybe a
proof of concept.  Need more info about how the device actually behaves
to know for sure.  Thanks,

Alex

> > make iommu groups challenging.  Is there any VF save/restore around the
> > scheduling?
> >
> >> Each IOMMU map/unmap should be done in less than 100ns.
> >
> > I think that may be a lot to ask if we need to unmap the regions in the
> > guest and in the iommu.  If the "VFs" used different requester IDs,
> > iommu unmapping wouldn't be necessary.  I experimented with switching
> > between trapped (read/write) access to memory regions and mmap'd (direct
> > mapping) for handling legacy interrupts.  There was a noticeable
> > performance penalty switching per interrupt.
> >
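As an aside, the switching Alex describes can be illustrated with QEMU's
generic MemoryRegion API; this is only a rough sketch, not QEMU's actual
vfio code, and the mmap_mem argument stands in for the mmap'd subregion
that would be layered over a BAR's trapped MMIO region:

/*
 * Rough sketch only: flip one BAR between direct (mmap'd) and trapped
 * access.  'mmap_mem' is assumed to be the mmap'd subregion layered on top
 * of a trapped MMIO container; it is a placeholder, not QEMU's real vfio
 * structure.
 */
#include "exec/memory.h"

void my_region_set_fast_path(MemoryRegion *mmap_mem, bool enabled)
{
    memory_region_transaction_begin();
    /* When disabled, accesses miss the mmap'd subregion and are trapped. */
    memory_region_set_enabled(mmap_mem, enabled);
    memory_region_transaction_commit();
}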
> >> As the kernel iommu module is being called by the VFIO driver, the PF
> >> driver cannot interface with it.
> >>
> >> Currently the only interface of the VFIO code is for the userland QEMU
> >> process, and I fear that notifying QEMU that it should do the unmap/block
> >> would take more than 100ns.
> >>
> >> Also, blocking the I/O access in QEMU under the BQL would freeze QEMU.
> >>
> >> Do you have an idea on how to write this required map and block/unmap
> >> feature?
> >
> > It seems like there are several options, but I'm doubtful that any of
> > them will meet 100ns.  If this is completely fake SR-IOV and there's not
> > a different requester ID per VF, I'd start with seeing if you can even
> > do the iommu_unmap/iommu_map of the MMIO BARs in under 100ns.  If that's
> > close to your limit, then your only real option for QEMU is to freeze
> > it, which still involves getting multiple (maybe many) vCPUs out of VM
> > mode.  That's not free either.  If by some miracle you have time to
> > spare, you could remap the regions to trapped mode and let the vCPUs run
> > while vfio blocks on read/write.
> >
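To make the first measurement concrete, a minimal kernel-side sketch of
timing one iommu_unmap()/iommu_map() cycle; the domain, iova and bar_phys
arguments and the 16KB BAR size are assumptions standing in for whatever
the real PF driver would have at hand:

/*
 * Sketch only: measure one unmap/map cycle of a BAR-sized IOMMU mapping.
 * 'domain', 'iova' and 'bar_phys' are assumed to already exist in the PF
 * driver; they are placeholders, not part of any real driver.
 */
#include <linux/iommu.h>
#include <linux/ktime.h>
#include <linux/printk.h>

#define BAR_SIZE (16 * 1024)

void my_time_remap(struct iommu_domain *domain, unsigned long iova,
		   phys_addr_t bar_phys)
{
	ktime_t start = ktime_get();
	int ret;

	iommu_unmap(domain, iova, BAR_SIZE);
	ret = iommu_map(domain, iova, bar_phys, BAR_SIZE,
			IOMMU_READ | IOMMU_WRITE);

	pr_info("unmap+map took %lld ns (map ret=%d)\n",
		(long long)ktime_to_ns(ktime_sub(ktime_get(), start)), ret);
}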
> > Maybe there's even a question whether mmap'd mode is worthwhile for this
> > device.  Trapping every read/write is orders of magnitude slower, but
> > allows you to handle the "wait for VF" on the kernel side.
> >
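For reference, the two access styles contrasted above look roughly like
this from userspace; the already-open VFIO device fd and the use of BAR0
are the only assumptions:

/*
 * Sketch: read a 32-bit register from BAR0 either through the trapped path
 * (pread on the device fd, every access goes through the vfio kernel
 * driver) or through the direct path (mmap of the region, no kernel
 * involvement).  'device' is an already-open VFIO device fd; error
 * handling is minimal.
 */
#include <linux/vfio.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <unistd.h>

uint32_t read_reg(int device, off_t reg)
{
	struct vfio_region_info info = {
		.argsz = sizeof(info),
		.index = VFIO_PCI_BAR0_REGION_INDEX,
	};
	uint32_t val = 0;

	ioctl(device, VFIO_DEVICE_GET_REGION_INFO, &info);

	/* Trapped access: each read is a syscall handled by vfio-pci. */
	pread(device, &val, sizeof(val), info.offset + reg);

	/* Direct access: map the BAR once, then use plain loads/stores. */
	if (info.flags & VFIO_REGION_INFO_FLAG_MMAP) {
		void *map = mmap(NULL, info.size, PROT_READ | PROT_WRITE,
				 MAP_SHARED, device, info.offset);
		if (map != MAP_FAILED) {
			val = *(volatile uint32_t *)((char *)map + reg);
			munmap(map, info.size);
		}
	}
	return val;
}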
> > If you can provide more info on the device design/constraints, maybe we
> > can come up with better options.  Thanks,
> >
> > Alex
> >
> > _______________________________________________
> > iommu mailing list
> > address@hidden
> > https://lists.linuxfoundation.org/mailman/listinfo/iommu
> 