Re: [Qemu-devel] VFIO based vGPU(was Re: [Announcement] 2015-Q3 release

qemu-devel

From:	Alex Williamson
Subject:	Re: [Qemu-devel] VFIO based vGPU(was Re: [Announcement] 2015-Q3 release of XenGT - a Mediated ...)
Date:	Tue, 26 Jan 2016 20:07:48 -0700

On Wed, 2016-01-27 at 09:47 +0800, Jike Song wrote:
> On 01/27/2016 06:56 AM, Alex Williamson wrote:
> > On Tue, 2016-01-26 at 22:39 +0000, Tian, Kevin wrote:
> > > > From: Alex Williamson [mailto:address@hidden
> > > > Sent: Wednesday, January 27, 2016 6:27 AM
> > > >  
> > > > On Tue, 2016-01-26 at 22:15 +0000, Tian, Kevin wrote:
> > > > > > From: Alex Williamson [mailto:address@hidden
> > > > > > Sent: Wednesday, January 27, 2016 6:08 AM
> > > > > >  
> > > > > > > > > >  
> > > > > > > > >  
> > > > > > > > > > Today KVMGT (not using VFIO yet) registers I/O emulation callbacks to
> > > > > > > > > > KVM, so VM MMIO access will be forwarded to KVMGT directly for
> > > > > > > > > > emulation in kernel. If we reuse above R/W flags, the whole emulation
> > > > > > > > > > path would be unnecessarily long with obvious performance impact. We
> > > > > > > > > > either need a new flag here to indicate in-kernel emulation (bias from
> > > > > > > > > > passthrough support), or just hide the region alternatively (let KVMGT
> > > > > > > > > > to handle I/O emulation itself like today).
> > > > > > > >  
> > > > > > > > > That sounds like a future optimization TBH.  There's very strict
> > > > > > > > > layering between vfio and kvm.  Physical device assignment could make
> > > > > > > > > use of it as well, avoiding a round trip through userspace when an
> > > > > > > > > ioread/write would do.  Userspace also needs to orchestrate those kinds
> > > > > > > > > of accelerators, there might be cases where userspace wants to see those
> > > > > > > > > transactions for debugging or manipulating the device.  We can't simply
> > > > > > > > > take shortcuts to provide such direct access.  Thanks,
> > > > > > > >  
> > > > > > >  
> > > > > > > > But we have to balance such debugging flexibility and acceptable performance.
> > > > > > > > To me the latter one is more important otherwise there'd be no real usage
> > > > > > > > around this technique, while for debugging there are other alternative (e.g.
> > > > > > > > ftrace) Consider some extreme case with 100k traps/second and then see
> > > > > > > > how much impact a 2-3x longer emulation path can bring...
> > > > > >  
> > > > > > > Are you jumping to the conclusion that it cannot be done with proper
> > > > > > > layering in place?  Performance is important, but it's not an excuse to
> > > > > > > abandon designing interfaces between independent components.  Thanks,
> > > > > >  
> > > > >  
> > > > > > Two are not controversial. My point is to remove unnecessary long trip
> > > > > > as possible. After another thought, yes we can reuse existing read/write
> > > > > > flags:
> > > > >       - KVMGT will expose a private control variable whether in-kernel
> > > > > delivery is required;
> > > >  
> > > > But in-kernel delivery is never *required*.  Wouldn't userspace want to
> > > > deliver in-kernel any time it possibly could?
> > > >  
> > > > >       - when the variable is true, KVMGT will register in-kernel MMIO
> > > > > emulation callbacks then VM MMIO request will be delivered to KVMGT
> > > > > directly;
> > > > >       - when the variable is false, KVMGT will not register anything.
> > > > > VM MMIO request will then be delivered to Qemu and then ioread/write
> > > > > will be used to finally reach KVMGT emulation logic;
> > > >  
> > > > No, that means the interface is entirely dependent on a backdoor through
> > > > KVM.  Why can't userspace (QEMU) do something like register an MMIO
> > > > region with KVM handled via a provided file descriptor and offset,
> > > > couldn't KVM then call the file ops without a kernel exit?  Thanks,
> > > >  
> > >  
> > > Could you elaborate this thought? If it can achieve the purpose w/o
> > > a kernel exit definitely we can adapt to it. :-)
> > 
> > I only thought of it when replying to the last email and have been doing
> > some research, but we already do quite a bit of synchronization through
> > file descriptors.  The kvm-vfio pseudo device uses a group file
> > descriptor to ensure a user has access to a group, allowing some degree
> > of interaction between modules.  Eventfds and irqfds already make use of
> > f_ops on file descriptors to poke data.  So, if KVM had information that
> > an MMIO region was backed by a file descriptor for which it already has
> > a reference via fdget() (and verified access rights and whatnot), then
> > it ought to be a simple matter to get to f_ops->read/write knowing the
> > base offset of that MMIO region.  Perhaps it could even simply use
> > __vfs_read/write().  Then we've got a proper reference to the file
> > descriptor for ownership purposes and we've transparently jumped across
> > modules without any implicit knowledge of the other end.  Could it work?
> 
> This is OK for KVMGT, from fops to vgpu device-model would always be simple.
> The only question is, how is KVM hypervisor supposed to get the fd on VM-exitings?

Hi Jike,

Sorry, I don't understand "on VM-exiting".  KVM would hold a reference
to the fd via fdget(), so the vfio device wouldn't be closed until the
VM exits and KVM releases that reference.
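
To make the lifetime argument concrete, here is a loose userspace analogy
(not kernel code, and not how KVM would actually take its reference).  In it,
dup() stands in for the reference KVM would hold: the original descriptor is
closed, yet the open file stays usable until the held reference is dropped.
The function name and the temporary file path are made up for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Userspace analogy only: dup() takes a second reference on the same open
 * file description.  Returns 0 if the file is still readable and writable
 * through the held reference after the original descriptor has been closed,
 * -1 otherwise.
 */
int held_reference_survives_close(void)
{
    char path[] = "/tmp/fdref-XXXXXX";   /* hypothetical scratch file */
    const char msg[] = "mmio";
    char buf[sizeof(msg)] = {0};
    int fd, held, ok;

    fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);                /* the name is gone, the open file lives on */

    held = dup(fd);              /* stands in for the reference KVM keeps */
    if (held < 0) {
        close(fd);
        return -1;
    }
    close(fd);                   /* the original owner drops its descriptor */

    /* The file is still fully usable through the held reference. */
    ok = pwrite(held, msg, sizeof(msg), 0) == (ssize_t)sizeof(msg) &&
         pread(held, buf, sizeof(buf), 0) == (ssize_t)sizeof(buf) &&
         memcmp(buf, msg, sizeof(msg)) == 0;

    close(held);                 /* last reference: only now is it released */
    return ok ? 0 : -1;
}
```

A caller can check the behaviour with
assert(held_reference_survives_close() == 0).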

> copy-and-paste the current implementation of vcpu_mmio_write(), seems
> nothing but GPA and len are provided:

I presume that an MMIO region is already registered with a GPA and
length; the additional information necessary would be a file descriptor
and an offset into that file for the base of the MMIO space.
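
Roughly, and only as a userspace sketch under my own assumptions, the
registration would carry four values, and servicing an access would reduce to
an offset calculation plus plain file I/O.  None of the names below
(struct fd_mmio_region, fd_mmio_*) exist in KVM or VFIO; they are
hypothetical.  The pwrite()/pread() calls mirror how userspace already
reaches a VFIO region at its offset within the device fd.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical record: what userspace would register for one MMIO region. */
struct fd_mmio_region {
    uint64_t gpa;     /* guest-physical base address of the region */
    uint64_t len;     /* length of the region in bytes */
    int      fd;      /* opaque descriptor backing the region */
    off_t    offset;  /* offset within fd that corresponds to gpa */
};

/* Return 1 if [addr, addr + n) lies entirely inside the region, else 0. */
int fd_mmio_contains(const struct fd_mmio_region *r, uint64_t addr, uint64_t n)
{
    return addr >= r->gpa && n <= r->len && addr - r->gpa <= r->len - n;
}

/* Translate a guest-physical address into an offset within the backing fd. */
off_t fd_mmio_file_offset(const struct fd_mmio_region *r, uint64_t addr)
{
    return r->offset + (off_t)(addr - r->gpa);
}

/*
 * Service a guest write with ordinary file I/O.  The caller knows nothing
 * about what sits behind fd.  Returns 0 on success, -1 if the access falls
 * outside the region or the backing fd does not accept it in full.
 */
int fd_mmio_write(const struct fd_mmio_region *r, uint64_t addr,
                  const void *data, size_t n)
{
    if (!fd_mmio_contains(r, addr, n))
        return -1;
    return pwrite(r->fd, data, n, fd_mmio_file_offset(r, addr)) == (ssize_t)n
               ? 0 : -1;
}

/* Service a guest read the same way; same return convention as the write. */
int fd_mmio_read(const struct fd_mmio_region *r, uint64_t addr,
                 void *data, size_t n)
{
    if (!fd_mmio_contains(r, addr, n))
        return -1;
    return pread(r->fd, data, n, fd_mmio_file_offset(r, addr)) == (ssize_t)n
               ? 0 : -1;
}
```

With gpa = 0xfe000000 and offset = 0x10000, a guest access at 0xfe000010
lands at file offset 0x10010.  Whether an in-kernel equivalent fits the
existing KVM MMIO bus handlers is the open question.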

>       static int vcpu_mmio_write(struct kvm_vcpu *vcpu, gpa_t addr, int len,
>                                  const void *v)
>       {
>               int handled = 0;
>               int n;
> 
>               do {
>                       n = min(len, 8);
>                       if (!(vcpu->arch.apic &&
>                             !kvm_iodevice_write(vcpu, &vcpu->arch.apic->dev,
>                                                 addr, n, v))
>                           && kvm_io_bus_write(vcpu, KVM_MMIO_BUS, addr, n, v))
>                               break;
>                       handled += n;
>                       addr += n;
>                       len -= n;
>                       v += n;
>               } while (len);
> 
>               return handled;
>       }
> 
> If we back a GPA range with a fd, this will also be a 'backdoor'?

KVM would simply be able to service the MMIO access using the provided
fd and offset.  It's not a back door because we will have created an API
for KVM to have a file descriptor and offset registered (by userspace)
to handle the access.  Also, KVM does not know the file descriptor is
handled by a VFIO device and VFIO doesn't know the read/write accesses
are initiated by KVM.  Seems like the question is whether we can fit
something like that into the existing KVM MMIO bus/device handlers
in-kernel.  Thanks,

Alex
