From: Alex Williamson
Subject: Re: [Qemu-devel] [RFC 3/3] vhost-user: support programming VFIO group in master
Date: Mon, 30 Jul 2018 10:20:29 -0600

On Mon, 30 Jul 2018 12:30:58 +0300
"Michael S. Tsirkin" <address@hidden> wrote:

> On Mon, Jul 30, 2018 at 04:10:04PM +0800, Tiwei Bie wrote:
> > On Fri, Jul 27, 2018 at 02:03:00PM -0600, Alex Williamson wrote:  
> > > On Fri, 27 Jul 2018 09:58:05 +0800
> > > Tiwei Bie <address@hidden> wrote:
> > >   
> > > > On Thu, Jul 26, 2018 at 02:45:39PM -0600, Alex Williamson wrote:  
> > > > > On Mon, 23 Jul 2018 12:59:56 +0800
> > > > > Tiwei Bie <address@hidden> wrote:
> > > > >     
> > > > [...]  
> > > > > >  
> > > > > > +static int vhost_user_slave_handle_vfio_group(struct vhost_dev *dev,
> > > > > > +                                              int *fd)
> > > > > > +{
> > > > > > +    struct vhost_user *u = dev->opaque;
> > > > > > +    VhostUserState *user = u->user;
> > > > > > +    VirtIODevice *vdev = dev->vdev;
> > > > > > +    int groupfd = fd[0];
> > > > > > +    VFIOGroup *group;
> > > > > > +
> > > > > > +    if (!virtio_has_feature(dev->protocol_features,
> > > > > > +                            VHOST_USER_PROTOCOL_F_VFIO_GROUP) ||
> > > > > > +        vdev == NULL) {
> > > > > > +        return -1;
> > > > > > +    }
> > > > > > +
> > > > > > +    if (user->vfio_group) {
> > > > > > +        vfio_put_group(user->vfio_group);
> > > > > > +        user->vfio_group = NULL;
> > > > > > +    }
> > > > > > +
> > > > 
> > > > The check below should be here (I missed it in this
> > > > patch, sorry..):
> > > > 
> > > >     if (groupfd < 0) {
> > > >         return 0;
> > > >     }
> > > >   
> > > > > > +    group = vfio_get_group_from_fd(groupfd, vdev->dma_as, NULL);
> > > > > > +    if (group == NULL) {
> > > > > > +        return -1;
> > > > > > +    }
> > > > > > +
> > > > > > +    if (group->fd != groupfd) {
> > > > > > +        close(groupfd);
> > > > > > +    }
> > > > > > +
> > > > > > +    user->vfio_group = group;
> > > > > > +    fd[0] = -1;
> > > > > > +
> > > > > > +    return 0;
> > > > > > +}    
> > > > > 
> > > > > This all looks very sketchy, we're reaching into vfio internal state
> > > > > and arbitrarily releasing data structures, reusing existing ones, and
> > > > > maybe creating new ones.  We know that a vfio group only allows a
> > > > > single open, right?    
> > > > 
> > > > Yeah, exactly!
> > > > 
> > > > When the vhost-user backend process has opened a VFIO group fd,
> > > > QEMU won't be able to open this group file via open() any more.
> > > > So vhost-user backend process will share the VFIO group fd to
> > > > QEMU via the UNIX socket. And that's why we're introducing a
> > > > new API:
> > > > 
> > > >         vfio_get_group_from_fd(groupfd, ...);
> > > > 
> > > > for vfio/common.c to get (setup) a VFIOGroup structure. (Because
> > > > in this case, vfio/common.c can't open this group file via open(),
> > > > and what we have is a VFIO group fd shared by another process via
> > > > UNIX socket). And ...
> > > >   
> > > > > So if you have a groupfd, vfio will never have
> > > > > that same group opened already.    
> > > > 
> > > > ... if the same vhost-user backend process shares the same VFIO
> > > > group fd more than once via different vhost-user slave channels
> > > > to different vhost-user frontends in the same QEMU process, the
> > > > same VFIOGroup could exist already.
> > > > 
> > > > This case could happen when e.g. there are two vhost accelerator
> > > > devices in the same VFIO group, and they are managed by the same
> > > > vhost-user backend process, and used to accelerate two different
> > > > virtio devices in QEMU. In this case, the same VFIO group fd could
> > > > be sent twice.
> > > > 
> > > > If we need to support this case, more changes in vfio/common.c
> > > > will be needed, e.g. 1) add a refcnt in VFIOGroup to support
> > > > getting and putting a VFIOGroup structure multiple times,
> > > > 2) add a lock to make vfio_get_group*() and vfio_put_group()
> > > > thread-safe.  
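
A minimal sketch of what such refcounting could look like; the refcnt
field, the lock, and the vfio_group_find_by_fd(), vfio_group_setup(),
and vfio_group_teardown() helpers are hypothetical illustrations, not
existing QEMU code:

    /* Illustrative only: refcounted group lookup, guarded by a lock so
     * that concurrent vhost-user slave channels stay safe. */
    static QemuMutex vfio_group_lock;

    VFIOGroup *vfio_get_group_from_fd(int groupfd, AddressSpace *as,
                                      Error **errp)
    {
        VFIOGroup *group;

        qemu_mutex_lock(&vfio_group_lock);
        group = vfio_group_find_by_fd(groupfd);          /* hypothetical */
        if (group) {
            group->refcnt++;                             /* hypothetical field */
        } else {
            group = vfio_group_setup(groupfd, as, errp); /* hypothetical */
            if (group) {
                group->refcnt = 1;
            }
        }
        qemu_mutex_unlock(&vfio_group_lock);
        return group;
    }

    void vfio_put_group(VFIOGroup *group)
    {
        qemu_mutex_lock(&vfio_group_lock);
        if (--group->refcnt == 0) {
            vfio_group_teardown(group);                  /* hypothetical */
        }
        qemu_mutex_unlock(&vfio_group_lock);
    }
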
> > > 
> > > This is the sort of case where it seems like we're just grabbing
> > > internal interfaces and using them from other modules.  VFIOGroups have
> > > a device_list and currently handle multiple puts by testing whether
> > > that device list is empty (ie. we already have a refcnt effectively).
> > > Now we have a group user that has no devices, which is not something
> > > that it seems like we've designed or audited the code for handling.
> > > IOW, while the header file lives in a common location, it's not really
> > > intended to be an API within QEMU and it makes me a bit uncomfortable
> > > to use it as such.
> > >   
> > > > > Is that the goal?  It's not obvious in
> > > > > the code.  I don't really understand what vhost goes on to do with this
> > > > > group,
> > > > 
> > > > The goal is that, we have a VFIO group opened by an external
> > > > vhost-user backend process, this process will manage the VFIO
> > > > device, but want QEMU to manage the VFIO group, including:
> > > > 
> > > > 1 - program the DMA mappings for this VFIO group based on
> > > >     the front-end device's dma_as in QEMU.
> > > > 
> > > > 2 - add this VFIO group to KVM (KVM_DEV_VFIO_GROUP_ADD).
> > > > 
> > > > Vhost-user in QEMU as the vhost-user frontend just wants
> > > > hw/vfio to do above things after receiving a VFIO group fd
> > > > from the vhost-user backend process.
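
For item 2 above, the underlying kernel interface is the KVM-VFIO
pseudo device. A minimal sketch using the raw ioctls (QEMU itself goes
through its kvm_*_ioctl() helpers and reuses a single KVM-VFIO device
per VM rather than creating one per group):

    #include <stdint.h>
    #include <sys/ioctl.h>
    #include <linux/kvm.h>

    /* Tell KVM about a VFIO group fd (KVM_DEV_VFIO_GROUP_ADD). */
    static int kvm_add_vfio_group(int kvm_vm_fd, int groupfd)
    {
        struct kvm_create_device cd = { .type = KVM_DEV_TYPE_VFIO };
        struct kvm_device_attr attr = {
            .group = KVM_DEV_VFIO_GROUP,
            .attr  = KVM_DEV_VFIO_GROUP_ADD,
            .addr  = (uintptr_t)&groupfd,   /* pointer to the group fd */
        };

        /* Create the in-kernel KVM-VFIO pseudo device. */
        if (ioctl(kvm_vm_fd, KVM_CREATE_DEVICE, &cd) < 0) {
            return -1;
        }
        return ioctl(cd.fd, KVM_SET_DEVICE_ATTR, &attr);
    }
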
> > > > 
> > > > Vhost-user will just ask hw/vfio to get (setup) a VFIOGroup
> > > > by calling vfio_get_group_from_fd() or put this VFIOGroup by
> > > > calling vfio_put_group() when it's not needed anymore, and
> > > > won't do anything else.
> > > > 
> > > > Vhost-user will only deal with the VFIOGroups allocated by
> > > > itself, and won't influence any other VFIOGroups used by the
> > > > VFIO passthru devices (-device vfio-pci). Because the same
> > > > VFIO group file can only be opened by one process via open().  
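
That single-open property is visible from userspace: a second open() of
the same group file fails with EBUSY. A quick sketch, where
"/dev/vfio/12" is a made-up group number:

    #include <errno.h>
    #include <fcntl.h>
    #include <stdio.h>

    int main(void)
    {
        int a = open("/dev/vfio/12", O_RDWR);  /* first open succeeds */
        int b = open("/dev/vfio/12", O_RDWR);  /* fails, errno == EBUSY */

        printf("a=%d b=%d errno=%d\n", a, b, b < 0 ? errno : 0);
        return 0;
    }
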
> > > 
> > > But there's nothing here that guarantees the reverse.  vhost cannot
> > > open the group of an existing vfio-pci device, but vfio-pci can re-use
> > > the group that vhost has already opened.  This is again where it seems
> > > like we're trying to cherry pick from an internal API and leaving some
> > > loose ends dangling.
> > >   
> > > > > but I'm pretty sure the existing state machine in vfio isn't
> > > > > designed for it.  Thanks,    
> > > > 
> > > > Yeah, more changes (e.g. refcnt, ..) in hw/vfio are needed to
> > > > make it work robustly. Hmm.. Is the above idea (e.g. how vhost-user
> > > > is going to use VFIOGroup and how we are going to extend hw/vfio)
> > > > acceptable to you? Thanks!  
> > > 
> > > I certainly would not want to duplicate managing the group, container,
> > > and MemoryListener, but I think we need a more formal interface with at
> > > least some notion of reference counting beyond the device list or
> > > explicit knowledge of an external user.  
> > 
> > Got it! I'll try to address this. Thanks!
> >   
> > > I'm also curious if it really
> > > makes sense that the vhost backend is opening the group rather than
> > > letting QEMU do it.  It seems that vhost wants to deal with the device
> > > and we can never have symmetry in the scenario above, where a group
> > > might contain multiple devices and order matters because of where the
> > > group is opened.  If we still had a notion of a VFIODevice for an
> > > external user, then the device_list refcnt would just work.  Thanks,  
> > 
> > The vhost-user backend will open the VFIO fds, but
> > QEMU won't, because QEMU just has a vhost-user UNIX
> > socket path that it should connect to or listen on.
> > So QEMU doesn't know which group and device it
> > should open. The vhost accelerator devices are
> > probed and managed in the vhost-user backend
> > process, and the vhost-user backend process will
> > bind the vhost-user instances and vhost accelerator
> > devices based on some sort of user input.
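
For reference, sharing an fd across a UNIX socket, as the vhost-user
slave channel does here, boils down to SCM_RIGHTS ancillary data. A
minimal sketch (the one-byte payload stands in for the real vhost-user
message framing):

    #include <string.h>
    #include <sys/socket.h>
    #include <sys/uio.h>

    /* Send one fd over a connected UNIX socket via SCM_RIGHTS. */
    static int send_fd(int sock, int fd)
    {
        char data = 0;  /* ancillary data needs at least one real byte */
        struct iovec iov = { .iov_base = &data, .iov_len = 1 };
        char ctrl[CMSG_SPACE(sizeof(int))];
        struct msghdr msg = { 0 };
        struct cmsghdr *cmsg;

        memset(ctrl, 0, sizeof(ctrl));
        msg.msg_iov = &iov;
        msg.msg_iovlen = 1;
        msg.msg_control = ctrl;
        msg.msg_controllen = sizeof(ctrl);

        cmsg = CMSG_FIRSTHDR(&msg);
        cmsg->cmsg_level = SOL_SOCKET;
        cmsg->cmsg_type = SCM_RIGHTS;
        cmsg->cmsg_len = CMSG_LEN(sizeof(int));
        memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

        return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
    }
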
> > 
> > Best regards,
> > Tiwei Bie  
> 
> Is the reason it's done like this because the
> backend is sharing the device with multiple QEMUs?

Each QEMU instance that gets passed a vfio group would attempt to add
it to a vfio container (ie. IOMMU domain), so this mechanism cannot
support multiple QEMU instances attached to the same group.  Beyond
that, the address space of each device would need to be unique within
the instance as we only have a single IOVA space for the group.  Thanks,

Alex
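
To illustrate the point: a group fd attaches to exactly one container
(one IOMMU context), so a second QEMU instance handed the same group
could not attach it again, and every group in a container shares that
container's single IOVA space. A sketch of the attach flow with the
standard VFIO ioctls, error handling trimmed to the essentials:

    #include <fcntl.h>
    #include <unistd.h>
    #include <sys/ioctl.h>
    #include <linux/vfio.h>

    static int attach_group(int groupfd)
    {
        int container = open("/dev/vfio/vfio", O_RDWR);

        if (container < 0) {
            return -1;
        }
        /* Fails if the group is already in a container, e.g. one
         * created by another process for the same group. */
        if (ioctl(groupfd, VFIO_GROUP_SET_CONTAINER, &container) < 0) {
            close(container);
            return -1;
        }
        /* One IOMMU context per container: every group (and device)
         * in it shares a single IOVA space. */
        if (ioctl(container, VFIO_SET_IOMMU, VFIO_TYPE1v2_IOMMU) < 0) {
            close(container);
            return -1;
        }
        return container;
    }
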


