qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device


From: Jean-Philippe Brucker
Subject: Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device
Date: Fri, 7 Jul 2017 16:19:51 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.1.1

On 07/07/17 12:36, Bharat Bhushan wrote:
>>> In this proposal, QEMU reserves a iova-range for guest (not host) and guest
>> kernel will use this as msi-iova untranslated (IOMMU_RESV_MSI). While this
>> does not change host interface and it will continue to use host reserved
>> mapping for actual interrupt generation, no?
>> But then userspace needs to provide IOMMU_RESV_MSI range to guest
>> kernel, right? What would be the proposed manner?
> 
> Just an opinion, we can define feature (VIRTIO_IOMMU_F_RES_MSI_RANGE) and 
> provide this info via a command (VIRTIO_IOMMU_T_MSI_RANGE). Guest 
> iommu-driver will make this call during initialization and store the value. 
> This value will just replace MSI_IOVA_BASE and MSI_IOVA_LENGHT hash define. 
> Rest will remain same in virtio-iommu driver.

Yes I had something similar in mind, although more generic since we'll
need to get other bits of information from the device in future extensions
(fault handling, page table formats and dynamic reserves of memory for
SVM), and maybe also for finding out per-address-space page granularity
(see my reply of patch 3/8). These are per-endpoint properties that cannot
be advertise in the virtio config space.

                                 ***

So I propose to add a per-endpoint probing mechanism on the request queue:

* The device advertises a new command VIRTIO_IOMMU_T_PROBE with feature
bit VIRTIO_IOMMU_F_PROBE.
* When this feature is advertised, the device sets probe_size field in the
the config space.
* When device offers VIRTIO_IOMMU_F_PROBE, the driver should send an
VIRTIO_IOMMU_T_PROBE request for each new endpoint.
* The driver allocates a device-writeable buffer of probe_size (plus
framing) and sends it as a VIRTIO_IOMMU_T_PROBE request.
* The device fills the buffer with various information.

struct virtio_iommu_req_probe {
        /* device-readable */
        struct virtio_iommu_req_head head;
        le32 device;
        le32 flags;

        /* maybe also le32 content_size, but it must be equal to
           probe_size */

        /* device-writeable */
        u8 content[];
        struct virtio_iommu_req_tail tail;
};

I'm still struggling with the content and layout of the probe request, and
would appreciate any feedback. To be easily extended, I think it should
contain a list of fields of variable size:

        |0           15|16           31|32               N|
        |     type     |    length     |      values      |

'length' might be made optional if it can be deduced from type, but might
make driver-side parsing more robust.

The probe could either be done for each endpoint, or for each address
space. I much prefer endpoint because it is the smallest granularity. The
driver can then decide what endpoints to put together in the same address
space based on their individual capabilities. The specification would
described how each endpoint property is combined when endpoints are put in
the same address space. For example, take the minimum of all PASID size,
the maximum of all page granularities, combine doorbell addresses, etc.

If we did the probe on address spaces instead, the driver would have to
re-send a probe request each time a new endpoint is attached to an
existing address space, to see if it is still capable of page table
handover or if the driver just combined a VFIO and an emulated endpoint by
accident.

                                 ***

Using this framework, the device can declare doorbell regions by adding
one or more RESV fields into the probe buffer:

/* 'type' */
#define VIRTIO_IOMMU_PROBE_T_RESV       0x1

/* 'values'. 'length' is sizeof(struct virtio_iommu_probe_resv) */
struct virtio_iommu_probe_resv {
        le64 gpa;
        le64 size;

#define VIRTIO_IOMMU_PROBE_RESV_MSI     0x1
        u8 type;
};

Such a region would be subject to the following rules:

* Driver should not use any IOVA declared as RESV_MSI in a mapping.
* Device should leave any transaction matching a RESV_MSI region pass
through untranslated.
* If the device does not advertise any RESV region, then the driver should
assume that MSI doorbells, like any other GPA, must be mapped with an
arbitrary IOVA in order for the endpoint to access them.
* Given that the driver *should* perform a probe request if available, and
it *should* understand the VIRTIO_IOMMU_PROBE_T_RESV field, then this
field tells the guest how it should handle MSI doorbells, and whether it
should map the address via MAP requests or not.

Does this make sense and did I overlook something?

Thanks,
Jean



reply via email to

[Prev in Thread] Current Thread [Next in Thread]