qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH] virtio-pci: Set the QEMU_PCI_CAP_EXPRESS capabi


From: Marcel Apfelbaum
Subject: Re: [Qemu-devel] [PATCH] virtio-pci: Set the QEMU_PCI_CAP_EXPRESS capability early in its DeviceClass realize method
Date: Wed, 2 Dec 2015 11:51:46 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.1.0

On 12/02/2015 10:01 AM, Shmulik Ladkani wrote:
Thanks Marcel,

On Tue, 1 Dec 2015 22:46:33 +0200, address@hidden wrote:
The reason is the device becomes express only if *all* the conditions
are met.

I'm ok with either approaches.

However it seems common practice to set QEMU_PCI_CAP_EXPRESS
unconditionally for PCIE devices.

The few existing PCIE devices do so by assigning their
PCIDeviceClass.is_express to 1 within their 'class_init', regardless the
properties of the bus their on.
(e.g. xhci_class_init, megasas_class_init, vfio_pci_dev_class_init,
   nvme_class_init, and more)

Some devices later call pcie_endpoint_cap_init conditionally.
(e.g. usb_xhci_realize).

Can you please examine this and let me know the preferred approach?

Yes, I saw that..., as always not a walk in the park.

- So we have "is_express = true" <=> QEMU_PCI_CAP_EXPRESS on <=> "config size = 
PCIe"
- Not related to the above (!!), if (some condition) => add PCIe express 
capability
    (megasas is the exception)

Let's take "usb_xhci":
   - If we put it under a PCI bus it will not be an express device, but
     it will have a "big" config space. Also pci_is_express(dev) will still 
return true!
   - This is probably a bug. (or I am missing something)

I actually assumed this is the right behavior.

A device class reports whether its instances *could* be pcie by arming
its PCIDeviceClass.is_express.
As such, the "big" config space is allocated for the instance. This is
harmless.

Such a device may (or may not) be connected to a pcie bus, and only if
so, we report it is a pcie endpoint.

Also, pcie_add_capability is allowed on that device, in order to setup
whatever capabilities on its pcie config space (even if finally not on a
pcie bus).

Moreover, VMSTATE_PCIE_DEVICE (which uses vmstate_pcie_device rather
than vmstate_pci_device) can be used for that device's
VMStateDescription fields *without* worrying whether the actual config
space is "big" or "small".
Otherwise one should examine whether vmstate_pcie_device or
vmstate_pci_device need to be used. Seems tedious.

This is the reasoning I can think of, why assigning QEMU_PCI_CAP_EXPRESS
and reporting pcie_endpoint_cap_init are not tightly coupled.

I agree it may be the reason, but that does not make it right.
I still see two *possible* problems:
1. Pci config space is guest visible. The guest can read/write to
   a place it shouldn't. I don't know if is a *real* issue, but it needs 
checking.
2. We still have pci_is_express returning true, this is error prone because one
   can use this function assuming the device is express. Maybe we should call it 
"can_be_express" ?
If the migration construct (VMSTATE) is *the only* reason for doing this, maybe 
is not a good
enough reason (I am not the one to decide :)  ). Is still it seems a little off 
to me.

If you think this is good enough, you can simply do the same:
  - Instead of replacing the realize method, just advertise it with 
"is_express" (meaning it can be express)
  - Leave all the conditions as they were in prev patch.
As a result, the pci config space will have the right length.
The consequences are obvious now, if virtio/pci maintainers are OK with that, 
so am I.

Thanks,
Marcel





Indeed, no strict solution here, both approaches seem reasoanble (and
both are used!).

WDYT? Is my above interpretation makes sense?

Regards,
Shmulik





reply via email to

[Prev in Thread] Current Thread [Next in Thread]