[Top][All Lists]

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH v7 3/3] sPAPR: EEH support for VFIO PCI device

From: Alexander Graf
Subject: Re: [Qemu-devel] [PATCH v7 3/3] sPAPR: EEH support for VFIO PCI device
Date: Wed, 28 May 2014 13:24:15 +0200
User-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:24.0) Gecko/20100101 Thunderbird/24.5.0

On 28.05.14 06:12, Gavin Shan wrote:
On Wed, May 28, 2014 at 12:41:37AM +0200, Alexander Graf wrote:
On 28.05.14 00:27, Benjamin Herrenschmidt wrote:
On Wed, 2014-05-28 at 00:22 +0200, Alexander Graf wrote:

In any case, the above isn't the problem, we register rtas functions
called rtas_ibm_*, that's fine.
They're called rtas_ibm, not rtas_vfio. They simply live in the wrong
file for starters. Once you start moving them out of *vfio*.c and
make sure you don't sneak in vfio headers into spapr_rtas.c all the
abstraction should come naturally.

The problem you have is with what's inside these functions, correct ?
You want some kind of PCI Dev ops... am I understanding properly ?
Instead of having them call the VFIO ioctl's directly.
We have a nice bus hierarchy. The RTAS call tells you the PHB's ID.
The machine knows about all of its PHBs, so you can find that one.

>From the PHB we can then find a PCIDevice by identifier if we
received one in the RTAS call.

If we received a PE, we need to act on a virtual PE object (not
available yet I think) which then would call at least into each VFIO
container associated with that guest PE. We don't want to call into
devices here, since we could have 2 VFIO devices in one PE and don't
want to do a PE level operation twice.

If we want to simplify things (and I'm fine with that), we can also
restrict a "guest PE" to only span at most a single VFIO container
context and unlimited emulated devices. That way we don't need to
worry about potential side effects.

I can't tell you what the "guest PE" would best be modeled at. Maybe
a simple list/hash inside the PHB is the answer. But we need to
ensure that the guest knows about its scope of action. And we need to
make sure that we maintain abstraction levels to keep at least a
minimum level of sanity for the reader.

sPAPRVFIOPHBState has one-to-one mapping with container, which has
one-to-one mapping relationship with IOMMU group on sPAPR platform.
So sPAPRVFIOPHBState is representing "guest PE".

So each PHB only spans a single PE?

Yes. RTAS calls from guest, 2 types of address information as input there:
PHB BUID+device's PCI config address, or PHB BUID+PE address. After removing
the logic of translating device's PCI config address to PE address from
host to QEMU, we don't have to care much about the first case (PHB BUID+
device's PCI config address). That means we only have to care "PE address",
which is basically the IOMMU group ID (plus fixed offset) of sPAPRPHBVFIOState.
So the BUID can be used to locate sPAPRPHBVFIOState and "PE address" can
help doing same thing.

... then we can just ignore the "PE address", no?

As you mentioned above, spapr_rtas.c can't include "vfio.h". I guess I have
to rework on PCIDevice/PCIDeviceClass for a bit as follows. Alex, could you
help confirming it's what you're expecting to see?

I depends. If you need a PCI device level callback, then yes. If you don't need a PCI level callback, just have one in the PHB. I'm not quite sure how you can find the "VFIO container" that belongs to that PHB though, but I'm sure you'll figure that out :).

- In include/hw/pci/pci.h, to have following callback for PCIDeviceClass

   /* It's basically mapping ioctl commands to another set of macros */
   #define PCI_DEV_EEH_OP_ENABLE    0
   #define PCI_DEV_EEH_OP_DISABLE   1
   #define PCI_DEV_EEH_OP_PE_RESET  3

   #define PCI_DEV_EEH_OPT_HRSET     0
   #define PCI_DEV_EEH_OPT_FRESET    1
   int (*eeh_handler)(PCIDevice *pdev, int opcode, int option);

If we need a PCI device level callback, yes.

- In hw/ppc/spapr_pci_vfio.c, to register EEH RTAS calls there and I
   shouldn't include linux-headers/vfio.h there. When EEH RTAS calls
   come in, find the PE according to the PE address and pick up any
   one of the included PCI device and call into eeh_handler(). If the
   input address is device's PCI config address, find the PCI device
   and call to its eeh_handler() callback.

Uh. The RTAS flow should go through a few different files that basically represent the different layers on a real machine.

The spapr_rtas.c file should register the RTAS calls, as that one emulates the RTAS code that would usually run in guest code. That code determines and calls into the PHB.

The PHB is implemented in hw/ppc/spapr_pci.c, so that's where the EEH calls on the PHB level belong. If the target of a call is a PCI device, the PHB looks up the device, its class and calls into an EEH call inside the PCI device. If the target is a PE, the PHB needs to call out on the TCE entity representing the PE. If that is backed by a VFIO IOMMU, it should get called somehow.


- eeh_handler() should point to some function implemented in hw/misc/vfio.c.
   Lets say "vfio_eeh_handler()", where in turns call to container's ioctl().
   I probably have to move the ioctl commands from container to VFIO PCI device
   fd as I did in previous patch.

- eeh_handler() should be NULL for emulated PCI device and just return error
   for them.


reply via email to

[Prev in Thread] Current Thread [Next in Thread]