Re: [Qemu-devel] [PATCH v7 3/3] sPAPR: EEH support for VFIO PCI device

From: Gavin Shan
Subject: Re: [Qemu-devel] [PATCH v7 3/3] sPAPR: EEH support for VFIO PCI device
Date: Wed, 28 May 2014 22:33:14 +1000
User-agent: Mutt/1.5.21 (2010-09-15)

On Wed, May 28, 2014 at 01:24:15PM +0200, Alexander Graf wrote:
>On 28.05.14 06:12, Gavin Shan wrote:
>>On Wed, May 28, 2014 at 12:41:37AM +0200, Alexander Graf wrote:
>>>On 28.05.14 00:27, Benjamin Herrenschmidt wrote:
>>>>On Wed, 2014-05-28 at 00:22 +0200, Alexander Graf wrote:
>>>>In any case, the above isn't the problem, we register rtas functions
>>>>called rtas_ibm_*, that's fine.
>>>They're called rtas_ibm, not rtas_vfio. They simply live in the wrong
>>>file for starters. Once you start moving them out of *vfio*.c and
>>>make sure you don't sneak vfio headers into spapr_rtas.c, all the
>>>abstraction should come naturally.
>>>>The problem you have is with what's inside these functions, correct?
>>>>You want some kind of PCI Dev ops... am I understanding properly?
>>>>Instead of having them call the VFIO ioctls directly.
>>>We have a nice bus hierarchy. The RTAS call tells you the PHB's ID.
>>>The machine knows about all of its PHBs, so you can find that one.
>>>From the PHB we can then find a PCIDevice by identifier if we
>>>received one in the RTAS call.
>>>If we received a PE, we need to act on a virtual PE object (not
>>>available yet I think) which then would call at least into each VFIO
>>>container associated with that guest PE. We don't want to call into
>>>devices here, since we could have 2 VFIO devices in one PE and don't
>>>want to do a PE level operation twice.
>>>If we want to simplify things (and I'm fine with that), we can also
>>>restrict a "guest PE" to only span at most a single VFIO container
>>>context and unlimited emulated devices. That way we don't need to
>>>worry about potential side effects.
>>>I can't tell you how the "guest PE" would best be modeled. Maybe
>>>a simple list/hash inside the PHB is the answer. But we need to
>>>ensure that the guest knows about its scope of action. And we need to
>>>make sure that we maintain abstraction levels to keep at least a
>>>minimum level of sanity for the reader.
>>sPAPRPHBVFIOState has a one-to-one mapping with the container, which
>>has a one-to-one mapping with the IOMMU group on the sPAPR platform.
>>So sPAPRPHBVFIOState represents the "guest PE".
>So each PHB only spans a single PE?

Yes, at least in the VFIO case: each sPAPRPHBVFIOState has exactly one
container behind it, so each PHB spans a single PE.

>>Yes. RTAS calls from the guest carry two types of address information
>>as input: PHB BUID plus the device's PCI config address, or PHB BUID
>>plus PE address. After moving the logic that translates a device's
>>PCI config address to a PE address from the host into QEMU, we don't
>>have to care much about the first case (PHB BUID plus device's PCI
>>config address). That means we only have to care about the "PE
>>address", which is basically the IOMMU group ID (plus a fixed offset).
>>So the BUID can be used to locate the sPAPRPHBVFIOState, and the "PE
>>address" can help do the same thing.
>... then we can just ignore the "PE address", no?

Yes, we can ignore that. But I plan to use it for additional sanity
checks, which can't hurt.
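
Something like the following is what I have in mind, purely as a
sketch; the function name, the iommugroupid field and the
EEH_PE_ADDR_OFFSET macro are made up for illustration:

    /* Sketch only: each VFIO PHB spans exactly one PE, so the only
     * valid PE address is the one derived from its IOMMU group ID
     * plus a fixed offset. */
    static bool spapr_phb_vfio_pe_addr_valid(sPAPRPHBVFIOState *svphb,
                                             uint64_t pe_addr)
    {
        return pe_addr == svphb->iommugroupid + EEH_PE_ADDR_OFFSET;
    }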

>>As you mentioned above, spapr_rtas.c can't include "vfio.h". I guess I
>>have to rework PCIDevice/PCIDeviceClass a bit, as follows. Alex, could
>>you help confirm that this is what you're expecting to see?
>It depends. If you need a PCI device level callback, then yes. If you
>don't need a PCI level callback, just have one in the PHB. I'm not
>quite sure how you can find the "VFIO container" that belongs to that
>PHB though, but I'm sure you'll figure that out :).

Yeah, I'll have a PHB level callback, after a discussion with Alexey,
who gave a good idea for making the abstraction work as follows. Please
let me know if I'm heading in the wrong direction again, because I don't
have any QEMU experience:

        RTAS call -> hw/ppc/spapr_pci.c, where the RTAS calls get registered
                  -> sPAPRPHBClass::rtas_eeh_handler()
                  -> vfio_container_ioctl() in hw/misc/vfio.c

If you think this is the right way to go, I'm going to change the code
accordingly and send it to you for review after Alexey's quick scan.
That should save you a bit of time in the future, though it has already
taken a lot of your time.
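
In code, the plumbing I have in mind looks roughly like the sketch
below. Everything here is approximate: find_phb_by_buid(), the
rtas_eeh_handler hook name, and the RTAS argument layout are made up,
and the exact VFIO request depends on the kernel interface;
vfio_container_ioctl() is the helper in hw/misc/vfio.c mentioned above.

    /* hw/ppc/spapr_pci.c sketch: the RTAS entry point only knows the
     * PHB abstraction and never touches VFIO directly. */
    static void rtas_ibm_eeh_call(PowerPCCPU *cpu, sPAPREnvironment *spapr,
                                  uint32_t token, uint32_t nargs,
                                  target_ulong args, uint32_t nret,
                                  target_ulong rets)
    {
        uint64_t buid = ((uint64_t)rtas_ld(args, 1) << 32) | rtas_ld(args, 2);
        int op = rtas_ld(args, 3);      /* which EEH operation */
        int option = rtas_ld(args, 4);  /* argument to that operation */
        sPAPRPHBState *sphb = find_phb_by_buid(spapr, buid); /* made up */
        sPAPRPHBClass *spc = sphb ? SPAPR_PCI_HOST_BRIDGE_GET_CLASS(sphb)
                                  : NULL;

        if (!spc || !spc->rtas_eeh_handler) {
            rtas_st(rets, 0, RTAS_OUT_PARAM_ERROR);
            return;
        }

        rtas_st(rets, 0, spc->rtas_eeh_handler(sphb, op, option));
    }

    /* hw/ppc/spapr_pci_vfio.c sketch: the VFIO flavour of the hook is
     * the only place that talks to the container. */
    static int spapr_phb_vfio_eeh_handler(sPAPRPHBState *sphb,
                                          int op, int option)
    {
        sPAPRPHBVFIOState *svphb = SPAPR_PCI_VFIO_HOST_BRIDGE(sphb);

        /* Translate (op, option) into whatever the kernel's EEH ioctl
         * expects; VFIO_EEH_PE_OP here stands in for that request. */
        return vfio_container_ioctl(&svphb->phb.iommu_as,
                                    svphb->iommugroupid,
                                    VFIO_EEH_PE_OP, NULL);
    }

This way spapr_pci.c stays VFIO-free, and the emulated PHB can simply
leave the hook NULL.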

>>- In include/hw/pci/pci.h, have the following callback for PCIDeviceClass
>>   /* It basically maps the ioctl commands to another set of macros */
>>   #define PCI_DEV_EEH_OP_ENABLE    0
>>   #define PCI_DEV_EEH_OP_DISABLE   1
>>   #define PCI_DEV_EEH_OP_GET_STATE 2
>>   #define PCI_DEV_EEH_OP_PE_RESET  3
>>   #define PCI_DEV_EEH_OPT_HRSET    0
>>   #define PCI_DEV_EEH_OPT_FRESET   1
>>           :
>>   int (*eeh_handler)(PCIDevice *pdev, int opcode, int option);
>If we need a PCI device level callback, yes.
>>- In hw/ppc/spapr_pci_vfio.c, register the EEH RTAS calls, without
>>   including linux-headers/vfio.h there. When an EEH RTAS call comes
>>   in, find the PE according to the PE address, pick any one of the
>>   included PCI devices, and call into its eeh_handler(). If the
>>   input address is a device's PCI config address, find the PCI
>>   device and call its eeh_handler() callback.
>Uh. The RTAS flow should go through a few different files that
>basically represent the different layers on a real machine.
>The spapr_rtas.c file should register the RTAS calls, as that one
>emulates the RTAS code that would usually run in guest code. That
>code determines the target PHB and calls into it.
>The PHB is implemented in hw/ppc/spapr_pci.c, so that's where the EEH
>calls on the PHB level belong. If the target of a call is a PCI
>device, the PHB looks up the device and its class, and calls into an
>EEH callback inside the PCI device. If the target is a PE, the PHB
>needs to call out on the TCE entity representing the PE. If that is
>backed by a VFIO IOMMU, it should get called somehow.

Ah, it's almost the same as the idea Alexey shared above. The
difference is that I'm going to have a PHB level callback, since
sPAPRPHBVFIOState can be regarded as the "PE" :-)
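
Concretely, that should amount to no more than one extra hook in the
PHB class; a sketch only (hook name made up, existing members elided):

    /* Sketch: the emulated PHB leaves the hook NULL (EEH unsupported),
     * the VFIO PHB fills it in, and since each VFIO PHB maps to one
     * container, the PHB itself stands in for the "guest PE". */
    typedef struct sPAPRPHBClass {
        PCIHostBridgeClass parent_class;
        /* ... existing hooks ... */
        int (*rtas_eeh_handler)(sPAPRPHBState *sphb, int op, int option);
    } sPAPRPHBClass;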

