qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH 2/2] hw/pcie: disable IO port fwd by default for


From: Laszlo Ersek
Subject: Re: [Qemu-devel] [PATCH 2/2] hw/pcie: disable IO port fwd by default for pcie-root-port
Date: Wed, 20 Sep 2017 13:01:23 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.3.0

On 09/20/17 09:42, Marcel Apfelbaum wrote:
> On 20/09/2017 1:15, Laszlo Ersek wrote:

>> It seems that we now have "grp->io_reserve" (a numeric quantity, not a
>> boolean), from property "io-reserve". The default value is -1.
>>
>> According to the documentation added in c1800a162765 ("docs: update
>> documentation considering PCIE-PCI bridge", 2017-08-18), the value -1
>> seems to imply, "If any reservation field is -1 then this kind of
>> reservation is not needed and must be ignored by firmware".
>>
> 
> This was decided after long long upstream discussions with
> different maintainers.

I know :) I participated in the review, and the documentation patch
noted above carries my R-b.

I didn't mean that the -1 default was wrong, only that we might have to
polish the definition / semantics a bit.

>> I think we'll need to refine the definition. Once OVMF starts processing
>> this capability, the behavior should be compatible with earlier
>> behavior. Assume that a user sets only "mem-reserve" to something
>> different from -1, and thus the capability appears. When OVMF sees the
>> capability, it should be able to tell apart:
>>
>> - no particular hint about IO space, so continue doing whatever has been
>> done all this time (default IO space reservation),
>> - do not request IO space reservation at all, > - a given (positive)
>> size of IO space should be reserved.
>>
>> So I think:
>>
>> (a) the above read-only mask setting should be done based on
>>
>>    (grp->io_reserve == 0)
>>
>> and the "enable-io-fwd" property should be unnecessary,
>>
> 
> I really want this to be the case,  but I need to check the
> implementation first. The only concern is the semantics.
> 
> (grp->io_reserve == 0) hints the firmware to take
> max(hint, <default-value>), since we don't want to reserve less
> than the IO range needed by existing devices behind the bridge.

I agree.

In OVMF, I would translate (io_reserve == 0) into *not* returning any
particular IO reservation hint to the generic PciBusDxe driver, for the
PCI-E port in question. In turn PciBusDxe follows the above "max"
semantics, as far as I remember; in other words, setting io_reserve to
zero wouldn't break existing IO requirements.

Here's a table:

- io_reserve == (-1) --> OVMF returns an IO reservation of 512 bytes
  (which in practice gets rounded up to 4KB). If more IO is needed by
  devices behind the bridge, then that larger value is used. This is the
  current behavior and we should keep it.

- io_reserve == 0 --> OVMF returns no IO reservation at all. If any IO
  is needed by devices behind the bridge, then that value is used.
  Otherwise, no IO is allocated.

- io_reserve > 0 --> OVMF returns this value to PciBusDxe. PciBusDxe
  might round it up. If more IO is needed by devices behind the bridge,
  then that larger value is used.

> The implementation issue might be that Guest Firmware would
> take the alignment as default value for an empty root port.
> (maybe take it as a bug and "solve" it?)

Hm, I don't get this. What do you mean?

> 
>> (b) the "io-reserve" property should be set to 0 for 2.11 machine types
>> and onward, and to -1 for 2.10 and earlier (for compatibility),
>>
> 
> Michael asked us to wait a little before changing the default,
> you can ask him for more info :)

Sure, I'm totally fine with delaying the change to the default value.

> 
>> (c) the documentation in "docs/pcie_pci_bridge.txt" should be updated to
>> say:
>> * (-1) --> default firmware behavior (unspecified)
>> *   0  --> do not reserve
>> *  >0  --> specific reservation requested
>>
> 
> Agreed, once I re-check SeaBIOS to confirm behavior.

BTW, if the OVMF -- more precisely, PciBusDxe -- and SeaBIOS behaviors
turn out to differ significantly in the handling of the values (although
I don't expect it, see above -- last time I looked, the interpretations
were similar), I think it's a possibility (although not optimal) to say
that the interpretation of these values is firmware specific.

We won't know for certain until we write the code and test both firmwares.

> 
>> (d) pci_bridge_qemu_reserve_cap_init() should be updated accordingly
>> (i.e., a conflict exists if both mem_pref_32_reserve and
>> mem_pref_64_reserve are *positive*),
>>
> 
> I thought we have this check already.

We have a check like this, but it doesn't use the exact condition that
I'm suggesting above (emphasis on *positive*). We now have

    if (mem_pref_32_reserve != (uint32_t)-1 &&
        mem_pref_64_reserve != (uint64_t)-1) {
        error_setg(errp,

but after introducing the zero value, this is going to be too strict.
Consider all the possible combinations:

* (-1; -1): default fw behavior, check is OK

* (-1;  0): fw sticks with the default for the first component, picks
            "no reservation" for the second component, check is OK

* (-1; >0): fw can treat this identically to ( 0; >0) -- see below. This
            is justified because, while the first component says "use
            the default", the entire situation of having such a
            capability is new, so the firmware can introduce new ways
            for handling it. The check passes, which is good.

* ( 0;  0): check reports error, but the firmware can handle this case
            fine (no reservation for either component)

* ( 0; >0): check reports error, but the fw can handle this (no
            reservation for the first component, specific reservation
            for the second)

* (>0; >0): conflict, check is OK

(I'm skipping the description of the inverted orderings, such as (0;-1),
they behave similarly.)

So we need to accept 0 as "valid" for either component:

    if (mem_pref_32_reserve > 0 &&
        mem_pref_32_reserve < (uint32_t)-1 &&
        mem_pref_64_reserve > 0 &&
        mem_pref_64_reserve < (uint64_t)-1) {
        error_setg(errp,

> 
>>
>> Second, when determining the reservations in OVMF, I would like to look
>> only at the capability fields, and not do a read-write-read-write
>> quadruplet to the IO base/limit registers. Do you agree?
>>
> 
> Well, as stated before, the semantics for the hint is
> max(hint, <sum of all IO/MEM ranges for devices behind the bridge>).
> If you can follow it, that would be OK.

In OVMF there are two separate questions:
- how to report the requested reservations,
- how to act upon the reported values.

The first question pertains to "platform code", i.e., what I'm going to
implement under OvmfPkg. The second question pertains to "universal
code" (under "MdeModulePkg/Bus/Pci/PciBusDxe"), which is a lot harder to
modify, because it is shipped on a wide range of physical platforms too.

Again, my understanding is that PciBusDxe implements the "max" semantics
that you describe (pertaining to the 2nd question).

My interest is in figuring out the 1st question now, that is, where I
should take the information from that goes into the "reservation
description" that PciBusDxe understands. My preference would be to look
only at the PCIE port in question (i.e. no other PCI devices at all),
and only at said capability in the config space of the PCIE port.

> Getting back to this patch, setting io-reserve to 0
> would require also:
> 
>    +      pci_word_test_and_clear_mask(pci_dev->wmask + PCI_COMMAND,
>    +                                     PCI_COMMAND_IO);
>    +      pci_dev->wmask[PCI_IO_BASE] = 0;
>    +      pci_dev->wmask[PCI_IO_LIMIT] = 0;
> 
> otherwise the Guest OS would not behave, just be aware of it.

Absolutely, no doubt about this.

Thanks,
Laszlo


>>>   static const VMStateDescription vmstate_rp_dev = {
>>>       .name = "pcie-root-port",
>>>       .version_id = 1,
>>> @@ -78,6 +111,7 @@ static const VMStateDescription vmstate_rp_dev = {
>>>     static Property gen_rp_props[] = {
>>>       DEFINE_PROP_BOOL("x-migrate-msix", GenPCIERootPort,
>>> migrate_msix, true),
>>> +    DEFINE_PROP_BOOL("enable-io-fwd", GenPCIERootPort,
>>> enable_io_fwd, false),
>>>       DEFINE_PROP_END_OF_LIST()
>>>   };
>>>   @@ -86,6 +120,7 @@ static void gen_rp_dev_class_init(ObjectClass
>>> *klass, void *data)
>>>       DeviceClass *dc = DEVICE_CLASS(klass);
>>>       PCIDeviceClass *k = PCI_DEVICE_CLASS(klass);
>>>       PCIERootPortClass *rpc = PCIE_ROOT_PORT_CLASS(klass);
>>> +    GenPCIERootPortClass *grpc = GEN_PCIE_ROOT_PORT_CLASS(klass);
>>>         k->vendor_id = PCI_VENDOR_ID_REDHAT;
>>>       k->device_id = PCI_DEVICE_ID_REDHAT_PCIE_RP;
>>> @@ -96,6 +131,8 @@ static void gen_rp_dev_class_init(ObjectClass
>>> *klass, void *data)
>>>       rpc->interrupts_init = gen_rp_interrupts_init;
>>>       rpc->interrupts_uninit = gen_rp_interrupts_uninit;
>>>       rpc->aer_offset = GEN_PCIE_ROOT_PORT_AER_OFFSET;
>>> +    grpc->parent_realize = dc->realize;
>>> +    dc->realize = gen_rp_realize;
>>>   }
>>>     static const TypeInfo gen_rp_dev_info = {
>>> @@ -103,6 +140,7 @@ static const TypeInfo gen_rp_dev_info = {
>>>       .parent        = TYPE_PCIE_ROOT_PORT,
>>>       .instance_size = sizeof(GenPCIERootPort),
>>>       .class_init    = gen_rp_dev_class_init,
>>> +    .class_size    = sizeof(GenPCIERootPortClass),
>>>   };
>>>      static void gen_rp_register_types(void)
>>> diff --git a/include/hw/compat.h b/include/hw/compat.h
>>> index 3e101f8f67..843bf4a3a5 100644
>>> --- a/include/hw/compat.h
>>> +++ b/include/hw/compat.h
>>> @@ -2,7 +2,11 @@
>>>   #define HW_COMPAT_H
>>>     #define HW_COMPAT_2_10 \
>>> -    /* empty */
>>> +    {\
>>> +        .driver   = "pcie-root-port",\
>>> +        .property = "enable-io-fwd",\
>>> +        .value    = "true",\
>>> +    },
>>>     #define HW_COMPAT_2_9 \
>>>       {\
>>>
>>
> 




reply via email to

[Prev in Thread] Current Thread [Next in Thread]