qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH v2 5/8] x86_iommu/amd: Add interrupt remap suppo


From: Singh, Brijesh
Subject: Re: [Qemu-devel] [PATCH v2 5/8] x86_iommu/amd: Add interrupt remap support when VAPIC is not enabled
Date: Tue, 18 Sep 2018 20:27:59 +0000


On 09/17/2018 10:53 PM, Peter Xu wrote:
[...]

>> IMHO we should not be using error_report_once() here. It's possible that
>> guest OS have DTE[IV]=1 but has not programmed the interrupt
>> remapping entries or have deactivated the remapping. I see that Linux
>> OS does it all the time and in those cases we will be printing out
>> error messages - which will give a wrong information.
> 
> I would be a bit curious on why the guest would like to switch the irq
> off if the device still needs to generate any... though this question
> could be out of topic so feel free to skip.

If you are interested then look at kernel irq domain [1] (Hierarchy IRQ
domain). There are 4 interfaces to the irq_domain:

- irq_domain_alloc_irqs() : when IR is enabled this will set DTE[IV]=1
- irq_domain_{activate,deactivate}: sets/clears the IRTE.fields.valid
   (aka IRTE.RemapEn bit).

You can find some additional info in this document on how a domain will
activate/deactivate resources.

[1] https://www.kernel.org/doc/Documentation/IRQ-domain.txt


> 
> But if that's normal then I would suggest you use a better naming of
> the trace point, "*_target_abort" seems to be a severe issue.
> 

I have been trying to adhere to IOMMU spec, if spec says that we should
cause an "target abort" then I try to use _target_abort tracepoint. In
case of vIOMMU we do not generate any IOMMU log event to communicate the
guest about the target aborts but on native system hardware will
generate a events which guest OS can parse to get additional info. Maybe
in future we can look at logging events from vIOMMU.


> Also, my suggestion is a general one - it still applies to the places
> that you think a well-behaved guest should not do.  It's optional so
> you can decide whether you want to use it.
> 

Agreed, I will certain look into it.


[...]

>> Honestly, I did struggled whether to use "exit" or error. Actually this
>> is a configuration error.
>>
>> AMD IOMMU does not have an explicit bit to indicate whether the IR
>> feature is supported. All versions of IOMMU supports IR. But in QEMU
>> we are using 'intermap' to enable/disable the feature. In case of AMD IOMMU,
>> we will not be able to link intremap to control the exposure of
>> this feature to the guest. So we need to do some checks during the
>> runtime. If we get asked to remap the interrupts (when intremap=off)
>> then IMHO its a configuration error and we should kill the guest.
>> Having said so, I am flexible and if needed then I can remove the
>> exit from next patch.
> 
> If all AMD IOMMU has IR, then I'm curious how the old AMD IOMMU
> emulation worked since that does not have an IR, do you know how the
> guest detects that?  (It must have done so otherwise IRQ won't work
> there, isn't it?)
> 
> And if that's true, instead of check intr_enabled here, I would
> suggest you check it in realize() - we'd better crash asap when we
> know this error, and we know it from the very beginning.
> 
>>


So far non of the guests were enabling the interrupt remap features
even when it was available. As I explained in previous patches (see
patch 6), Linux guest looks for a special IOAPIC device in IVHD before
enabling the interrupt remap. If you enable the amd_iommu_debug=1
in kernel cmdline then kernel bootup prints the message that its
attempting to enable the IR but since IOAPIC entry does not exist
in IVHD hence it disabled it.

Notes: for kicks you can apply patch 6 from this series in existing
code base and you will see that Linux OS will enable the IR (it does
care about intremap=<on|off> from amd-iommu.


[...]

>>
>> This time used the pre-defined macro's from the apic-msidef.h hence I
>> thought there was no need to include the field definitions. But, I will
>> go ahead and include the field definition in the next rev.
>>
>> * The delivery_mode field from the MSI data is used to get the upstream
>>    interrupt type.
>> * If the interrupt type is Fixed/Arbitrated, MSI data 10:0 is used
>>    as an offset within the IRTE table to get the actual IRTE entry
>>    (Fig 14).
>>
>> Please note that we don't emulate the HyperTransport hence instead we
>> should not treat MSI data 10:8 as index instead the MSI data 10:0 should
>> be used as "offset" within the IRTE table to get the interrupt remapping
>> table entries.
> 
> Frankly speaking until now I don't quite understand what's that
> HyperTransport thing... but ok I think I understand your point now:
> you mean that the "Interrupt type" mentioned in Table 19 is exactly
> the same bits (10:8) defined in general MSI data register.  I think
> that's the only knowledge that I'm missing from there. 


The MSI data register bit 10:8 definitions are similar to the Hyper
Transport MT encoding (Table 19), but they are not quite the same.

In our case we don't have HyperTransport so we need to use the
bit definition from the IOAPIC https://wiki.osdev.org/APIC

enum ioapic_irq_destination_types {
         dest_Fixed              = 0,
         dest_LowestPrio         = 1,
         dest_SMI                = 2,
         dest__reserved_1        = 3,
         dest_NMI                = 4,
         dest_INIT               = 5,
         dest__reserved_2        = 6,
         dest_ExtINT             = 7
};


  Say, Table 19
> told me that "what interrupt type should be controlled by what",
> however it never told me where could I find the interrupt types... or
> is it defined somewhere but I didn't notice?
> 
> (I do feel hard to read the AMD spec some times...)
> 

I hear you loud and clear :) sometimes its confusing ;)


> If my understanding is correct, maybe you could comment above the
> delivery_mode with things like "Please refer to Table 19; delivery
> mode is defined as the same as general MSI data format on bits 10:8".
> 

I will include some comments in next rev.

thanks for all your review feedbacks.

-Brijesh

reply via email to

[Prev in Thread] Current Thread [Next in Thread]