From: Alexey Kardashevskiy
Subject: Re: [Qemu-devel] [PATCH v7 1/4] spapr_iommu: Make in-kernel TCE table optional
Date: Thu, 05 Jun 2014 16:43:41 +1000
User-agent: Mozilla/5.0 (X11; Linux i686 on x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.5.0

On 06/05/2014 03:49 PM, Alexey Kardashevskiy wrote:
> POWER KVM supports a KVM_CAP_SPAPR_TCE capability which allows allocating
> TCE tables in the host kernel memory and handling H_PUT_TCE requests
> targeted to specific LIOBN (logical bus number) right in the host without
> switching to QEMU. At the moment this is used for emulated devices only
> and the handler only puts TCE to the table. If the in-kernel H_PUT_TCE
> handler finds a LIOBN and corresponding table, it will put a TCE to
> the table and complete hypercall execution. The user space will not be
> notified.
> 
> Upcoming VFIO support is going to use the same sPAPRTCETable device class
> so KVM_CAP_SPAPR_TCE is going to be used as well. That means that TCE
> tables for VFIO are going to be allocated in the host as well.
> However VFIO operates with real IOMMU tables and simple copying of
> a TCE to the real hardware TCE table will not work as guest physical
> to host physical address translation is required.
> 
> So until the host kernel gets VFIO support for H_PUT_TCE, we had better
> not register VFIO's TCE tables in the host.
> 
> This adds a bool @kvm_accel flag to the sPAPRTCETable device telling
> whether sPAPRTCETable should try allocating the TCE table in the host
> kernel. If the flag is false, the table will be created in QEMU instead.
> 
> This adds a kvm_accel parameter to spapr_tce_new_table() to let users
> choose whether to use acceleration or not. At the moment it is enabled
> for VIO and emulated PCI. Upcoming VFIO support will set it to false.
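
The allocation choice described above can be sketched as a standalone model. This is an illustration under stated assumptions, not the QEMU implementation: TceTableModel, model_kvm_enabled and model_kernel_alloc are hypothetical stand-ins for sPAPRTCETable, kvm_enabled() and kvmppc_create_spapr_tce(), and the fallback to a user-space allocation when the in-kernel attempt yields nothing is assumed here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for sPAPRTCETable with only the fields used here. */
typedef struct {
    uint32_t liobn;
    uint32_t nb_table;
    bool kvm_accel;   /* the new flag: may the host kernel own the table? */
    bool in_kernel;   /* illustration only: records where the table lives */
    uint64_t *table;
} TceTableModel;

/* Hypothetical stand-in for kvm_enabled(). */
static bool model_kvm_enabled = true;

/* Hypothetical stand-in for kvmppc_create_spapr_tce(); a real in-kernel
 * table would come from the KVM ioctl, here it is plain heap memory. */
static uint64_t *model_kernel_alloc(uint32_t nb_table)
{
    return calloc(nb_table, sizeof(uint64_t));
}

/* Mirrors the patched condition: the kernel path is tried only when both
 * the per-table flag and KVM itself allow it. */
static int model_realize(TceTableModel *t)
{
    assert(t != NULL);
    t->in_kernel = false;
    t->table = NULL;

    if (t->kvm_accel && model_kvm_enabled) {
        t->table = model_kernel_alloc(t->nb_table);
        t->in_kernel = (t->table != NULL);
    }
    if (!t->table) {
        /* User-space table, as a caller passing kvm_accel=false expects. */
        t->table = calloc(t->nb_table, sizeof(uint64_t));
    }
    return t->table ? 0 : -1;
}
```

With kvm_accel set to true (the VIO and emulated PCI callers in this patch) the model takes the kernel path whenever KVM is enabled. With kvm_accel set to false it always ends up with a user-space table, which is what the upcoming VFIO caller is meant to request.
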
> 
> Signed-off-by: Alexey Kardashevskiy <address@hidden>
> ---
> 
> This is a workaround but it lets me have one IOMMU device for VIO, emulated
> PCI and VFIO which is a good thing.
> 
> The alternative would be a new KVM_CAP_SPAPR_TCE_VFIO capability but
> this needs a kernel update.


Never mind, I'll make it a capability. I'll post the capability reservation
patch separately.


> ---
>  hw/ppc/spapr_iommu.c   | 6 ++++--
>  hw/ppc/spapr_pci.c     | 2 +-
>  hw/ppc/spapr_vio.c     | 2 +-
>  include/hw/ppc/spapr.h | 4 +++-
>  4 files changed, 9 insertions(+), 5 deletions(-)
> 
> diff --git a/hw/ppc/spapr_iommu.c b/hw/ppc/spapr_iommu.c
> index 3b6e373..bfd3701 100644
> --- a/hw/ppc/spapr_iommu.c
> +++ b/hw/ppc/spapr_iommu.c
> @@ -115,7 +115,7 @@ static int spapr_tce_table_realize(DeviceState *dev)
>  {
>      sPAPRTCETable *tcet = SPAPR_TCE_TABLE(dev);
>  
> -    if (kvm_enabled()) {
> +    if (tcet->kvm_accel && kvm_enabled()) {
>          tcet->table = kvmppc_create_spapr_tce(tcet->liobn,
>                                                tcet->nb_table <<
>                                                tcet->page_shift,
> @@ -143,7 +143,8 @@ static int spapr_tce_table_realize(DeviceState *dev)
>  sPAPRTCETable *spapr_tce_new_table(DeviceState *owner, uint32_t liobn,
>                                     uint64_t bus_offset,
>                                     uint32_t page_shift,
> -                                   uint32_t nb_table)
> +                                   uint32_t nb_table,
> +                                   bool kvm_accel)
>  {
>      sPAPRTCETable *tcet;
>  
> @@ -162,6 +163,7 @@ sPAPRTCETable *spapr_tce_new_table(DeviceState *owner, uint32_t liobn,
>      tcet->bus_offset = bus_offset;
>      tcet->page_shift = page_shift;
>      tcet->nb_table = nb_table;
> +    tcet->kvm_accel = kvm_accel;
>  
>      object_property_add_child(OBJECT(owner), "tce-table", OBJECT(tcet), NULL);
>  
> diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
> index ddfd8bb..6021f35 100644
> --- a/hw/ppc/spapr_pci.c
> +++ b/hw/ppc/spapr_pci.c
> @@ -658,7 +658,7 @@ static void spapr_phb_finish_realize(sPAPRPHBState *sphb, Error **errp)
>      tcet = spapr_tce_new_table(DEVICE(sphb), sphb->dma_liobn,
>                                 0,
>                                 SPAPR_TCE_PAGE_SHIFT,
> -                               0x40000000 >> SPAPR_TCE_PAGE_SHIFT);
> +                               0x40000000 >> SPAPR_TCE_PAGE_SHIFT, true);
>      if (!tcet) {
>          error_setg(errp, "Unable to create TCE table for %s",
>                     sphb->dtbusname);
> diff --git a/hw/ppc/spapr_vio.c b/hw/ppc/spapr_vio.c
> index 48b0125..16385e4 100644
> --- a/hw/ppc/spapr_vio.c
> +++ b/hw/ppc/spapr_vio.c
> @@ -460,7 +460,7 @@ static int spapr_vio_busdev_init(DeviceState *qdev)
>                                          0,
>                                          SPAPR_TCE_PAGE_SHIFT,
>                                          pc->rtce_window_size >>
> -                                        SPAPR_TCE_PAGE_SHIFT);
> +                                        SPAPR_TCE_PAGE_SHIFT, true);
>          address_space_init(&dev->as, spapr_tce_get_iommu(dev->tcet), qdev->id);
>      }
>  
> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
> index 4ffb903..7db34ff 100644
> --- a/include/hw/ppc/spapr.h
> +++ b/include/hw/ppc/spapr.h
> @@ -402,6 +402,7 @@ struct sPAPRTCETable {
>      uint32_t page_shift;
>      uint64_t *table;
>      bool bypass;
> +    bool kvm_accel;
>      int fd;
>      MemoryRegion iommu;
>      QLIST_ENTRY(sPAPRTCETable) list;
> @@ -413,7 +414,8 @@ int spapr_h_cas_compose_response(target_ulong addr, target_ulong size);
>  sPAPRTCETable *spapr_tce_new_table(DeviceState *owner, uint32_t liobn,
>                                     uint64_t bus_offset,
>                                     uint32_t page_shift,
> -                                   uint32_t nb_table);
> +                                   uint32_t nb_table,
> +                                   bool kvm_accel);
>  MemoryRegion *spapr_tce_get_iommu(sPAPRTCETable *tcet);
>  void spapr_tce_set_bypass(sPAPRTCETable *tcet, bool bypass);
>  int spapr_dma_dt(void *fdt, int node_off, const char *propname,
> 


-- 
Alexey


