Re: [RFC v3 0/1] memory: Delete assertion in memory_region_unregister_io

From: Eugenio Perez Martin
Subject: Re: [RFC v3 0/1] memory: Delete assertion in memory_region_unregister_iommu_notifier
Date: Wed, 12 Aug 2020 16:33:24 +0200

On Tue, Aug 11, 2020 at 9:28 PM Peter Xu <peterx@redhat.com> wrote:
> Hi, Eugenio,
> On Tue, Aug 11, 2020 at 08:10:44PM +0200, Eugenio Perez Martin wrote:
> > Using this patch as a reference, I'm having problems to understand:
> >
> > - I'm not sure that the flag name expresses clearly the notifier capability.
> The old code is kind of messed up for dev-iotlb invalidations, by always
> sending UNMAP notifications for both iotlb and dev-iotlb invalidations.
> Now if we introduce the new DEV_IOTLB type, we can separate the two:
>   - We notify IOMMU_NOTIFIER_UNMAP for iotlb invalidations
>   - We notify IOMMU_NOTIFIER_DEV_IOTLB for dev-iotlb invalidations
> Vhost should always be with ats=on when vIOMMU enabled, so it will enable
> device iotlb.  Then it does not need iotlb (UNMAP) invalidation any more
> because they'll normally be duplicated (one is to invalidate vIOMMU cache, one
> is to invalidate device cache).
> Also, we can drop UNMAP type for vhost if we introduce DEV_IOTLB.  It works
> just like on the real hardwares - the device won't be able to receive iotlb
> invalidation messages, but only device iotlb invalidation messages.  Here
> vhost (or, virtio-pci) is the device.
> > - What would be the advantages of using another field (NotifierType?)
> > in the notifier to express that it accepts arbitrary ranges for
> > unmapping? (If I understood correctly Jason's proposal)
> (Please refer to above too..)
> > - Is it possible (or advisable) to skip all the page splitting in
> > vtd_page_walk if the memory range notifier supports these arbitrary
> > ranges? What would be the disadvantages? (Maybe in a future patch). It
> > seems it is advisable to me, but I would like to double confirm.
> vtd_page_walk is not used for dev-iotlb, we don't need to change that.  We
> also want to explicitly keep the page granularity of vtd_page_walk for the
> other IOMMU notifiers, e.g. vfio.

I'm not sure I understand.

Here is a backtrace from a regular call (not [0, ~0ULL]):
#0  0x000055555597ca63 in memory_region_notify_one_iommu
(notifier=0x7fffe4976f08, entry=0x7fffddff9d20)
    at /home/qemu/softmmu/memory.c:1895
#1  0x000055555597cc87 in memory_region_notify_iommu
(iommu_mr=0x55555728f2e0, iommu_idx=0, entry=...) at
#2  0x000055555594b95a in vtd_sync_shadow_page_hook
(entry=0x7fffddff9e70, private=0x55555728f2e0) at
#3  0x000055555594af7b in vtd_page_walk_one (entry=0x7fffddff9e70,
info=0x7fffddffa140) at /home/qemu/hw/i386/intel_iommu.c:1173
#4  0x000055555594b2b3 in vtd_page_walk_level
    (addr=10531758080, start=4292870144, end=4294967296, level=1,
read=true, write=true, info=0x7fffddffa140)
    at /home/qemu/hw/i386/intel_iommu.c:1254
#5  0x000055555594b225 in vtd_page_walk_level
    (addr=10530922496, start=3221225472, end=4294967296, level=2,
read=true, write=true, info=0x7fffddffa140)
    at /home/qemu/hw/i386/intel_iommu.c:1236
#6  0x000055555594b225 in vtd_page_walk_level
    (addr=10529021952, start=0, end=549755813888, level=3, read=true,
write=true, info=0x7fffddffa140)
    at /home/qemu/hw/i386/intel_iommu.c:1236
#7  0x000055555594b3f8 in vtd_page_walk (s=0x555557565210,
ce=0x7fffddffa1a0, start=0, end=549755813888, info=0x7fffddffa140)
    at /home/qemu/hw/i386/intel_iommu.c:1293
#8  0x000055555594ba77 in vtd_sync_shadow_page_table_range
(vtd_as=0x55555728f270, ce=0x7fffddffa1a0, addr=0,
    at /home/qemu/hw/i386/intel_iommu.c:1467
#9  0x000055555594bb50 in vtd_sync_shadow_page_table
(vtd_as=0x55555728f270) at /home/qemu/hw/i386/intel_iommu.c:1498
#10 0x000055555594cc5f in vtd_iotlb_domain_invalidate
(s=0x555557565210, domain_id=3) at
#11 0x000055555594dbae in vtd_process_iotlb_desc (s=0x555557565210,
inv_desc=0x7fffddffa2b0) at /home/qemu/hw/i386/intel_iommu.c:2371
#12 0x000055555594dfd3 in vtd_process_inv_desc (s=0x555557565210) at
#13 0x000055555594e1e9 in vtd_fetch_inv_desc (s=0x555557565210) at
#14 0x000055555594e330 in vtd_handle_iqt_write (s=0x555557565210) at
#15 0x000055555594ed90 in vtd_mem_write (opaque=0x555557565210,
addr=136, val=1888, size=4) at /home/qemu/hw/i386/intel_iommu.c:2842
#16 0x00005555559787b9 in memory_region_write_accessor
    (mr=0x555557565540, addr=136, value=0x7fffddffa478, size=4,
shift=0, mask=4294967295, attrs=...) at
#17 0x00005555559789d7 in access_with_adjusted_size
    (addr=136, value=0x7fffddffa478, size=4, access_size_min=4,
access_size_max=8, access_fn=
    0x5555559786da <memory_region_write_accessor>, mr=0x555557565540,
attrs=...) at /home/qemu/softmmu/memory.c:544
#18 0x000055555597b8a5 in memory_region_dispatch_write
(mr=0x555557565540, addr=136, data=1888, op=MO_32, attrs=...)
    at /home/qemu/softmmu/memory.c:1465
#19 0x000055555582b1bf in flatview_write_continue
    (fv=0x7fffc447c470, addr=4275634312, attrs=...,
ptr=0x7ffff7dfd028, len=4, addr1=136, l=4, mr=0x555557565540) at
#20 0x000055555582b304 in flatview_write (fv=0x7fffc447c470,
addr=4275634312, attrs=..., buf=0x7ffff7dfd028, len=4)
    at /home/qemu/exec.c:3216
#21 0x000055555582b659 in address_space_write
    (as=0x5555567a9380 <address_space_memory>, addr=4275634312,
attrs=..., buf=0x7ffff7dfd028, len=4) at /home/qemu/exec.c:3307
#22 0x000055555582b6c6 in address_space_rw
    (as=0x5555567a9380 <address_space_memory>, addr=4275634312,
attrs=..., buf=0x7ffff7dfd028, len=4, is_write=true)
    at /home/qemu/exec.c:3317
#23 0x000055555588e3b8 in kvm_cpu_exec (cpu=0x555556bfe9f0) at
#24 0x0000555555972bcf in qemu_kvm_cpu_thread_fn (arg=0x555556bfe9f0)
at /home/qemu/softmmu/cpus.c:1188
#25 0x0000555555e08fbd in qemu_thread_start (args=0x555556c24c60) at
#26 0x00007ffff55a714a in start_thread () at /lib64/libpthread.so.0
#27 0x00007ffff52d8f23 in clone () at /lib64/libc.so.6

with entry = {target_as = 0x5555567a9380, iova = 0xfff0b000,
translated_addr = 0x0, addr_mask = 0xfff, perm = 0x0}

Here we get 3 levels of vtd_page_walk (frames #4-#6). The parameters of
frame #6 are:

(addr=10529021952, start=0, end=549755813888, level=3, read=true, write=true,
info=0x7fffddffa140)

If I understand correctly, the while (iova < end) {} loop in
vtd_page_walk will break the big range into small pages (4K because of
level=1, so (end - iova) / subpage_size = 245 pages, i.e. 245 iterations).
In the worst case that could mean a lot of write(2) calls in
vhost_kernel_send_device_iotlb_msg; in the best case, a lot of useless
early returns in memory_region_notify_one_iommu because of the
(notifier->start > entry_end || notifier->end < entry->iova) check.
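
To make the arithmetic above concrete, here is a minimal standalone sketch
(not QEMU code). NotifierRange, walk_iterations and entry_is_skipped are
names made up for illustration; only the numbers and the range condition
come from the backtrace and the text above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a notifier's registered range (inclusive). */
typedef struct {
    uint64_t start;
    uint64_t end;
} NotifierRange;

/* Number of page-sized steps needed to go from iova up to (excluding) end. */
static uint64_t walk_iterations(uint64_t iova, uint64_t end,
                                uint64_t page_size)
{
    /* page_size must be a non-zero power of two, and the range non-negative */
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    assert(iova <= end);
    return (end - iova) / page_size;
}

/*
 * Same condition as quoted above: the entry is skipped when it lies
 * entirely outside the notifier's range.
 */
static bool entry_is_skipped(const NotifierRange *n, uint64_t iova,
                             uint64_t addr_mask)
{
    uint64_t entry_end = iova + addr_mask;

    return n->start > entry_end || n->end < iova;
}
```

With iova = 0xfff0b000, end = 4294967296 and 4K pages, walk_iterations()
gives the 245 steps mentioned above; a notifier whose range does not
overlap the entry makes entry_is_skipped() return true for each of them.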

Am I right about this? I understand that other notifiers (you mention
vfio) need the granularity, but would a check in some vtd_* function
help with performance? (I'm not suggesting introducing it in this patch
series.)
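
As a rough illustration of the kind of check I mean, here is a standalone
sketch, not a proposal for the real code. SketchNotifier, its
accepts_arbitrary_ranges flag, notify_range and count_cb are all
hypothetical and do not exist in QEMU; the assumption is that a notifier
could somehow declare that it handles a range that is not page-split.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical callback: receives a start address and (length - 1). */
typedef void (*notify_fn)(uint64_t iova, uint64_t len_minus_one,
                          void *opaque);

/* Hypothetical notifier; the capability flag is an assumption. */
typedef struct {
    bool accepts_arbitrary_ranges;
    notify_fn notify;
    void *opaque;
} SketchNotifier;

/* Example callback that only counts how often it was called. */
static void count_cb(uint64_t iova, uint64_t len_minus_one, void *opaque)
{
    (void)iova;
    (void)len_minus_one;
    (*(uint64_t *)opaque)++;
}

/*
 * Notify [iova, end): one call if the notifier takes arbitrary ranges,
 * otherwise one call per page. Returns the number of calls made.
 * Assumes end + page_size does not overflow uint64_t.
 */
static uint64_t notify_range(const SketchNotifier *n, uint64_t iova,
                             uint64_t end, uint64_t page_size)
{
    uint64_t calls = 0;

    if (iova >= end) {
        return 0;
    }
    if (n->accepts_arbitrary_ranges) {
        n->notify(iova, end - iova - 1, n->opaque);
        return 1;
    }
    for (; iova < end; iova += page_size) {
        n->notify(iova, page_size - 1, n->opaque);
        calls++;
    }
    return calls;
}
```

For the range in the backtrace this would make one call instead of 245
for a notifier with the flag set, while page-granular notifiers keep the
current behaviour.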

Thank you very much.

> Though we'll need to modify vtd_process_device_iotlb_desc() to only send
> notifications to the notifiers that registered with DEV_IOTLB flag.
> Hope it helps..
> Thanks,
> --
> Peter Xu
