qemu-devel

RE: Question on memory commit during MR finalize()


From: Thanos Makatos
Subject: RE: Question on memory commit during MR finalize()
Date: Fri, 16 Jul 2021 11:42:02 +0000

> -----Original Message-----
> From: Peter Xu <peterx@redhat.com>
> Sent: 15 July 2021 19:35
> To: Thanos Makatos <thanos.makatos@nutanix.com>
> Cc: Paolo Bonzini <pbonzini@redhat.com>; Markus Armbruster
> <armbru@redhat.com>; QEMU Devel Mailing List <qemu-
> devel@nongnu.org>; John Levon <john.levon@nutanix.com>; John G
> Johnson <john.g.johnson@oracle.com>
> Subject: Re: Question on memory commit during MR finalize()
> 
> On Thu, Jul 15, 2021 at 02:27:48PM +0000, Thanos Makatos wrote:
> > Hi Peter,
> 
> Hi, Thanos,
> 
> > We're hitting this issue using a QEMU branch where JJ is using vfio-user as
> the transport for multiprocess-qemu
> (https://urldefense.proofpoint.com/v2/url?u=https-
> 3A__github.com_oracle_qemu_issues_9&d=DwIBaQ&c=s883GpUCOChKOHi
> ocYtGcg&r=XTpYsh5Ps2zJvtw6ogtti46atk736SI4vgsJiUKIyDE&m=9nFuGF9kg5l
> ZsKPi03BNzo9pckG8DlodVG0LuEofnKw&s=dcp70CIgJljcWFwSRZm5zZRJj80jX
> XERLwpbH6ZcgzQ&e= ). We can reproduce it fairly reliably by migrating a
> virtual SPDK NVMe controller (the NVMf/vfio-user target with experimental
> migration support, https://urldefense.proofpoint.com/v2/url?u=https-
> 3A__review.spdk.io_gerrit_c_spdk_spdk_-
> 2B_7617_14&d=DwIBaQ&c=s883GpUCOChKOHiocYtGcg&r=XTpYsh5Ps2zJvtw
> 6ogtti46atk736SI4vgsJiUKIyDE&m=9nFuGF9kg5lZsKPi03BNzo9pckG8DlodVG0
> LuEofnKw&s=iXolOQM5sYj4IB-cf__Ta8jgKXZqisYE-uuwq6qnbLo&e= ). I can
> provide detailed repro instructions but first I want to make sure we're not
> missing any patches.
> 
> I don't think you missed any bug fix patches, as the issue I mentioned could
> only be triggered with my own branch at that time, and that was fixed when my
> patchset got merged.
> 
> However, if you encountered the same issue, it's possible that there's an
> incorrect use of the QEMU memory/CPU API somewhere there too, so that a
> similar issue is triggered.  For example, in my case run_on_cpu() was called
> incorrectly in the middle of a memory layout change, so the BQL was released
> without being noticed.
> 
> I've got a series that tries to expose these hard to debug issues:
> 
> https://urldefense.proofpoint.com/v2/url?u=https-
> 3A__lore.kernel.org_qemu-2Ddevel_20200421162108.594796-2D1-2Dpeterx-
> 40redhat.com_&d=DwIBaQ&c=s883GpUCOChKOHiocYtGcg&r=XTpYsh5Ps2zJ
> vtw6ogtti46atk736SI4vgsJiUKIyDE&m=9nFuGF9kg5lZsKPi03BNzo9pckG8Dlod
> VG0LuEofnKw&s=kQRJEb4CQmxEirS-III15QJz_phzhCYLIgjOF-SB9Pk&e=
> 
> Obviously the series didn't attract enough interest, so it didn't get merged.
> However, it may also be useful for what you're debugging, so you can apply
> those patches onto your branch and look at the stack when it reproduces
> again. Logically, with these sanity patches it could fail earlier than what
> you've hit right now (which I believe should be within the RCU thread; btw it
> would be interesting to share your stack too when it's hit), and it could
> provide more useful information.
> 
> I saw that the old series won't apply onto master any more, so I rebased it
> and pushed it here (with one patch dropped, since someone wrote a similar
> patch that got merged, so there are only 7 patches in the new tree):
> 
> https://urldefense.proofpoint.com/v2/url?u=https-
> 3A__github.com_xzpeter_qemu_tree_memory-
> 2Dsanity&d=DwIBaQ&c=s883GpUCOChKOHiocYtGcg&r=XTpYsh5Ps2zJvtw6og
> tti46atk736SI4vgsJiUKIyDE&m=9nFuGF9kg5lZsKPi03BNzo9pckG8DlodVG0LuE
> ofnKw&s=G-8FV-H-VcZTgCVRfTEVKo1GALIk2PqBvTdAcAXFoZ0&e=
> 
> No guarantee it'll help, but IMHO worth trying.

The memory-sanity branch fails to build:

./configure --prefix=/opt/qemu-xzpeter --target-list=x86_64-linux-user --enable-debug
make -j 8
...
[697/973] Linking target qemu-x86_64
FAILED: qemu-x86_64
c++  -o qemu-x86_64 libcommon.fa.p/cpus-common.c.o 
libcommon.fa.p/page-vary-common.c.o libcommon.fa.p/disas_i386.c.o 
libcommon.fa.p/disas_capstone.c.o libcommon.fa.p/hw_core_cpu-common.c.o 
libcommon.fa.p/ebpf_ebpf_rss-stub.c.o libcommon.fa.p/accel_accel-user.c.o 
libqemu-x86_64-linux-user.fa.p/target_i386_tcg_user_excp_helper.c.o 
libqemu-x86_64-linux-user.fa.p/target_i386_tcg_user_seg_helper.c.o 
libqemu-x86_64-linux-user.fa.p/linux-user_x86_64_signal.c.o 
libqemu-x86_64-linux-user.fa.p/linux-user_x86_64_cpu_loop.c.o 
libqemu-x86_64-linux-user.fa.p/target_i386_cpu.c.o 
libqemu-x86_64-linux-user.fa.p/target_i386_gdbstub.c.o 
libqemu-x86_64-linux-user.fa.p/target_i386_helper.c.o 
libqemu-x86_64-linux-user.fa.p/target_i386_xsave_helper.c.o 
libqemu-x86_64-linux-user.fa.p/target_i386_cpu-dump.c.o 
libqemu-x86_64-linux-user.fa.p/target_i386_sev-stub.c.o 
libqemu-x86_64-linux-user.fa.p/target_i386_kvm_kvm-stub.c.o 
libqemu-x86_64-linux-user.fa.p/target_i386_tcg_bpt_helper.c.o 
libqemu-x86_64-linux-user.fa.p/target_i386_tcg_cc_helper.c.o 
libqemu-x86_64-linux-user.fa.p/target_i386_tcg_excp_helper.c.o 
libqemu-x86_64-linux-user.fa.p/target_i386_tcg_fpu_helper.c.o 
libqemu-x86_64-linux-user.fa.p/target_i386_tcg_int_helper.c.o 
libqemu-x86_64-linux-user.fa.p/target_i386_tcg_mem_helper.c.o 
libqemu-x86_64-linux-user.fa.p/target_i386_tcg_misc_helper.c.o 
libqemu-x86_64-linux-user.fa.p/target_i386_tcg_mpx_helper.c.o 
libqemu-x86_64-linux-user.fa.p/target_i386_tcg_seg_helper.c.o 
libqemu-x86_64-linux-user.fa.p/target_i386_tcg_tcg-cpu.c.o 
libqemu-x86_64-linux-user.fa.p/target_i386_tcg_translate.c.o 
libqemu-x86_64-linux-user.fa.p/trace_control-target.c.o 
libqemu-x86_64-linux-user.fa.p/cpu.c.o libqemu-x86_64-linux-user.fa.p/disas.c.o 
libqemu-x86_64-linux-user.fa.p/gdbstub.c.o 
libqemu-x86_64-linux-user.fa.p/page-vary.c.o 
libqemu-x86_64-linux-user.fa.p/tcg_optimize.c.o 
libqemu-x86_64-linux-user.fa.p/tcg_region.c.o 
libqemu-x86_64-linux-user.fa.p/tcg_tcg.c.o 
libqemu-x86_64-linux-user.fa.p/tcg_tcg-common.c.o 
libqemu-x86_64-linux-user.fa.p/tcg_tcg-op.c.o 
libqemu-x86_64-linux-user.fa.p/tcg_tcg-op-gvec.c.o 
libqemu-x86_64-linux-user.fa.p/tcg_tcg-op-vec.c.o 
libqemu-x86_64-linux-user.fa.p/fpu_softfloat.c.o 
libqemu-x86_64-linux-user.fa.p/accel_accel-common.c.o 
libqemu-x86_64-linux-user.fa.p/accel_tcg_tcg-all.c.o 
libqemu-x86_64-linux-user.fa.p/accel_tcg_cpu-exec-common.c.o 
libqemu-x86_64-linux-user.fa.p/accel_tcg_cpu-exec.c.o 
libqemu-x86_64-linux-user.fa.p/accel_tcg_tcg-runtime-gvec.c.o 
libqemu-x86_64-linux-user.fa.p/accel_tcg_tcg-runtime.c.o 
libqemu-x86_64-linux-user.fa.p/accel_tcg_translate-all.c.o 
libqemu-x86_64-linux-user.fa.p/accel_tcg_translator.c.o 
libqemu-x86_64-linux-user.fa.p/accel_tcg_user-exec.c.o 
libqemu-x86_64-linux-user.fa.p/accel_tcg_user-exec-stub.c.o 
libqemu-x86_64-linux-user.fa.p/accel_tcg_plugin-gen.c.o 
libqemu-x86_64-linux-user.fa.p/accel_stubs_hax-stub.c.o 
libqemu-x86_64-linux-user.fa.p/accel_stubs_xen-stub.c.o 
libqemu-x86_64-linux-user.fa.p/accel_stubs_kvm-stub.c.o 
libqemu-x86_64-linux-user.fa.p/plugins_loader.c.o 
libqemu-x86_64-linux-user.fa.p/plugins_core.c.o 
libqemu-x86_64-linux-user.fa.p/plugins_api.c.o 
libqemu-x86_64-linux-user.fa.p/linux-user_elfload.c.o 
libqemu-x86_64-linux-user.fa.p/linux-user_exit.c.o 
libqemu-x86_64-linux-user.fa.p/linux-user_fd-trans.c.o 
libqemu-x86_64-linux-user.fa.p/linux-user_linuxload.c.o 
libqemu-x86_64-linux-user.fa.p/linux-user_main.c.o 
libqemu-x86_64-linux-user.fa.p/linux-user_mmap.c.o 
libqemu-x86_64-linux-user.fa.p/linux-user_safe-syscall.S.o 
libqemu-x86_64-linux-user.fa.p/linux-user_signal.c.o 
libqemu-x86_64-linux-user.fa.p/linux-user_strace.c.o 
libqemu-x86_64-linux-user.fa.p/linux-user_syscall.c.o 
libqemu-x86_64-linux-user.fa.p/linux-user_uaccess.c.o 
libqemu-x86_64-linux-user.fa.p/linux-user_uname.c.o 
libqemu-x86_64-linux-user.fa.p/thunk.c.o 
libqemu-x86_64-linux-user.fa.p/meson-generated_.._x86_64-linux-user-gdbstub-xml.c.o
 libqemu-x86_64-linux-user.fa.p/meson-generated_.._trace_generated-helpers.c.o 
-Wl,--as-needed -Wl,--no-undefined -pie -Wl,--whole-archive libhwcore.fa 
libqom.fa -Wl,--no-whole-archive -Wl,--warn-common -Wl,-z,relro -Wl,-z,now -m64 
-fstack-protector-strong -Wl,--start-group libcapstone.a libqemuutil.a 
libhwcore.fa libqom.fa -ldl 
-Wl,--dynamic-list=/root/src/qemu/build/qemu-plugins-ld.symbols -lrt -lutil -lm 
-pthread -Wl,--export-dynamic -lgmodule-2.0 -lglib-2.0 -lstdc++ -Wl,--end-group
/usr/bin/ld: libcommon.fa.p/cpus-common.c.o: in function `do_run_on_cpu':
/root/src/qemu/build/../cpus-common.c:153: undefined reference to `qemu_cond_wait_iothread'
collect2: error: ld returned 1 exit status
[698/973] Compiling C object 
tests/fp/libsoftfloat.a.p/berkeley-softfloat-3_source_f32_to_ui64_r_minMag.c.o
[699/973] Compiling C object 
tests/fp/libsoftfloat.a.p/berkeley-softfloat-3_source_f32_to_i32_r_minMag.c.o
[700/973] Compiling C object 
tests/fp/libsoftfloat.a.p/berkeley-softfloat-3_source_f32_to_f16.c.o
[701/973] Compiling C object 
tests/fp/libsoftfloat.a.p/berkeley-softfloat-3_source_f32_to_f64.c.o
[702/973] Compiling C object 
tests/fp/libsoftfloat.a.p/berkeley-softfloat-3_source_f32_to_i64_r_minMag.c.o
[703/973] Compiling C object 
tests/fp/libsoftfloat.a.p/berkeley-softfloat-3_source_f32_to_extF80M.c.o
[704/973] Compiling C object 
tests/fp/libsoftfloat.a.p/berkeley-softfloat-3_source_f32_to_extF80.c.o
ninja: build stopped: subcommand failed.
make[1]: *** [Makefile:154: run-ninja] Error 1
make[1]: Leaving directory '/root/src/qemu/build'
make: *** [GNUmakefile:11: all] Error 2

Regarding the stack trace, I can reproduce it very easily on our branch, and I
know exactly where to set the breakpoint:

(gdb) r
Starting program: [...] host -enable-kvm -smp 4 -nographic -m 2G -object 
memory-backend-file,id=mem0,size=2G,mem-path=/dev/hugepages,share=on,prealloc=yes,
 -numa node,memdev=mem0 [...]
Thread 8 "qemu-system-x86" received signal SIGUSR1, User defined signal 1.
#0  0x00007ffff758f7bb in __GI_raise (sig=sig@entry=6) at 
../sysdeps/unix/sysv/linux/raise.c:50
#1  0x00007ffff757a535 in __GI_abort () at abort.c:79
#2  0x0000555555c9301e in kvm_set_phys_mem (kml=0x5555568ee830, 
section=0x7ffff58c05e0, add=true) at ../accel/kvm/kvm-all.c:1194
#3  0x0000555555c930cd in kvm_region_add (listener=0x5555568ee830, 
section=0x7ffff58c05e0) at ../accel/kvm/kvm-all.c:1211
#4  0x0000555555bd6c9e in address_space_update_topology_pass (as=0x555556648420 
<address_space_memory>, old_view=0x555556f21730, new_view=0x7ffff0001cb0, 
adding=true) at ../softmmu/memory.c:971
#5  0x0000555555bd6f98 in address_space_set_flatview (as=0x555556648420 
<address_space_memory>) at ../softmmu/memory.c:1047
#6  0x0000555555bd713f in memory_region_transaction_commit () at 
../softmmu/memory.c:1099
#7  0x0000555555bd89a5 in memory_region_finalize (obj=0x555556e21800) at 
../softmmu/memory.c:1751
#8  0x0000555555cca132 in object_deinit (obj=0x555556e21800, 
type=0x5555566a8f80) at ../qom/object.c:673
#9  0x0000555555cca1a4 in object_finalize (data=0x555556e21800) at 
../qom/object.c:687
#10 0x0000555555ccb196 in object_unref (objptr=0x555556e21800) at 
../qom/object.c:1186
#11 0x0000555555bb11f0 in phys_section_destroy (mr=0x555556e21800) at 
../softmmu/physmem.c:1171
#12 0x0000555555bb124a in phys_sections_free (map=0x5555572cf9a0) at 
../softmmu/physmem.c:1180
#13 0x0000555555bb4632 in address_space_dispatch_free (d=0x5555572cf990) at 
../softmmu/physmem.c:2562
#14 0x0000555555bd4485 in flatview_destroy (view=0x5555572cf950) at 
../softmmu/memory.c:291
#15 0x0000555555e367e8 in call_rcu_thread (opaque=0x0) at ../util/rcu.c:281
#16 0x0000555555e68e57 in qemu_thread_start (args=0x555556665e30) at 
../util/qemu-thread-posix.c:521
#17 0x00007ffff7720fa3 in start_thread (arg=<optimized out>) at 
pthread_create.c:486
#18 0x00007ffff76514cf in clone () at 
../sysdeps/unix/sysv/linux/x86_64/clone.S:95

[...] slot=10, start=0xfebd1000, size=0x1000: File exists
