Re: [Qemu-ppc] How to debug crash in TCG code?


From: BALATON Zoltan
Subject: Re: [Qemu-ppc] How to debug crash in TCG code?
Date: Sat, 19 Aug 2017 23:29:46 +0200 (CEST)
User-agent: Alpine 2.21 (BSF 202 2017-01-01)

On Thu, 27 Jul 2017, BALATON Zoltan wrote:
Hello,

I'm getting a segfault in generated code that I don't know how to debug further. The back trace is included below.

This can be reproduced trying to boot this iso:

http://www.xenosoft.de/Sam460ex_Debian_Jessie-3.iso

with the Sam460ex emulation posted here:

http://lists.nongnu.org/archive/html/qemu-ppc/2017-08/msg00112.html

This leads to a QEMU crash while reading from the SATA controller (the addr below belongs to that controller), but not always: it depends on the amount of code run in the guest, and the problem may go away or show up elsewhere when I add debug logging to the firmware code. Does anyone have an idea how to debug this?
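One way I can think of to narrow it down is to enable QEMU's built-in logging so the last translated blocks and MMU activity before the crash end up in a file (a sketch only; the machine and drive options here are illustrative placeholders for the Sam460ex setup from the patch series, not my exact command line):

```shell
# Sketch: log translated guest code, executed TBs, interrupts and MMU
# activity up to the crash. -M/-cdrom are placeholders; adjust as needed.
qemu-system-ppc -M sam460ex -cdrom Sam460ex_Debian_Jessie-3.iso \
    -d in_asm,exec,int,mmu -D /tmp/qemu-tcg.log
```

The tail of the log after the segfault should then show which guest code was running when the bad iotlb entry was used.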

Could it be somehow related to my other problem described here:

http://lists.nongnu.org/archive/html/qemu-ppc/2017-08/msg00220.html

Any help is greatly appreciated.

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffe87f7700 (LWP 24372)]
0x00005555557ee0a1 in io_readx (env=0x7fffe88002a0, iotlbentry=0x7fffe8811d60, addr=3623882752, retaddr=140737096497196, size=2)
   at accel/tcg/cputlb.c:766
766         if (mr->global_locking) {
(gdb) bt
#0 0x00005555557ee0a1 in io_readx (env=0x7fffe88002a0, iotlbentry=0x7fffe8811d60, addr=3623882752, retaddr=140737096497196, size=2)
   at accel/tcg/cputlb.c:766
#1 0x00005555557eede9 in io_readw (env=0x7fffe88002a0, mmu_idx=1, index=4, addr=3623882752, retaddr=140737096497196)
   at softmmu_template.h:104
#2 0x00005555557ef1f0 in helper_be_lduw_mmu (env=0x7fffe88002a0, addr=3623882752, oi=145, retaddr=140737096497196)
   at softmmu_template.h:208
#3  0x00007fffe8a4b8d3 in code_gen_buffer ()
#4 0x00005555557f69b8 in cpu_tb_exec (cpu=0x7fffe87f8010, itb=0x7fffe8a4b660 <code_gen_buffer+1242678>)
   at accel/tcg/cpu-exec.c:166
#5 0x00005555557f769f in cpu_loop_exec_tb (cpu=0x7fffe87f8010, tb=0x7fffe8a4b660 <code_gen_buffer+1242678>, last_tb=0x7fffe87f6af8, tb_exit=0x7fffe87f6af4) at accel/tcg/cpu-exec.c:578
#6 0x00005555557f7992 in cpu_exec (cpu=0x7fffe87f8010) at accel/tcg/cpu-exec.c:676
#7  0x00005555557c2955 in tcg_cpu_exec (cpu=0x7fffe87f8010) at cpus.c:1270
#8 0x00005555557c2b8c in qemu_tcg_rr_cpu_thread_fn (arg=0x7fffe87f8010) at cpus.c:1365
#9  0x00007ffff5d515bd in start_thread () from /lib64/libpthread.so.0
#10 0x00007ffff42d062d in clone () from /lib64/libc.so.6
(gdb) p mr
$1 = (MemoryRegion *) 0x0
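For the record, this is roughly the kind of gdb setup I imagine could catch the bad entry at the moment it is used, rather than at the crash (a sketch; the line number matches the cputlb.c source quoted above, and the variable names are the ones visible in the backtrace):

```
# Stop exactly when io_readx is about to dereference a NULL mr
# (line 766 is the "if (mr->global_locking)" shown above).
(gdb) break accel/tcg/cputlb.c:766 if mr == 0
(gdb) run
# Once it triggers, inspect the iotlb entry the TB handed us:
(gdb) print *iotlbentry
(gdb) print /x iotlbentry->addr
# A location watchpoint then shows who rewrites that entry on re-run:
(gdb) watch -l iotlbentry->addr
```

If the watchpoint fires from a tlb_flush/tlb_set_page path while the TB is still live, that would support the stale-entry theory below.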

This is happening while reading from an emulated ATAPI DVD, and it happens after several successful reads from the same device, with similar calls succeeding without a problem until hitting the above error. The point where this happens seems to depend on the amount of guest code executed: the more code there is, the sooner it happens. (This is running TCG ppc-softmmu on an x86_64 host, in case that's relevant, but I can't make an easy test case to reproduce it.)

First I thought it might be related to MTTCG or the removal of the iothread lock, but I can also get the same crash with commit 791158d9, where the back trace is:

#0 0x00005555557e1de5 in memory_region_access_valid (mr=0x0, addr=0, size=2, is_write=false) at memory.c:1204
#1 0x00005555557e200a in memory_region_dispatch_read (mr=0x0, addr=0, pval=0x7fffe4854488, size=2, attrs=...)
   at memory.c:1268
#2 0x00005555557e7f9c in io_readx (env=0x7ffff7e232a0, iotlbentry=0x7ffff7e34d58, addr=3623882752,
    retaddr=140737066697996, size=2) at cputlb.c:506
#3 0x00005555557e8a9e in io_readw (env=0x7ffff7e232a0, mmu_idx=1, index=4, addr=3623882752, retaddr=140737066697996)
   at softmmu_template.h:104
#4 0x00005555557e8eb0 in helper_be_lduw_mmu (env=0x7ffff7e232a0, addr=3623882752, oi=145, retaddr=140737066697996)
   at softmmu_template.h:208
#5  0x00007fffe6de05b3 in code_gen_buffer ()
#6 0x0000555555783fca in cpu_tb_exec (cpu=0x7ffff7e1b010, itb=0x7fffe49a2080) at cpu-exec.c:164
#7 0x0000555555784b97 in cpu_loop_exec_tb (cpu=0x7ffff7e1b010, tb=0x7fffe49a2080, last_tb=0x7fffe4854af8,
    tb_exit=0x7fffe4854af4, sc=0x7fffe4854b10) at cpu-exec.c:550
#8  0x0000555555784ea0 in cpu_exec (cpu=0x7ffff7e1b010) at cpu-exec.c:655
#9  0x00005555557c8da3 in tcg_cpu_exec (cpu=0x7ffff7e1b010) at cpus.c:1253
#10 0x00005555557c900d in qemu_tcg_cpu_thread_fn (arg=0x7ffff7e1b010) at cpus.c:1345
#11 0x00007ffff45b65bd in start_thread () from /lib64/libpthread.so.0
#12 0x00007ffff42f262d in clone () from /lib64/libc.so.6

So it seems to be caused not by thread locking issues from recent changes, but maybe by a TB somehow referencing an invalid iotlb entry. My theory (without knowing much about how this part of QEMU works) is that, as code is executed, instruction and data exceptions are triggered which change TLB entries, but this does not correctly invalidate a TB that already references such an entry, and that causes the crash. It works as long as the TLB is unchanged, which would explain why less code works while more code, which makes these exceptions more frequent, triggers it sooner. But I have no idea if this theory is correct, how to verify it, or where to look for the problem and a fix.

Does anyone have any idea that could help, or could point me in the right direction, please?

Thank you,
BALATON Zoltan



