[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Qemu-devel] Remaining CI failures
From: |
Alex Bennée |
Subject: |
[Qemu-devel] Remaining CI failures |
Date: |
Fri, 11 Jan 2019 19:10:07 +0000 |
User-agent: |
mu4e 1.1.0; emacs 26.1.91 |
Hi,
So trying to narrow down the remaining failures in the CI system. There
is one with a patch in flight (use g_usleep instead of sleep) but there
remains two failure modes, both erratic.
tests/qht-par:
I can trigger this on my dev machine with a gprof enabled build:
# QEMU configure log Fri Jan 11 14:10:45 GMT 2019
# Configured with: './configure' '--disable-tools' '--disable-docs'
'--enable-gprof' '--enable-gcov'
I only seem to be able to trigger it when running via the wrapper in the
make system:
retry.py -n 30 --invert make check-tests/test-qht-par
Eventually this crashes with:
ERROR:tests/test-qht-par.c:20:test_qht: assertion failed (rc == 0): (35584 ==
0)
Leaving a core dump for the child:
Core was generated by `tests/qht-bench -R -S0.1 -D10000 -N1 -n 2 -u 20 -d 1'
(gdb) info thread
Id Target Id Frame
* 1 Thread 0x7ffff6a7e700 (LWP 15473) 0x000055555557c306 in
call_rcu_thread (opaque=0x0) at util/rcu.c:255
2 Thread 0x7ffff7fbe780 (LWP 15472) 0x00005555555b8d50 in
gcov_read_words ()
(gdb) bt
#0 __mcount_internal (frompc=<optimised out>, selfpc=93824992383630) at
mcount.c:98
#1 0x00007ffff6e15e24 in mcount () at ../sysdeps/x86_64/_mcount.S:51
#2 0x000055555557928e in qemu_event_reset (ev=0x3cc692b8d452f400) at
util/qemu-thread-posix.c:408
#3 0x000055555557c306 in call_rcu_thread (opaque=0x0) at util/rcu.c:255
#4 0x0000555555579630 in qemu_thread_start (args=0x555555808080) at
util/qemu-thread-posix.c:502
#5 0x00007ffff70e96db in start_thread (arg=0x7ffff6a7e700) at
pthread_create.c:463
#6 0x00007ffff6e1288f in clone () at
../sysdeps/unix/sysv/linux/x86_64/clone.S:95
(gdb) thread 2
[Switching to thread 2 (Thread 0x7ffff7fbe780 (LWP 15472))]
#0 0x00005555555b8d50 in gcov_read_words ()
(gdb) bt
#0 0x00005555555b8d50 in gcov_read_words ()
#1 0x00005555555b9453 in __gcov_read_summary ()
#2 0x00005555555ba461 in gcov_do_dump ()
#3 0x00005555555bab62 in __gcov_exit ()
#4 0x00005555555b8c22 in _GLOBAL__sub_D_00100_1_json_lexer_init () at
qobject/json-lexer.c:365
#5 0x00007ffff7de5b73 in _dl_fini () at dl-fini.c:138
#6 0x00007ffff6d34041 in __run_exit_handlers (status=0, listp=0x7ffff70dc718
<__exit_funcs>, address@hidden, address@hidden) at
exit.c:108
#7 0x00007ffff6d3413a in __GI_exit (status=<optimised out>) at exit.c:139
#8 0x00007ffff6d12b9e in __libc_start_main (main=0x555555575d50 <main>,
argc=11, argv=0x7fffffffdf78, init=<optimised out>, fini=<optimised out>,
rtld_fini=<optimised out>, stack_end=0x7fffffffdf68) at ../csu/libc-start.c:344
#9 0x0000555555573c1a in _start ()
To trigger the second failure I had to run on a limited Xenial machine
(16.04, 2 cores, 8Gb RAM) again with gprof build:
# QEMU configure log Thu 10 Jan 22:22:52 GMT 2019
# Configured with: './configure' '--enable-gprof' '--enable-gcov'
'--disable-pie'
'--target-list=aarch64-softmmu,arm-softmmu,i386-softmmu,mips-softmmu,mips64-softmmu,ppc64-softmmu,riscv64-softmmu,s390x-softmmu,x86_64-softmmu'
Running it like Travis does:
retry.py -n 40 --invert -- make -j 3 check V=1
It eventually fails with:
PASS: tests/test-hmp
make: write error: stdout
It's hard to tell from the output what was running that failed. So far
I've managed to get the following information out of execsnoop:
qemu-system-x86 1345 1332 0 x86_64-softmmu/qemu-system-x86_64 -qtest
unix:/tmp/qtest-1332.sock,nowait -qtest-log /dev/null -chardev
socket,path=/tmp/qtest-1332.qmp,no
wait,id=char0 -mon chardev=char0,mode=control -machine accel=qtest -display
none -S -M pc-i440fx-4.0
sh 1350 1332 0 /bin/sh -c exec
x86_64-softmmu/qemu-system-x86_64 -qtest unix:/tmp/qtest-1332.sock,nowait
-qtest-log /dev/null -chardev socket,path=/tmp/q
t
qemu-system-x86 1350 1332 0 x86_64-softmmu/qemu-system-x86_64 -qtest
unix:/tmp/qtest-1332.sock,nowait -qtest-log /dev/null -chardev
socket,path=/tmp/qtest-1332.qmp,no
wait,id=char0 -mon chardev=char0,mode=control -machine accel=qtest -display
none -S -M pc-q35-3.1
sh 1356 1332 0 /bin/sh -c exec
x86_64-softmmu/qemu-system-x86_64 -qtest unix:/tmp/qtest-1332.sock,nowait
-qtest-log /dev/null -chardev socket,path=/tmp/q
t
qemu-system-x86 1356 1332 0 x86_64-softmmu/qemu-system-x86_64 -qtest
unix:/tmp/qtest-1332.sock,nowait -qtest-log /dev/null -chardev
socket,path=/tmp/qtest-1332.qmp,no
wait,id=char0 -mon chardev=char0,mode=control -machine accel=qtest -display
none -S -M pc-i440fx-3.1
sh 1361 1332 0 /bin/sh -c exec
x86_64-softmmu/qemu-system-x86_64 -qtest unix:/tmp/qtest-1332.sock,nowait
-qtest-log /dev/null -chardev socket,path=/tmp/q
t
qemu-system-x86 1361 1332 0 x86_64-softmmu/qemu-system-x86_64 -qtest
unix:/tmp/qtest-1332.sock,nowait -qtest-log /dev/null -chardev
socket,path=/tmp/qtest-1332.qmp,no
wait,id=char0 -mon chardev=char0,mode=control -machine accel=qtest -display
none -S -M pc-q35-4.0
sh 1366 1332 0 /bin/sh -c exec
x86_64-softmmu/qemu-system-x86_64 -qtest unix:/tmp/qtest-1332.sock,nowait
-qtest-log /dev/null -chardev socket,path=/tmp/q
t
qemu-system-x86 1366 1332 0 x86_64-softmmu/qemu-system-x86_64 -qtest
unix:/tmp/qtest-1332.sock,nowait -qtest-log /dev/null -chardev
socket,path=/tmp/qtest-1332.qmp,no
wait,id=char0 -mon chardev=char0,mode=control -machine accel=qtest -display
none -S -M none -m 2
tset 1370 2129 0 /usr/bin/tset -c
git 1374 1373 0 /usr/bin/git rev-parse --git-dir
git 1376 1375 0 /usr/bin/git symbolic-ref HEAD
git 1378 1377 0 /usr/bin/git config
branch.testing/next.remote
So even though tests/test-hmp has nominally passed I think it has to be
one of the x86_64 tests unless there is something that was triggered a
lot longer ago finally crashing out somehow.
I did run the make under strace to see who is using O_NONBLOCK but even
after filtering out a bunch of stuff it seems to be quite embedded:
ag "O_NONBLOCK" check.strace | ag -v "/sys" | ag -v "/dev" | grep -v ".git" |
wc -l
2449
Anyway I thought it would be worth dumping my current debug state to the
list in case anyone has any bright ideas about whats going on or fancies
some weekend debugging.
The retry.py is just a hacky script I use, I'm sure everyone else has
something similar. See https://github.com/stsquad/retry
--
Alex Bennée
- [Qemu-devel] Remaining CI failures,
Alex Bennée <=