From: Juan Quintela
Subject: Re: [PATCH 6/6] gitlab-ci.d/buildtest: Disintegrate the build-coroutine-sigaltstack job
Date: Mon, 06 Feb 2023 09:46:26 +0100
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/28.2 (gnu/linux)

Thomas Huth <thuth@redhat.com> wrote:
> On 03/02/2023 22.14, Juan Quintela wrote:
>> Peter Maydell <peter.maydell@linaro.org> wrote:
>>> On Fri, 3 Feb 2023 at 15:44, Thomas Huth <thuth@redhat.com> wrote:
>>>>
>>>> On 03/02/2023 13.08, Kevin Wolf wrote:
>>>>> Am 03.02.2023 um 12:23 hat Thomas Huth geschrieben:
>>>>>> On 30/01/2023 11.58, Daniel P. Berrangé wrote:
>>>>>>> On Mon, Jan 30, 2023 at 11:44:46AM +0100, Thomas Huth wrote:
>>>>>>>> We can get rid of the build-coroutine-sigaltstack job by moving
>>>>>>>> the configure flags that should be tested here to other jobs:
>>>>>>>> Move --with-coroutine=sigaltstack to the build-without-defaults job
>>>>>>>> and --enable-trace-backends=ftrace to the cross-s390x-kvm-only job.
>>>>>>>
>>>>>>> The biggest user of coroutines is the block layer. So we probably
>>>>>>> ought to have coroutines aligned with a job that triggers the
>>>>>>> 'make check-block' for iotests. IIUC, the without-defaults
>>>>>>> job won't do that. How about, arbitrarily, using either the
>>>>>>> 'check-system-debian' or 'check-system-ubuntu' job. Those distros
>>>>>>> are closely related, so getting sigaltstack vs ucontext coverage
>>>>>>> between them is a good win, and they both trigger the block jobs
>>>>>>> IIUC.
>>>>>>
>>>>>> I gave it a try with the ubuntu job, but this apparently trips up the
>>>>>> iotests:
>>>>>>
>>>>>> https://gitlab.com/thuth/qemu/-/jobs/3705965062#L212
>>>>>>
>>>>>> Does anybody have a clue what could be going wrong here?
>>>>>
>>>>> I'm not sure how changing the coroutine backend could cause it, but
>>>>> primarily this looks like an assertion failure in migration code.
>>>>>
>>>>> Dave, Juan, any ideas what this assertion checks and why it could be
>>>>> failing?
>>>>
>>>> Ah, I think it's the bug that will be fixed by:
>>>>
>>>>
>>>> https://lore.kernel.org/qemu-devel/20230202160640.2300-2-quintela@redhat.com/
>>>>
>>>> The fix hasn't hit the master branch yet (I think), and I had another patch
>>>> in my CI that disables the aarch64 binary in that runner, so the iotests
>>>> suddenly have been executed with the alpha binary there --> migration
>>>> fails.
>>>>
>>>> So never mind, it will be fixed as soon as Juan's pull request gets
>>>> included.
>>>
>>> The migration tests have been flaky for a while now,
>>> including setups where host and guest page sizes are the same.
>>> (For instance, my x86 macos box pretty reliably sees failures
>>> when the machine is under load.)
>> I *thought* that we had fixed all of those.
>> But it is difficult for me to know because:
>> - It only happens when one runs "make check"
>> - running ./migration-test has never failed for me
>> - When it fails (and it has been a while since it has failed for me)
>> it is impossible for me to tell what is going on, and as said, I have
>> never been able to reproduce it when running only migration-test.
>> I will try to run several at the same time and see if it happens.
>> And as Thomas said, I *think* that the fix that Peter Xu posted should
>> fix this issue. Famous last words.
>
> The patch from Peter should fix my problems that I triggered via the
> iotests - but the migration-qtest is still unstable independent from
> that issue, I think. See for example the latest staging pipeline:
>
> https://gitlab.com/qemu-project/qemu/-/pipelines/767961842
>
> The migration qtest failed in both the x86-freebsd-build and the
> ubuntu-20.04-s390x-all pipeline.
>
> Thomas
31/659 qemu:qtest+qtest-aarch64 / qtest-aarch64/migration-test
ERROR 48.23s killed by signal 6 SIGABRT
>>> G_TEST_DBUS_DAEMON=/home/gitlab-runner/builds/-LCfcJ2T/0/qemu-project/qemu/tests/dbus-vmstate-daemon.sh
>>> QTEST_QEMU_IMG=./qemu-img QTEST_QEMU_BINARY=./qemu-system-aarch64
>>> MALLOC_PERTURB_=124
>>> QTEST_QEMU_STORAGE_DAEMON_BINARY=./storage-daemon/qemu-storage-daemon
>>> /home/gitlab-runner/builds/-LCfcJ2T/0/qemu-project/qemu/build/tests/qtest/migration-test
>>> --tap -k
――――――――――――――――――――――――――――――――――――― ✀ ―――――――――――――――――――――――――――――――――――――
stderr:
Broken pipe
../tests/qtest/libqtest.c:190: kill_qemu() detected QEMU death from signal 11
(Segmentation fault) (core dumped)
TAP parsing error: Too few tests run (expected 41, got 12)
(test program exited with status code -6)
――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――
I don't know what to do with this:
- this is aarch64 tcg
- this *works* on f37, or at least I can't reproduce any error with make
check on my box, and I *think* my configuration is quite extensive (as
far as I know everything that can be compiled in fedora with packages
in the distro):
configure file: /mnt/code/qemu/full/configure
--enable-trace-backends=log
--prefix=/usr
--sysconfdir=/etc/sysconfig/
--audio-drv-list=pa,alsa
--with-coroutine=ucontext
--with-git-submodules=validate
--enable-alsa
--enable-attr
--enable-auth-pam
--enable-avx2
--enable-avx512f
--enable-bochs
--enable-bpf
--enable-brlapi
--disable-bsd-user
--enable-bzip2
--enable-cap-ng
--enable-capstone
--disable-cfi
--disable-cfi-debug
--enable-cloop
--disable-cocoa
--enable-containers
--disable-coreaudio
--enable-coroutine-pool
--enable-crypto-afalg
--enable-curl
--enable-curses
--enable-dbus-display
--enable-debug-info
--disable-debug-mutex
--disable-debug-stack-usage
--disable-debug-tcg
--enable-dmg
--enable-docs
--disable-dsound
--enable-fdt
--enable-fuse
--enable-fuse-lseek
--disable-fuzzing
--disable-gcov
--disable-gcrypt
--enable-gettext
--enable-gio
--enable-glusterfs
--enable-gnutls
--disable-gprof
--enable-gtk
--enable-guest-agent
--disable-guest-agent-msi
--disable-hax
--disable-hvf
--enable-iconv
--enable-install-blobs
--enable-jack
--enable-keyring
--enable-kvm
--enable-l2tpv3
--enable-libdaxctl
--enable-libiscsi
--enable-libnfs
--enable-libpmem
--enable-libssh
--enable-libudev
--enable-libusb
--enable-linux-aio
--enable-linux-io-uring
--enable-linux-user
--enable-live-block-migration
--disable-lto
--disable-lzfse
--enable-lzo
--disable-malloc-trim
--enable-membarrier
--enable-module-upgrades
--enable-modules
--enable-mpath
--enable-multiprocess
--disable-netmap
--enable-nettle
--enable-numa
--disable-nvmm
--enable-opengl
--enable-oss
--enable-pa
--enable-parallels
--enable-pie
--enable-plugins
--enable-png
--disable-profiler
--enable-pvrdma
--enable-qcow1
--enable-qed
--disable-qom-cast-debug
--enable-rbd
--enable-rdma
--enable-replication
--enable-rng-none
--disable-safe-stack
--disable-sanitizers
--enable-stack-protector
--enable-sdl
--enable-sdl-image
--enable-seccomp
--enable-selinux
--enable-slirp
--enable-slirp-smbd
--enable-smartcard
--enable-snappy
--enable-sparse
--enable-spice
--enable-spice-protocol
--enable-system
--enable-tcg
--disable-tcg-interpreter
--enable-tools
--enable-tpm
--disable-tsan
--disable-u2f
--enable-usb-redir
--enable-user
--disable-vde
--enable-vdi
--enable-vhost-crypto
--enable-vhost-kernel
--enable-vhost-net
--enable-vhost-user
--enable-vhost-user-blk-server
--enable-vhost-vdpa
--enable-virglrenderer
--enable-virtfs
--enable-virtiofsd
--enable-vnc
--enable-vnc-jpeg
--enable-vnc-sasl
--enable-vte
--enable-vvfat
--enable-werror
--disable-whpx
--enable-xen
--enable-xen-pci-passthrough
--enable-xkbcommon
--enable-zstd
- It gives a segmentation fault. Nothing else.
Can we get at least a backtrace to work from there?
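
One possible way to get such a backtrace, as a rough sketch only: it assumes the runner allows core dumps (for example via "ulimit -c unlimited"), that gdb is installed there, and that the core file ends up at a known path. The helper names and the "core" path below are hypothetical; on systems using systemd-coredump the dump may be stored elsewhere.

```python
import subprocess


def gdb_backtrace_cmd(binary, core):
    """Build a gdb command line that dumps backtraces for all threads.

    --batch makes gdb run the -ex commands and exit without a prompt.
    """
    return ["gdb", "--batch", "-ex", "thread apply all bt", binary, core]


def collect_backtrace(binary, core):
    """Run gdb on a core file and return its output as text.

    Assumption: gdb is on PATH and `core` was produced by `binary`.
    """
    result = subprocess.run(
        gdb_backtrace_cmd(binary, core),
        capture_output=True,
        text=True,
        check=False,  # gdb may exit non-zero; keep whatever it printed
    )
    return result.stdout + result.stderr
```

Something like collect_backtrace("./qemu-system-aarch64", "core") could then be run from the build directory after the test fails, and its output attached to the job log.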
Thanks, Juan.
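
For trying to reproduce the failure outside of "make check", a minimal sketch of running the same test binary several times in parallel, to approximate a loaded machine. The function names are hypothetical, and the run counts are arbitrary; the environment variables and the "--tap -k" arguments mentioned below are the ones shown in the CI log quoted above.

```python
import os
import subprocess
from concurrent.futures import ThreadPoolExecutor


def run_once(cmd, extra_env=None):
    """Run cmd once and return its exit status.

    On POSIX a negative status -N means the process was killed by
    signal N (for example -6 for SIGABRT).
    """
    env = dict(os.environ, **(extra_env or {}))
    proc = subprocess.run(
        cmd,
        env=env,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        check=False,
    )
    return proc.returncode


def stress(cmd, runs=20, jobs=4, extra_env=None):
    """Run cmd `runs` times, `jobs` at a time; return the failing statuses."""
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        codes = list(pool.map(lambda _: run_once(cmd, extra_env), range(runs)))
    return [code for code in codes if code != 0]
```

From the build directory this might be called as stress(["./tests/qtest/migration-test", "--tap", "-k"], extra_env={"QTEST_QEMU_BINARY": "./qemu-system-aarch64", "QTEST_QEMU_IMG": "./qemu-img", "MALLOC_PERTURB_": "124"}); an empty list means no run failed.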