[Bug 1805256] Re: qemu-img hangs on rcu_call_ready_event logic in Aarch6
From: Rafael David Tinoco
Subject: [Bug 1805256] Re: qemu-img hangs on rcu_call_ready_event logic in Aarch64 when converting images
Date: Tue, 21 Jul 2020 20:02:38 -0000
Status of earlier attempts to solve issues of the same nature:
----
Older (2018) merge request from @raharper:
https://github.com/koverstreet/bcache-tools/pull/1
addressing the fact that kernel uevents do not always emit
CACHED_UUID parameters, which causes udev to delete the
/dev/bcache/{by-uuid,by-label} symlinks whenever that happens.
This last MR pointed to previous related bugs:
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=890446
https://bugs.launchpad.net/curtin/+bug/1728742
And to an upstream kernel patch:
https://lore.kernel.org/patchwork/patch/921298/
for
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1729145
that wasn't accepted upstream.
Even though it was not accepted upstream, an SRU was attempted:
LP: #1729145
https://lists.ubuntu.com/archives/kernel-team/2017-December/088680.html
https://lists.ubuntu.com/archives/kernel-team/2017-December/088679.html
Both were NACKED.
Attempted again:
https://lists.ubuntu.com/archives/kernel-team/2017-December/088682.html
https://lists.ubuntu.com/archives/kernel-team/2017-December/088683.html
NACKED again.
And a v2 was sent:
https://lists.ubuntu.com/archives/kernel-team/2017-December/088751.html
https://lists.ubuntu.com/archives/kernel-team/2017-December/088750.html
https://lists.ubuntu.com/archives/kernel-team/2017-December/088749.html
and acked in January 2018 by Colin:
https://lists.ubuntu.com/archives/kernel-team/2018-January/089492.html
but not upstreamed.
BIONIC contains the fix:
commit ed9333e1b583
Author: Ryan Harper <ryan.harper@canonical.com>
Date: Mon Dec 11 12:12:01 2017
UBUNTU: SAUCE: (no-up) bcache: decouple emitting a cached_dev CHANGE uevent
BugLink: http://bugs.launchpad.net/bugs/1729145
- decouple emitting a cached_dev CHANGE uevent which includes dev.uuid
and dev.label from bch_cached_dev_run() which only happens when a
bcacheX device is bound to the actual backing block device (bcache0 ->
vdb)
- update bch_cached_dev_run() to invoke bch_cached_dev_emit_change() as
needed; no functional code path changes here
- Modify register_bcache to detect a re-registering of a bcache
cached_dev, and in that case call bcache_cached_dev_emit_change() to
Signed-off-by: Ryan Harper <ryan.harper@canonical.com>
Signed-off-by: Joseph Salisbury <joseph.salisbury@canonical.com>
Acked-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Stefan Bader <stefan.bader@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
[ saf: fix incorrect indentation ]
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
FOCAL contains the fix:
commit 67553dcd7905
Author: Ryan Harper <ryan.harper@canonical.com>
Date: Mon Dec 11 12:12:01 2017
UBUNTU: SAUCE: (no-up) bcache: decouple emitting a cached_dev CHANGE
uevent
GROOVY contains the fix:
commit 67553dcd7905
Author: Ryan Harper <ryan.harper@canonical.com>
Date: Mon Dec 11 12:12:01 2017
UBUNTU: SAUCE: (no-up) bcache: decouple emitting a cached_dev CHANGE
uevent
----
So neither the kernel patch nor @raharper's bcache-tools patch
(bcache-export-cached) was accepted.
----
New Upstream summary from @raharper:
https://github.com/systemd/systemd/pull/16317#issuecomment-655647313
in the upstream merge request made by @rbalint.
** Bug watch added: Debian Bug tracker #890446
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=890446
--
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1805256
Title:
qemu-img hangs on rcu_call_ready_event logic in Aarch64 when
converting images
Status in kunpeng920:
Triaged
Status in kunpeng920 ubuntu-18.04 series:
Triaged
Status in kunpeng920 ubuntu-18.04-hwe series:
Triaged
Status in kunpeng920 ubuntu-19.10 series:
Fix Released
Status in kunpeng920 ubuntu-20.04 series:
Fix Released
Status in kunpeng920 upstream-kernel series:
Invalid
Status in QEMU:
Fix Released
Status in qemu package in Ubuntu:
Fix Released
Status in qemu source package in Bionic:
In Progress
Status in qemu source package in Eoan:
Fix Released
Status in qemu source package in Focal:
Fix Released
Bug description:
[Impact]
* QEMU locking primitives may hit a race condition in the QEMU async
I/O bottom-half scheduling. This leads to a deadlock that makes either
QEMU or one of its tools hang indefinitely.
[Test Case]
* qemu-img convert -f qcow2 -O qcow2 ./disk01.qcow2 ./output.qcow2
Hangs indefinitely in approximately 30% of runs on Aarch64.
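A hang rate like the one quoted above can be estimated by running the
command repeatedly under a timeout. The Python sketch below is only an
illustration: the function name count_hangs, the number of runs and the
timeout value are assumptions and do not come from the bug report.

```python
import subprocess


def count_hangs(cmd, runs=20, timeout_s=300.0):
    """Run `cmd` `runs` times; return how many runs exceeded `timeout_s`.

    A run that times out is treated as a hang. The timeout must be chosen
    well above the normal conversion time (300 s is an arbitrary guess).
    """
    hangs = 0
    for _ in range(runs):
        try:
            subprocess.run(cmd, stdout=subprocess.DEVNULL,
                           stderr=subprocess.DEVNULL, timeout=timeout_s)
        except subprocess.TimeoutExpired:
            # subprocess.run() kills the child before raising, so a hung
            # process does not linger between runs.
            hangs += 1
    return hangs


# Example call, using the command from the test case:
#   count_hangs(["qemu-img", "convert", "-f", "qcow2", "-O", "qcow2",
#                "./disk01.qcow2", "./output.qcow2"], runs=20)
```

Dividing the result by the number of runs gives a rough hang rate that
can be compared before and after applying the fix.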
[Regression Potential]
* This is a change to a core part of QEMU: the AIO scheduling. It
works like a "kernel" scheduler: whereas the kernel schedules OS tasks,
the QEMU AIO code is responsible for scheduling QEMU coroutines and
event listener callbacks.
* There was a long discussion upstream about primitives and Aarch64.
After quite some time Paolo released this patch, and it solves the
issue. According to his commit log, the tested platforms were amd64
and aarch64.
* Christian suggests that this fix stay a little longer in -proposed to
make sure it won't cause any regressions.
* dannf suggests we also check for performance regressions; e.g. how
long it takes to convert a cloud image on high-core systems.
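The performance check suggested above could be scripted along these
lines. This is a hedged sketch: the helper names time_runs and
summarize, the run count and the timeout are assumptions made for
illustration, and the image paths would have to be supplied by the
tester.

```python
import statistics
import subprocess
import time


def time_runs(cmd, runs=5, timeout_s=600.0):
    """Return the wall-clock duration in seconds of each successful run."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        # check=True raises if the command fails; timeout guards against
        # the hang described in this bug.
        subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL,
                       stderr=subprocess.DEVNULL, timeout=timeout_s)
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations):
    """Reduce a list of durations to min / median / max."""
    return {
        "min": min(durations),
        "median": statistics.median(durations),
        "max": max(durations),
    }
```

Running time_runs() with the same cloud image on the unpatched and the
patched package, then comparing the summarize() output, gives a simple
before/after view on a high-core system.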
[Other Info]
* Original Description below:
Command:
qemu-img convert -f qcow2 -O qcow2 ./disk01.qcow2 ./output.qcow2
Hangs indefinitely in approximately 30% of runs.
----
Workaround:
qemu-img convert -m 1 -f qcow2 -O qcow2 ./disk01.qcow2 ./output.qcow2
Run "qemu-img convert" with "a single coroutine" to avoid this issue.
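For scripts that call qemu-img, the workaround can be applied by
building the command line with an optional "-m" argument. The helper
below is a minimal sketch; the name convert_cmd and its parameters are
hypothetical, and only the "-m 1" form shown above comes from this
report.

```python
import shlex


def convert_cmd(src, dst, fmt="qcow2", coroutines=None):
    """Build a `qemu-img convert` argument list.

    coroutines=None leaves qemu-img at its default; coroutines=1
    reproduces the workaround from this report (-m 1).
    """
    cmd = ["qemu-img", "convert"]
    if coroutines is not None:
        if coroutines < 1:
            raise ValueError("coroutines must be >= 1")
        cmd += ["-m", str(coroutines)]
    cmd += ["-f", fmt, "-O", fmt, src, dst]
    return cmd


# Printing the command instead of running it keeps this sketch side-effect free.
print(shlex.join(convert_cmd("./disk01.qcow2", "./output.qcow2", coroutines=1)))
```

The returned list can be passed directly to subprocess.run().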
----
(gdb) thread 1
...
(gdb) bt
#0 0x0000ffffbf1ad81c in __GI_ppoll
#1 0x0000aaaaaabcf73c in ppoll
#2 qemu_poll_ns
#3 0x0000aaaaaabd0764 in os_host_main_loop_wait
#4 main_loop_wait
...
(gdb) thread 2
...
(gdb) bt
#0 syscall ()
#1 0x0000aaaaaabd41cc in qemu_futex_wait
#2 qemu_event_wait (ev=ev@entry=0xaaaaaac86ce8 <rcu_call_ready_event>)
#3 0x0000aaaaaabed05c in call_rcu_thread
#4 0x0000aaaaaabd34c8 in qemu_thread_start
#5 0x0000ffffbf25c880 in start_thread
#6 0x0000ffffbf1b6b9c in thread_start ()
(gdb) thread 3
...
(gdb) bt
#0 0x0000ffffbf11aa20 in __GI___sigtimedwait
#1 0x0000ffffbf2671b4 in __sigwait
#2 0x0000aaaaaabd1ddc in sigwait_compat
#3 0x0000aaaaaabd34c8 in qemu_thread_start
#4 0x0000ffffbf25c880 in start_thread
#5 0x0000ffffbf1b6b9c in thread_start
----
(gdb) run
Starting program: /usr/bin/qemu-img convert -f qcow2 -O qcow2
./disk01.ext4.qcow2 ./output.qcow2
[New Thread 0xffffbec5ad90 (LWP 72839)]
[New Thread 0xffffbe459d90 (LWP 72840)]
[New Thread 0xffffbdb57d90 (LWP 72841)]
[New Thread 0xffffacac9d90 (LWP 72859)]
[New Thread 0xffffa7ffed90 (LWP 72860)]
[New Thread 0xffffa77fdd90 (LWP 72861)]
[New Thread 0xffffa6ffcd90 (LWP 72862)]
[New Thread 0xffffa67fbd90 (LWP 72863)]
[New Thread 0xffffa5ffad90 (LWP 72864)]
[Thread 0xffffa5ffad90 (LWP 72864) exited]
[Thread 0xffffa6ffcd90 (LWP 72862) exited]
[Thread 0xffffa77fdd90 (LWP 72861) exited]
[Thread 0xffffbdb57d90 (LWP 72841) exited]
[Thread 0xffffa67fbd90 (LWP 72863) exited]
[Thread 0xffffacac9d90 (LWP 72859) exited]
[Thread 0xffffa7ffed90 (LWP 72860) exited]
<HUNG w/ 3 threads in the stack trace shown before>
"""
All the remaining tasks are blocked in a system call, so there is no task
left to call qemu_futex_wake() to unblock thread #2 (in futex()), which
would in turn unblock thread #1 (doing poll() on a pipe with thread #2).
Those 7 threads exit before disk conversion is complete (sometimes at
the beginning, sometimes at the end).
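The state described above, a waiter parked with nobody left to wake it,
is the typical outcome of a lost wakeup. The toy model below shows how
one interleaving of a generic flag-plus-sleep event produces it. It is
an assumption-laden illustration, not a reconstruction of QEMU's
qemu_event code or of the actual race; the step names "check", "sleep"
and "set" are invented for this sketch.

```python
def waiter_stuck(order):
    """Replay one interleaving of a toy event and report if the waiter hangs.

    Steps (hypothetical, for illustration only):
      "check" - waiter reads the flag and decides whether it must sleep
      "sleep" - waiter parks itself, acting on its earlier decision
      "set"   - setter raises the flag and wakes the waiter if it is parked
    """
    is_set = False
    will_sleep = False
    sleeping = False
    for step in order:
        if step == "check":
            will_sleep = not is_set
        elif step == "sleep":
            # The decision may be stale if "set" ran after "check".
            sleeping = will_sleep
        elif step == "set":
            is_set = True
            sleeping = False  # only helps a waiter that is already parked
        else:
            raise ValueError(f"unknown step: {step}")
    return sleeping


for order in (["check", "sleep", "set"],
              ["set", "check", "sleep"],
              ["check", "set", "sleep"]):
    print(order, "-> stuck" if waiter_stuck(order) else "-> ok")
```

Only the third ordering leaves the waiter parked: the setter ran between
the waiter's check and its sleep, so the wakeup was issued before there
was anyone to wake.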
----
On the HiSilicon D06 system - a 96 core NUMA arm64 box - qemu-img
frequently hangs (~50% of the time) with this command:
qemu-img convert -f qcow2 -O qcow2 /tmp/cloudimg /tmp/cloudimg2
Where "cloudimg" is a standard qcow2 Ubuntu cloud image. This
qcow2->qcow2 conversion happens to be something uvtool does every time
it fetches images.
Once hung, attaching gdb gives the following backtrace:
(gdb) bt
#0 0x0000ffffae4f8154 in __GI_ppoll (fds=0xaaaae8a67dc0,
nfds=187650274213760,
timeout=<optimized out>, timeout@entry=0x0, sigmask=0xffffc123b950)
at ../sysdeps/unix/sysv/linux/ppoll.c:39
#1 0x0000aaaabbefaf00 in ppoll (__ss=0x0, __timeout=0x0, __nfds=<optimized
out>,
__fds=<optimized out>) at /usr/include/aarch64-linux-gnu/bits/poll2.h:77
#2 qemu_poll_ns (fds=<optimized out>, nfds=<optimized out>,
timeout=timeout@entry=-1) at util/qemu-timer.c:322
#3 0x0000aaaabbefbf80 in os_host_main_loop_wait (timeout=-1)
at util/main-loop.c:233
#4 main_loop_wait (nonblocking=<optimized out>) at util/main-loop.c:497
#5 0x0000aaaabbe2aa30 in convert_do_copy (s=0xffffc123bb58) at
qemu-img.c:1980
#6 img_convert (argc=<optimized out>, argv=<optimized out>) at
qemu-img.c:2456
#7 0x0000aaaabbe2333c in main (argc=7, argv=<optimized out>) at
qemu-img.c:4975
Reproduced w/ latest QEMU git (@ 53744e0a182)
To manage notifications about this bug go to:
https://bugs.launchpad.net/kunpeng920/+bug/1805256/+subscriptions