Subject: Re: [PATCH 00/14] migration/multifd: Refactor ->send_prepare() and cleanups
From: Fabiano Rosas
Date: Wed, 31 Jan 2024 19:49:51 -0300

peterx@redhat.com writes:
> From: Peter Xu <peterx@redhat.com>
>
> This patchset contains quite a few refactorings to current multifd:
>
> - It picked up some patches from an old series of mine [0] (the last
> patches were dropped, though; I did the cleanup slightly differently):
>
> I still managed to include one patch to split pending_job, but I
> rewrote the patch here.
>
> - It tries to clean up multiple multifd paths here and there; the ultimate
> goal is to redefine send_prepare() to be something like:
>
> p->pages -----------> send_prepare() -------------> IOVs
>
> So that there's no obvious change yet on multifd_ops besides redefined
> interface for send_prepare(). We may want a separate OPs for file
> later.
>
> For 2), one benefit is already presented by Fabiano in his other series [1]
> on cleaning up zero copy, but this patchset addressed it quite differently,
> and hopefully also more gradually. The other benefit is for sure if we
> have a more concrete API for send_prepare() and if we can reach an initial
> consensus, then we can have the recent compression accelerators rebased on
> top of this one.
>
> This also prepares for the case where the input can be extended to even not
> any p->pages, but arbitrary data (like VFIO's potential use case in the
> future?). But that will also be for later, even if reasonable.
>
> Please have a look. Thanks,
>
> [0] https://lore.kernel.org/r/20231022201211.452861-1-peterx@redhat.com
> [1] https://lore.kernel.org/qemu-devel/20240126221943.26628-1-farosas@suse.de
>
> Peter Xu (14):
> migration/multifd: Drop stale comment for multifd zero copy
> migration/multifd: multifd_send_kick_main()
> migration/multifd: Drop MultiFDSendParams.quit, cleanup error paths
> migration/multifd: Postpone reset of MultiFDPages_t
> migration/multifd: Drop MultiFDSendParams.normal[] array
> migration/multifd: Separate SYNC request with normal jobs
> migration/multifd: Simplify locking in sender thread
> migration/multifd: Drop pages->num check in sender thread
> migration/multifd: Rename p->num_packets and clean it up
> migration/multifd: Move total_normal_pages accounting
> migration/multifd: Move trace_multifd_send|recv()
> migration/multifd: multifd_send_prepare_header()
> migration/multifd: Move header prepare/fill into send_prepare()
> migration/multifd: Forbid spurious wakeups
>
> migration/multifd.h | 34 +++--
> migration/multifd-zlib.c | 11 +-
> migration/multifd-zstd.c | 11 +-
> migration/multifd.c | 291 +++++++++++++++++++--------------------
> 4 files changed, 182 insertions(+), 165 deletions(-)

This series didn't survive my 9999-iteration test on the openSUSE
machine.

# Running /x86_64/migration/multifd/tcp/tls/x509/reject-anon-client
...
kill_qemu() detected QEMU death from signal 11 (Segmentation fault) (core dumped)

#0 0x00005575dda06399 in qemu_mutex_lock_impl (mutex=0x18, file=0x5575ddce9cc3 "../util/qemu-thread-posix.c", line=275) at ../util/qemu-thread-posix.c:92
#1 0x00005575dda06a94 in qemu_sem_post (sem=0x18) at ../util/qemu-thread-posix.c:275
#2 0x00005575dd56a512 in multifd_send_thread (opaque=0x5575df054ef8) at ../migration/multifd.c:720
#3 0x00005575dda0709b in qemu_thread_start (args=0x7fd404001d50) at ../util/qemu-thread-posix.c:541
#4 0x00007fd45e8a26ea in start_thread (arg=0x7fd3faffd700) at pthread_create.c:477
#5 0x00007fd45cd2150f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

The multifd thread is posting channels_ready with an already freed
multifd_send_state.
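
One way to read sem=0x18 in frame #1: the address of a struct member taken through a NULL base pointer is just the member's offset. That would be consistent with the global state pointer having been cleared after the free, though the trace alone does not prove it. The sketch below only illustrates the arithmetic. FakeSem and FakeSendState are hypothetical stand-ins, not QEMU's definitions, and their layout is an assumption chosen so the offset comes out at 0x18 on a typical 64-bit ABI.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; these are NOT QEMU's real definitions. */
typedef struct {
    void *impl;                 /* pointer member forces pointer alignment */
} FakeSem;

typedef struct {
    void *params;               /* offset 0x00 on a 64-bit target */
    void *pages;                /* offset 0x08 */
    int exiting;                /* offset 0x10, followed by padding */
    FakeSem channels_ready;     /* offset 0x18 */
} FakeSendState;

/*
 * Address that &state->channels_ready would have for a given base address.
 * Computed with offsetof so that no invalid pointer is ever formed.
 */
uintptr_t channels_ready_addr(uintptr_t base)
{
    size_t off = offsetof(FakeSendState, channels_ready);

    assert(base <= UINTPTR_MAX - off);  /* guard against wraparound */
    return base + off;
}
```

With a base of 0 the function returns the bare member offset, which is 0x18 under the assumed layout on a 64-bit target.
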
This is the bug Avihai has hit. We're going into multifd_save_cleanup()
so early that multifd_new_send_channel_async() hasn't even had the
chance to set p->running. So it misses the join and frees everything up
while a second multifd thread is just starting.
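
The ordering problem described above can be sketched generically with POSIX threads. This is a minimal illustration, not QEMU code and not necessarily how the series should fix it: Channel, SendState, thread_created, channel_start() and cleanup_all() are all hypothetical names. The idea is that the flag cleanup relies on is written by the creator immediately after the thread is created, so a cleanup that runs later in the creator's context never skips the join, and the shared state is freed only after every created thread has been joined.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical shared state that every worker touches. */
typedef struct {
    int channels_ready;
} SendState;

/* Hypothetical per-channel bookkeeping. */
typedef struct {
    pthread_t thread;
    bool thread_created;   /* written by the creator, not by the worker */
    SendState *state;
} Channel;

static pthread_mutex_t state_lock = PTHREAD_MUTEX_INITIALIZER;

static void *send_thread(void *opaque)
{
    Channel *c = opaque;

    /* Safe only because cleanup joins before freeing c->state. */
    pthread_mutex_lock(&state_lock);
    c->state->channels_ready++;
    pthread_mutex_unlock(&state_lock);
    return NULL;
}

/* Create the worker and record that fact before returning to the caller. */
int channel_start(Channel *c, SendState *state)
{
    c->state = state;
    if (pthread_create(&c->thread, NULL, send_thread, c) != 0) {
        return -1;
    }
    c->thread_created = true;
    return 0;
}

/*
 * Join every thread that was created, then free the shared state.
 * Returns how many workers posted before the state was released.
 */
int cleanup_all(Channel *chans, int n, SendState *state)
{
    int posted;

    assert(state != NULL);
    for (int i = 0; i < n; i++) {
        if (chans[i].thread_created) {
            pthread_join(chans[i].thread, NULL);
            chans[i].thread_created = false;
        }
    }
    posted = state->channels_ready;  /* joins above order this read */
    free(state);
    return posted;
}
```

If the flag were instead set from the worker thread or from a later callback, cleanup_all() could observe thread_created == false for a thread that already exists, skip the join, and free the state underneath it, which is the shape of the crash in the backtrace.
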