[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [PATCH 2/2] migration/rdma: zore out head.repeat to make the error m
From: |
Zhijian Li (Fujitsu) |
Subject: |
Re: [PATCH 2/2] migration/rdma: zore out head.repeat to make the error more clear |
Date: |
Thu, 21 Sep 2023 01:36:48 +0000 |
User-agent: |
Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.11.0 |
On 20/09/2023 21:01, Fabiano Rosas wrote:
> Li Zhijian <lizhijian@fujitsu.com> writes:
>
>> From: Li Zhijian <lizhijian@cn.fujitsu.com>
>>
>> Previously, we got a confusion error that complains
>> the RDMAControlHeader.repeat:
>> qemu-system-x86_64: rdma: Too many requests in this message
>> (3638950032).Bailing.
>>
>> Actually, it's caused by an unexpected RDMAControlHeader.type.
>> After this patch, error will become:
>> qemu-system-x86_64: Unknown control message QEMU FILE
>>
>> Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
>> ---
>> migration/rdma.c | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/migration/rdma.c b/migration/rdma.c
>> index a2a3db35b1..3073d9953c 100644
>> --- a/migration/rdma.c
>> +++ b/migration/rdma.c
>> @@ -2812,7 +2812,7 @@ static ssize_t qio_channel_rdma_writev(QIOChannel *ioc,
>> size_t remaining = iov[i].iov_len;
>> uint8_t * data = (void *)iov[i].iov_base;
>> while (remaining) {
>> - RDMAControlHeader head;
>> + RDMAControlHeader head = {};
>>
>> len = MIN(remaining, RDMA_SEND_INCREMENT);
>> remaining -= len;
>
2815 RDMAControlHeader head = {};
2816
2817 len = MIN(remaining, RDMA_SEND_INCREMENT);
2818 remaining -= len;
2819
2820 head.len = len;
2821 head.type = RDMA_CONTROL_QEMU_FILE;
2822
2823 ret = qemu_rdma_exchange_send(rdma, &head, data, NULL, NULL,
NULL);
> I'm struggling to see how head is used before we set the type a couple
> of lines below. Could you expand on it?
IIUC, head is used for both common migration control path and RDMA specific
control path.
hook_stage(RAM_SAVE_FLAG_HOOK) {
rdma_hook_process(qemu_rdma_registration_handle) {
do {
// this is a RDMA own control block, should not be disturbed by the
common migration control path.
// head will be extracted and processed here.
// qio_channel_rdma_writev() will send RDMA_CONTROL_QEMU_FILE, which
is an unexpected message for this block.
// head.repeat will be examined before the type, so an uninitialized
repeat will confuse us here.
} while (!RDMA_CONTROL_REGISTER_FINISHED || !error)
}
}
when qio_channel_rdma_writev() is used for common migration control path,
repeat is useless and will not be examined.
With this patch, we can quickly know the cause.
>
> Also, a smoke test could have caught both issues early on. Is there any
> reason for not having any?
i have no idea yet :)
Thanks
Zhijian