On Fri, Jan 06, 2023 at 03:21:43PM +0100, Laurent Vivier wrote:
Hi,
It seems this patch breaks vhost-user with DPDK.
See https://bugzilla.redhat.com/show_bug.cgi?id=2155173
It seems QEMU doesn't receive the expected command sequence:
Received unexpected msg type. Expected 22 received 40
Fail to update device iotlb
Received unexpected msg type. Expected 40 received 22
Received unexpected msg type. Expected 22 received 11
Fail to update device iotlb
Received unexpected msg type. Expected 11 received 22
vhost VQ 1 ring restore failed: -71: Protocol error (71)
Received unexpected msg type. Expected 22 received 11
Fail to update device iotlb
Received unexpected msg type. Expected 11 received 22
vhost VQ 0 ring restore failed: -71: Protocol error (71)
unable to start vhost net: 71: falling back on userspace virtio
It receives VHOST_USER_GET_STATUS (40) when it expects VHOST_USER_IOTLB_MSG (22),
VHOST_USER_IOTLB_MSG when it expects VHOST_USER_GET_STATUS,
VHOST_USER_GET_VRING_BASE (11) when it expects VHOST_USER_IOTLB_MSG, and so
on.
Any idea?
Thanks,
Laurent
So I am guessing it's coming from:
    if (msg.hdr.request != request) {
        error_report("Received unexpected msg type. Expected %d received %d",
                     request, msg.hdr.request);
        return -EPROTO;
    }
in process_message_reply and/or in vhost_user_get_u64.
On 11/7/22 23:53, Michael S. Tsirkin wrote:
From: Yajun Wu <yajunw@nvidia.com>
The motivation for adding vhost-user vhost_dev_start support is to
improve backend configuration speed and reduce VM downtime during live
migration.
Today, VQ configuration is issued one queue at a time. For virtio-net with
multi-queue support, the backend needs to update RSS (Receive Side
Scaling) every time an rx queue is enabled. Updating RSS is time-consuming
(typically around 7 ms).
Implement the vhost status feature and messages already defined in the
vhost-user specification [1]:
(a) VHOST_USER_PROTOCOL_F_STATUS
(b) VHOST_USER_SET_STATUS
(c) VHOST_USER_GET_STATUS
Send VHOST_USER_SET_STATUS with VIRTIO_CONFIG_S_DRIVER_OK on device start,
and with 0 (reset) on device stop.
On receiving DRIVER_OK, the backend can apply the needed settings once
(instead of incrementally) and can also enable queues in parallel.
This reduces QEMU's live migration downtime with a vhost-user backend
implementation by a large margin, especially with many VQs: with 64 VQs,
downtime drops from 800 msec to 250 msec.
[1] https://qemu-project.gitlab.io/qemu/interop/vhost-user.html
Signed-off-by: Yajun Wu <yajunw@nvidia.com>
Acked-by: Parav Pandit <parav@nvidia.com>
Message-Id: <20221017064452.1226514-3-yajunw@nvidia.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
It's probably easiest to debug from the DPDK side.
Does the problem go away if you disable the
VHOST_USER_PROTOCOL_F_STATUS feature in DPDK?