qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: vhost reply_ack negotiation (a.k.a differences in vhost-user behavio


From: Raphael Norwitz
Subject: Re: vhost reply_ack negotiation (a.k.a differences in vhost-user behaviour with libvhost-user and vhost-user-backend.rs)
Date: Fri, 26 Feb 2021 09:58:48 -1000

There are two sets of features being negotiated - virtio and
vhost-user.  Based on what you've posted here, I suspect the
VHOST_USER_F_PROTOCOL_FEATURES virtio feature may not be negotiated by
the backend, preventing the vhost-user protocol feature negotiation
from happening at all. I'm not 100% sure why this would cause QEMU to
assume that REPLY_ACK was negotiated though.

some questions:

On Mon, Feb 22, 2021 at 3:26 AM Alex Bennée <alex.bennee@linaro.org> wrote:
>
>
> Dr. David Alan Gilbert <dgilbert@redhat.com> writes:
>
> > * Alex Bennée (alex.bennee@linaro.org) wrote:
> >> Hi,
> >>
> >> I finally got a chance to get down into the guts of vhost-user while
> >> attempting to port my original C RPMB daemon to Rust using the
> >> vhost-user-backend and related crates. I ended up with this hang during
> >> negotiation:
> >>
> >>   startup
> >>
> >>   vhost_user_write req:1 flags:0x1
> >>   vhost_user_read_start
> >>   vhost_user_read req:1 flags:0x5
> >>   vhost_user_backend_init: we got 170000000
>
> GET_FEATURES

Do we also see a GET_PROTOCOL_FEATURES and a SET_PROTOCOL_FEATURES
message here? If so can you confirm what flags they contained?

vhost-user feature negotiation works as follows (see vhost_user_backend_init()):

err = vhost_user_get_features(dev, &features);
if (err < 0) {
    return err;
}

if (virtio_has_feature(features, VHOST_USER_F_PROTOCOL_FEATURES)) {
    dev->backend_features |= 1ULL << VHOST_USER_F_PROTOCOL_FEATURES;

    err = vhost_user_get_u64(dev, VHOST_USER_GET_PROTOCOL_FEATURES,
                                              &protocol_features);
    if (err < 0) {
        return err;
    }

    dev->protocol_features =
        protocol_features & VHOST_USER_PROTOCOL_FEATURE_MASK;
...

    err = vhost_user_set_protocol_features(dev, dev->protocol_features);
    if (err < 0) {
         return err;
     }
}

So we first get the virtio features and check if the backend
advertises VHOST_USER_F_PROTOCOL_FEATURES. If it does, we proceed to
negotiate vhost-user features, in which case we should see
GET_PROTOCOL_FEATURES and a SET_PROTOCOL_FEATURES. Otherwise it looks
like the function just returns, and we leave the vhost-user features
uninitialized (presumably zeroed out?), and the backend will never
even receive a GET/SET_PROTOCOL_FEATURES.

dev->protocol_features is not touched anywhere else, and, if
VHOST_USER_F_PROTOCOL_FEATURES is negotiated, comes directly to the
backend from the protocol_features the backend &ed with
VHOST_USER_PROTOCOL_FEATURE_MASK. Therefore if
VHOST_USER_F_PROTOCOL_FEATURES is indeed negotiated here I'm not sure
what could cause QEMU to think REPLY_ACK was negotiated while the
backend does not, spare something obvious like the backend mishandling
the GET/SET_PROTOCOL_FEATURES messages. I briefly checked the rustvmm
code for that and didn't see anything obvious.

mst - are backend devices meant to function if
VHOST_USER_F_PROTOCOL_FEATURES is not advertised? Do we know of any
functioning backend which does not advertise this virtio feature? If
not, maybe we consider failing out here?

alex - Are you sure QEMU gets stuck waiting on a reply_ack message,
and not somewhere else in the setup path? I trust a SET_MEM_TABLE
message was actually received by the backend. Did you confirm that
QEMU was indeed stuck waiting for a reply and not somewhere else later
on?

>
> >>   vhost_user_write req:15 flags:0x1
> >>   vhost_user_read_start
> >>   vhost_user_read req:15 flags:0x5
> >>   vhost_user_set_protocol_features: 2008
> >>   vhost_user_write req:16 flags:0x1
> >>   vhost_user_write req:3 flags:0x1
> >>   vhost_user_write req:1 flags:0x1
> >>   vhost_user_read_start
> >>   vhost_user_read req:1 flags:0x5
> >>   vhost_user_write req:13 flags:0x1
> >>
> >>   kernel initialises device
> >>
> >>   virtio_rpmb virtio1: init done!
> >>   vhost_user_write req:13 flags:0x1
> >>   vhost_dev_set_features: 130000000
> >>   vhost_user_set_features: 130000000
>
> SET_FEATURES

This is setting virtio features - should have nothing to do with REPLY_ACK.

>
> >>   vhost_user_write req:2 flags:0x1
> >>   vhost_user_write req:5 flags:0x9
> >>   vhost_user_read_start
> >>
> <snip>
> >>
> >>  - Should QEMU have preserved VhostUserVirtioFeatures::PROTOCOL_FEATURES
> >>    when doing the eventual VHOST_USER_SET_FEATURES reply?
> >>
> >>  - Is vhost.rs being to strict or libvhost-user too lax in interpreting
> >>    the negotiated features before processing the ``need_reply`` [Bit 3]
> >>    field of the messages?
> >
> > I think vhost.rs is being correctly strict - but there would be no harm
> > in it flagging that you'd hit an inconsistency if it finds a need_reply
> > without the feature.
>
> But the feature should have been negotiated. So unless the slave can
> assume it is enabled because it asked I think QEMU is in the wrong by
> not preserving the feature bits in it's SET_FEATURES reply. We just gets
> away with it with libvhostuser being willing to reply anyway.
>
> >
> >>  - are VHOST_USER_SET_MEM_TABLE to VHOST_USER_SET_INFLIGHT_FD included
> >>    in the "list of the ones that do" require replies or do they only
> >>    reply when REPLY_ACK has been negotiated as the ambiguous "seealso::"
> >>    box out seems to imply?
> >
> > set_mem_table gives a reply when postcopy is enabled (and then qemu
> > replies to the reply!) but otherwise doesn't.
> > (Note there's an issue opened for .rs to support ADD_MEM_REGION
> > since it's a lot better than SET_MEM_TABLE which has a fixed size table
> > that's small).
>
> Thanks for the heads up.
>
> >
> > Dave
> >
> >> Currently I have some hacks in:
> >>
> >>   https://github.com/stsquad/vhost/tree/my-hacks
> >>
> >> which gets my daemon booting up to the point we actually need to do a
> >> transaction. However I won't submit a PR until I've worked out exactly
> >> where the problems are.
> >>
> >> --
> >> Alex Bennée
> >>
>
>
> --
> Alex Bennée
>



reply via email to

[Prev in Thread] Current Thread [Next in Thread]