From: Wei Wang
Subject: Re: [Qemu-devel] [virtio-comment] Re: [PATCH v1] docs/vhost-user: extend the vhost-user protocol to support the vhost-pci based inter-vm communication
Date: Wed, 09 Nov 2016 16:32:16 +0800
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/20131028 Thunderbird/17.0.10
On 11/08/2016 08:17 PM, Marc-André Lureau wrote:
>
> Message Specification
> ---------------------
>
> Note that all numbers are in the machine native byte order. A
> vhost-user message
> -consists of 3 header fields and a payload:
> +consists of 4 header fields and a payload:
>
> -------------------------------------
> -| request | flags | size | payload |
> -------------------------------------
> +----------------------------------------------
> +| request | flags | conn_id | size | payload |
> +----------------------------------------------
>
> * Request: 32-bit type of the request
> * Flags: 32-bit bit field:
> - Lower 2 bits are the version (currently 0x01)
> - - Bit 2 is the reply flag - needs to be sent on each reply
> from the slave
> + - Bit 2 is the reply flag - needs to be sent on each reply
> - Bit 3 is the need_reply flag - see
> VHOST_USER_PROTOCOL_F_REPLY_ACK for
> details.
> + * Conn_id: 64-bit connection id to identify a client socket connection.
> + It is introduced in version 0x02 to support the "1-server-N-client"
> + model and an asynchronous client read implementation. The connection
> + id 0xFFFFFFFFFFFFFFFF is used by an anonymous client (e.g. a client
> + that has not yet got its connection id from the server in the
> + initial talk).
>
>
> I don't understand why you need a connection id, on each message.
> What's the purpose? Since the communication is unicast, a single
> message should be enough.
Sure, please let me explain more:
The QEMU socket is going to be upgraded to support one server socket
being connected by multiple client sockets (I've made patches to achieve
this). In other words, here, multiple masters will connect to one slave,
and the slave creates a vhost-pci device for each master after receiving
the necessary message info. The slave needs to know which master it is
talking to when receiving a message, as it maintains multiple
connections at the same time.
You should be able to identify each connection in the slave (as a
socket server), without a need for a connection id: connected sockets
are independent of each other.
Yes, that's doable. But why couldn't we do it from the protocol layer? I
think it will be easier.
Please check below my thoughts about the implementation if we do it in
the slave:
The interface for receiving a msg is tcp_chr_read(QIOChannel *chan,
GIOCondition cond, void *opaque).
QIOChannel is the one that we can use to identify the master connection
that sent this msg (the socket server now has an array of QIOChannel,
ioc[MAX_CLIENTS]). Every time a msg is received, tcp_chr_read() needs
to compare *chan against the ioc[] array to find out the id (the index
into ioc[]), pass the id to qemu_chr_be_write(), and carry it all the
way down to the final slave handler where the msg is parsed and handled.
This needs modifications to the existing APIs; for example, the
mentioned qemu_chr_be_write() will need one more parameter, "id". This
will not be compatible with the existing implementation, because all
other callers of qemu_chr_be_write() will need to be patched to use the
new qemu_chr_be_write(, "id", ).
> * Size - 32-bit size of the payload
>
>
> @@ -97,6 +106,13 @@ Depending on the request type, payload
can be:
> log offset: offset from start of supplied file descriptor
> where logging starts (i.e. where guest address 0
would be
> logged)
>
> +* Device info
> + --------------------
> + | virtio id | uuid |
> + --------------------
> + Virtio id: 16-bit virtio id of the device
> + UUID: 128-bit UUID to identify the QEMU instance that
creates
> the device
> +
>
>
> I wonder if UUID should be a different message.
>
We can make uuid another message if it has other usages.
Do you see any other usages of uuid?
It allows associating data/configuration with a particular VM, in a
multi-master/single-slave scenario. But tbh, I don't see how this is
necessary; I can imagine solving this differently (having a different
connection address per VM, for example).
Using connection addresses, how could you know if the two connections
are from the same VM?
I would like to understand your use case.
Here is an example of the use case:
VM1 has two master connections (connection X and Y) and VM2 has 1 master
connection (Z).
X, Y, Z each has a connection id. But X and Y send the same uuid, uuid1,
to the slave, and Z sends uuid2 to the slave. In this way, the slave
knows X and Y are two connections from the same VM, and Z is a
connection from a different VM.
For connection Y, the vhost-pci device will be created in a way which
does not need the driver to map the memory, since it has already been
mapped by the device for connection X from the same VM.
> [ Also see the section on REPLY_ACK protocol extension. ]
>
> +Currently, the communication also supports the Slave (server)
> sending messages
> +to the Master (client). Here is a list of them:
> + * VHOST_USER_SET_FEATURES
>
> + * VHOST_USER_SET_PEER_CONNECTION (the server may actively
> request to disconnect from the client)
>
>
> Oh, you are making the communication bidirectional? This is a
> fundamental change in the protocol. This may be difficult to implement
> in qemu, since the communication is synchronous: a request expects an
> immediate reply, and if it gets back a request (from the slave) in the
> middle, it will fail.
>
Not really.
Adding the above two doesn't affect the existing synchronous read()
messages (basically, those VHOST_USER_GET_xx messages). Like
VHOST_USER_SET_FEATURES, the _SET_ messages don't need a reply. Here, we
just make the slave capable of actively sending messages to the master.
Yes, that's the trouble. At any time the Master may send a request and
expect an immediate reply. There is a race of getting a request from
the Slave in the middle with your proposed change. I'd rather avoid
making the requests bidirectional if possible. (I proposed a second
channel for Slave->Master requests in the past:
https://lists.gnu.org/archive/html/qemu-devel/2016-04/msg00095.html)
If the message that the master got back has a different "request" field
value, it simply drops it and reads again. The implementation is not
complex either; please see the example change to vhost_user_get_u64()
below:

    if (vhost_user_write(dev, &msg_w, NULL, 0) < 0) {
        return -1;
    }
retry:
    if (vhost_user_read(dev, &msg_r) < 0) {
        return -1;
    }
    if (msg_r.request != msg_w.request) {
        goto retry;
    }

On the other side, the slave's request to the master is dropped due to
the race. This race can be solved in the protocol layer - let the _SET_
request ask for an ACK; if no ACK is received, re-send it. Also, this
kind of race should be very rare in real usage.
>
> +This request should be sent only when
VHOST_USER_PROTOCOL_F_VHOST_PCI
> has...
>
> +* VHOST_USER_SET_DEV_INFO
> +
> + Id: 21
> + Equivalent ioctl: N/A
> + Master payload: dev info
> +
> + The client sends the producer device info to the server.
>
>
> "Master sends producer device info to the Slave" works, no?
Yes, it works. The current dev info only contains a "virtio id" field
(assuming we take uuid out as a separate message), which tells the
slave whether it is a net, scsi, console or other device. Do you see
any issue?
>
> Could we guarantee this message is sent before SET_VRING*?
Why do we need to guarantee this?
It would simplify the protocol to have expectations on when messages
come. In particular, an early message with devinfo would allow
checking/pre-configuring the Slave for a particular device. Also,
VHOST_USER_SET_DEV_INFO should probably be unique (don't allow a
device to be reconfigured).
Yes, it is sent at an early stage of the vhost-user protocol
interaction. It's implemented to be sent right after the
VHOST_USER_SET_PROTOCOL_FEATURES msg. On the slave side, when it
receives SET_DEV_INFO, it pre-configures the device in a table entry (as
mentioned before, a device will be created from the table entry at a
later stage of the protocol interaction).
I think it should be the implementation logic, like
VHOST_USER_SET_PROTOCOL_FEATURES. Why do we need to add a guarantee in
the protocol to specify the order?
>
> + This request should be sent only when
> VHOST_USER_PROTOCOL_F_VHOST_PCI has
> + been negotiated.
> +
>
>
> I think this message could be useful for other purposes than
> vhost-pci, thus I would give it its own flag.
Could you please give an example of other usage? Thanks.
You could have a Slave that implements various devices, and pick the
corresponding one dynamically (we already have implementations for
net/input/gpu/scsi...)
If I understand the example correctly, the various devices still belong
to the vhost-pci series - in the future we would have vhost-pci-net,
vhost-pci-scsi, vhost-pci-gpu etc. If that's the case, we may still use
the VHOST_PCI flag.
>
> +* VHOST_USER_SET_PEER_CONNECTION
> +
> + Id: 22
> + Equivalent ioctl: N/A
> + Master payload: u64
> +
> + The producer device requests to connect or disconnect to
> the consumer device.
>
>
> producer->Master, consummer->Slave
>
> How does it interact with SET_VRING_ENABLE?
It's independent of SET_VRING_ENABLE:
SET_VRING_ENABLE puts a virtq into the "active" state.
SET_PEER_CONNECTION puts the peer (slave or master) device into the
"active" state. The driver shouldn't send packets if the device is
inactive.
I fail to see the difference with SET_VRING_ENABLE, perhaps someone
more familiar with the protocol could help here.
I'm not sure if another email explaining this was sent out successfully,
so I repost the explanation here:
The SET_PEER_CONNECTION msg is used to turn "ON/OFF" the (slave or
master) device connection status. For example, when the master side VM
wants to shut down, the virtio-net driver sets the virtio-net device's
PEER_CONNECTION status to "OFF" - before this happens, the virtio-net
device needs to sync up with the vhost-pci-net device first, that is,
sending a VHOST_USER_SET_PEER_CONNECTION(cmd=OFF) msg to the master. In
return (not as a synchronous reply, because it has to sync with the
driver to stop using the slave side resources first), the vhost-pci-net
device sends a VHOST_USER_SET_PEER_CONNECTION(cmd=OFF) msg to the slave -
this sets the virtio-net device's PEER_CONNECTION status to "OFF" and
then the virtio driver is ready to unload. (The same applies for the
vhost-pci-net driver to unload.)
SET_VRING_ENABLE controls the virtq status - the slave should not use
the virtq if it's not enabled by the master. For example, a device may
have 4 virtqs; if vq[0].enabled == 0, then the slave should not use
virtq 0.
>
> + The consumer device may request to disconnect to the producer
> device. This
> + request should be sent only when
> VHOST_USER_PROTOCOL_F_VHOST_PCI has been
> + negotiated.
> + Connection request: If the reply message indicates
> "success", the vhost-pci based
> + inter-VM communication channel has been established.
> + Disconnection request: If the reply message indicates
> "success", the vhost-pci based
> + inter-VM communication channel has been destroyed.
> + #define VHOST_USER_SET_PEER_CONNECTION_F_OFF 0
> + #define VHOST_USER_SET_PEER_CONNECTION_F_ON 1
> +
>
I think it would be better to add one more command here:
#define VHOST_USER_SET_PEER_CONNECTION_F_INIT 2
The master uses this command to tell the slave it's ready to create the
vhost-pci device. Regarding the implementation, it is put at the bottom
of the vhost_net_start() function (when all the vring info has been
sent and the vrings enabled).
Do you have WIP branch for qemu vhost-pci? That could help to
understand the context.
Yes, I can share them.
Best,
Wei