qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

RE: [PATCH v5] introduce vfio-user protocol specification


From: Thanos Makatos
Subject: RE: [PATCH v5] introduce vfio-user protocol specification
Date: Mon, 2 Nov 2020 11:29:23 +0000

> -----Original Message-----
> From: John Levon <levon@movementarian.org>
> Sent: 30 October 2020 17:03
> To: Thanos Makatos <thanos.makatos@nutanix.com>
> Cc: qemu-devel@nongnu.org; benjamin.walker@intel.com; Elena Ufimtseva
> <elena.ufimtseva@oracle.com>; tomassetti.andrea@gmail.com;
> john.g.johnson@oracle.com; jag.raman@oracle.com; Swapnil Ingle
> <swapnil.ingle@nutanix.com>; james.r.harris@intel.com;
> konrad.wilk@oracle.com; yuvalkashtan@gmail.com; dgilbert@redhat.com;
> Raphael Norwitz <raphael.norwitz@nutanix.com>; ismael@linux.com;
> alex.williamson@redhat.com; Kanth.Ghatraju@oracle.com; Stefan Hajnoczi
> <stefanha@redhat.com>; Felipe Franciosi <felipe@nutanix.com>;
> xiuchun.lu@intel.com; Marc-André Lureau
> <marcandre.lureau@redhat.com>; tina.zhang@intel.com;
> changpeng.liu@intel.com
> Subject: Re: [PATCH v5] introduce vfio-user protocol specification
> 
> On Wed, Oct 28, 2020 at 04:41:31PM +0000, Thanos Makatos wrote:
> 
> > FYI here's v5 of the vfio-user protocol, my --cc in git send-email got 
> > messed
> up somehow
> 
> Hi Thanos, this looks great, I just had some minor questions below.
> 
> > Command Concurrency
> > -------------------
> > A client may pipeline multiple commands without waiting for previous
> command
> > replies.  The server will process commands in the order they are received.
> > A consequence of this is if a client issues a command with the *No_reply*
> bit,
> > then subseqently issues a command without *No_reply*, the older
> command will
> > have been processed before the reply to the younger command is sent by
> the
> > server.  The client must be aware of the device's capability to process
> concurrent
> > commands if pipelining is used.  For example, pipelining allows multiple
> client
> > threads to concurently access device memory; the client must ensure
> these acceses
> > obey device semantics.
> >
> > An example is a frame buffer device, where the device may allow
> concurrent access
> > to different areas of video memory, but may have indeterminate behavior
> if concurrent
> > acceses are performed to command or status registers.
> 
> Is it valid for an unrelated server->client message to appear in between a
> client->server request/reply, or not? And vice versa? Either way, seems
> useful
> for the spec to say.

Yes, it's valid. I don't see a reason why it shouldn't. I'll update the text to
make it explicit.

> 
> > |                | +-----+------------+ |
> > |                | | Bit | Definition | |
> > |                | +=====+============+ |
> > |                | | 0-3 | Type       | |
> > |                | +-----+------------+ |
> > |                | | 4   | No_reply   | |
> > |                | +-----+------------+ |
> > |                | | 5   | Error      | |
> > |                | +-----+------------+ |
> > +----------------+--------+-------------+
> > | Error          | 12     | 4           |
> > +----------------+--------+-------------+
> >
> > * *Message ID* identifies the message, and is echoed in the command's
> reply message.
> 
> Is it valid to re-use an ID? When/when not?

Yes, it's valid to re-use an ID. I suppose it's also valid, though should be
discouraged, to have multiple outstanding requests with the same ID, even though
that probably doesn't make much sense and will most likely break things. The ID
belongs purely to whomever sends the request, the receiver simply echoes it
back in the response and must make no assumptions about its uniqueness. I think
it's simpler to have it this way, otherwise implementations might start abusing
it or rely on it too much. If there's no objection to these semantics I'll
update the text to clarify.


> 
> >   * *Error* in a reply message indicates the command being acknowledged
> had
> >     an error. In this case, the *Error* field will be valid.
> >
> > * *Error* in a reply message is a UNIX errno value. It is reserved in a
> command message.
> 
> I'm not quite following why we need a bit flag and an error field. Do you
> anticipate a failure, but with errno==0?

Indeed, the Error bit seems redundant. John, is there a reason why we need the
error bit?

> 
> > VFIO_USER_VERSION
> > -----------------
> >
> > +--------------+------------------------+
> > | Message size | 16 + version length    |
> 
> Terminating NUL included?

Good point, in the current libmuser implementation we do include the terminating
NUL, however it's not necessary. I don't have a strong opinion on this one, I'll
update the text to include the terminating NUL just to be correct for now,
however if there's a good argument for/against it we should definitely consider
it.

> 
> > +--------------+--------+---------------------------------------------------+
> > | Name         | Type   | Description                                       
> > |
> >
> +==============+========+=================================
> ==================+
> > | version      | object | ``{"major": <number>, "minor": <number>}``        
> > |
> > |              |        |                                                   
> > |
> > |              |        | Version supported by the sender, e.g. "0.1".      
> > |
> 
> It seems quite unlikely but this should specify it's strings not floating 
> point
> values maybe?
> 
> Definitely applies to max_fds too.


major and minor are JSON numbers and specifically integers. The rationale
behind this is to simplify parsing. Is specifying that major/minor/max_fds
should be an interger sufficient to clear any vagueness here?


> 
> > Common capabilities:
> >
> > +---------------+------------------------------------------------------------+
> > | Name          | Description                                               
> >  |
> >
> +===============+=========================================
> ===================+
> > | ``max_fds``   | Maximum number of file descriptors that can be received
> by |
> > |               | the sender. Optional.                                     
> >  |
> 
> Could specify the meaning when absent?

Good point. I suppose the receiver must assume max_fds=1 if it's missing.

> 
> By array I presume you mean associative array i.e. an Object. Does the whole
> thing look like this:
> 
>  {
>    "major": ..
>    "minor": ..
>    "capabilities": {
>       "max_fds": ..,
>       "migration
>    }
>  }
> 
> or something else?

Yes, it's an associative array, and you're correct in your example. I'll update
the text.

> 
> > Versioning and Feature Support
> > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> > Upon accepting a connection, the server must send a VFIO_USER_VERSION
> message
> > proposing a protocol version and a set of capabilities. The client compares
> > these with the versions and capabilities it supports and sends a
> > VFIO_USER_VERSION reply according to the following rules.
> 
> I'm curious if there was a specific reason it's this way around, when it seems
> more natural for the client to propose first, and the server to reply?

I'm not aware of any specific reason.

> 
> > VFIO_USER_DMA_MAP and VFIO_USER_DMA_UNMAP
> > -----------------------------------------
> 
> Huge nit, but why are these DMA_*MAP when vfio uses *MAP_DMA ?

We discussed this with John privately but I don't have the details, John do
you remember why we decided to name it like that?

> 
> > VFIO bitmap format
> > * *size* the size for the bitmap, in bytes.
> 
> Should this clarify it does *not* include the bitmap header in its size, 
> unlike
> other size fields?

Yes, definitely, I'll update the text.

> 
> > VFIO_USER_DMA_MAP
> > """""""""""""""""
> > If a DMA region being added can be directly mapped by the server, an array
> of
> > file descriptors must be sent as part of the message meta-data. Each region
> > entry must have a corresponding file descriptor.
> 
> "Each mappable region entry" ?

OK.

> 
> > descriptors must be passed as SCM_RIGHTS type ancillary data. Otherwise,
> if a
> > DMA region cannot be directly mapped by the server, it can be accessed by
> the
> > server using VFIO_USER_DMA_READ and VFIO_USER_DMA_WRITE
> messages, explained in
> > `Read and Write Operations`_. A command to map over an existing region
> must be
> > failed by the server with ``EEXIST`` set in error field in the reply.
> >
> > VFIO_USER_DMA_UNMAP
> > """""""""""""""""""
> > Upon receiving a VFIO_USER_DMA_UNMAP command, if the file
> descriptor is mapped
> > then the server must release all references to that DMA region before
> replying,
> > which includes potentially in flight DMA transactions. Removing a portion of
> a
> > DMA region is possible. If the
> VFIO_DMA_UNMAP_FLAG_GET_DIRTY_BITMAP bit is set
> > in the request, the server must append to the header the ``struct
> vfio_bitmap``
> > received in the command, followed by the bitmap. Thus, the message size
> the
> > client should is expect is the size of the header plus the size of
> > ``struct vfio_bitmap`` plus ``vfio_bitmap.size`` bytes. Each bit in the 
> > bitmap
> > represents one page of size ``vfio_bitmap.pgsize``.
> 
> I'm finding this makes the sizing a bit confusing between map and unmap,
> could
> we may be separate them out, and always define a vfio_bitmap slot for
> unmap?

We can separate them.

Regarding always defining a vfio_bitmap slot for unmap, we need to consider the
case when a vIOMMU is used, where map/unmap messages will be extremely frequent.
If VFIO_DMA_UNMAP_FLAG_GET_DIRTY_BITMAP is not used then the vfio_bitmap slot
will be unused, so some bytes will be wasted. If this doesn't really affect
performance then I suppose we can always including it.
 
> 
> Also, shouldn't the client expect sizeof (header) +
> (nr_table_entries_in_request
> * (each vfio_bitmap's size))  in the server's response?
> 
> Does the reply header size field reflect this?

Correct, I'll update the text.


> 
> > VFIO_USER_DMA_WRITE
> > -------------------
> >
> > This command message is sent from the server to the client to write to
> server
> > memory.
> 
> "write to client memory"?
> 
> > VFIO_USER_DIRY_PAGES
> 
> Nit, "DIRTY"

Correct in both cases, I'll update the text.

> 
> thanks
> john



reply via email to

[Prev in Thread] Current Thread [Next in Thread]