qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCHv2] Strawman proposal for NBD structured replies


From: Alex Bligh
Subject: Re: [Qemu-devel] [PATCHv2] Strawman proposal for NBD structured replies
Date: Wed, 30 Mar 2016 07:59:15 +0100

On 30 Mar 2016, at 00:17, Eric Blake <address@hidden> wrote:
>> 
>> -The server replies with:
>> +Replies take one of two forms. They may either be structured replies,
> 
> Hmm, you put your strawman directly in the 'transmission phase' section,
> while mine is deferred to the 'Experimental Extensions' section, at
> least until we have a reference client and server implementation.

Yeah, my wording was straw man but I think it should go into the main
body of the work. Obviously in that case it wouldn't be *merged* until
we had a working implementation.

The SELECT stuff is a bit different as I am not sure it was intended
to be standardized short term (i.e. it was a longer term experiment
IIRC).

I guess Wouter should be the arbiter of whether he'd prefer to merge
it only when we have an implementation. My concern is that it may
hang around in 'experimental', when it needs properly merging, which
will require re-wordsmithing.

>> +or unstructured replies. The server MUST NOT send structured replies
>> +unless it has negotiated structured replies with the client using
>> +`NBD_OPT_STUCTURED_REPLIES` (??).
>> +
>> +[Option #1: Subject to that, the server may choose whether it sends
>> +any given reply to any given command as a structured reply or an
>> +unstructured reply.]
> 
> Existing clients are not expecting a structured reply, so at least
> SOMETHING must be negotiated before allowing any structured reply.

Agree, hence the 'subject to that' and the previous sentence.

>  But
> seems a bit awkward to force a client to recognize two different
> possible replies to any given command that does not carry data.

A client that supports both types of reply will have routines to
process both anyway. It just means that it must have both available
for both types of reply.

>> +
>> +[Option #2: If this option is negotiated, the server MUST send all
>> +replies as structured replies. If the option is not negotiated, the
>> +server MUST send all replies as unstructured replies.]
> 
> Doable, but slightly more traffic on the wire.  In my v2, that would
> mean most other commands reply with NBD_REPLY_TYPE_NONE on success, or
> NBD_REPLY_TYPE_ERROR on failure.

I think this is now my preference. With your NBD_REPLY_TYPE_ERROR
thing without the error, we are talking 4 bytes of difference I
think.

>> +
>> +[Option #3: If this option is negotiated, the server MUST send all
>> +replies to command that support structured replies as structured
>> +replies (currently `NBD_CMD_READ` only), and all other replies as
>> +unstructured replies. If the option is not negotiated, the server MUST
>> +send all replies as unstructured replies.]
> 
> My v2 went with this approach, for all existing commands, but documented
> that future commands (such as the proposed NBD_CMD_GET_LBA_STATUS, or
> whatever we name it) may be structured-only.
> 
> I also gave the server latitude to reject clients that negotiate a
> future structured-reply command without also negotiating structured
> reads, which I guess would help if we want to go with option #2 instead
> of #3.
> 
>> +#### Structured replies
>> +
>> +A structured reply consists of one or more chunks. The server
>> +MUST send exactly one end chunk (identified by
>> +the chunk type `NBD_CHUNKTYPE_END`), and this MUST be the final
>> +chunk within the reply.
> 
> My v2 went with a flag marking the end packet, rather than a specific
> type.  I actually ended up with 5 defined reply types, where any of them
> could be flagged as the end packet (but where some of them are useful
> only as the end packet).

I think your idea is better here.

>> +
>> +Each chunk consists of the following:
>> +
>> +S: 32 bits, 0x668e33ef, magic (`NBD_STRUCTURED_REPLY_MAGIC`)
> 
> I see you cribbed the number in my original proposal; I got my number
> from bash's $RANDOM; I don't know if you have any underlying scheme for
> other magic numbers in use (such as consecutive HEX digits from pi, or
> something with ASCII meaning, or something in l33t-speak), so I'm fine
> if anyone has rationale for why a better magic number would be desirable.

Yes I did, and no idea!

>> +S: 32 bits, flags (including type)
>> +S: 64 bits, handle
>> +S: 32 bits, payload length
>> +S: (*length* bytes of payload data)
>> +
>> +The flags have the following meanings:
>> +
>> +* bits 0-7: `NBD_CHUNKTYPE`, an 8 bit unsigned integer
>> +* bits 8-31: reserved (server MUST set these to zero)
> 
> My v2 separates the flag and type fields.

Yeah I proposed that later on list. I see you went with 16/16. All
fine by me.

>> +
>> +Possible values of `NBD_CHUNKTYPE` are as follows:
>> +
>> +* 0 = `NBD_CHUNKTYPE_END`: the final chunk
>> +* 1 = `NBD_CHUNKTYPE_DATA`: data that has been read
>> +* 2 = `NBD_CHUNKTYPE_ZERO`: data that should be considered zero
>> +
>> +The format of the payload data for each chunk type is as follows:
>> +
>> +##### `NBD_CHUNKTYPE_END`
>> +
>> +S: 32 bits, error code or zero for success
>> +S: 64 bits, offset of error (if any)
>> +
>> +##### `NBD_CHUNKTYPE_DATA`
>> +
>> +S: 64 bits, offset of data
>> +S: (*length-8* bytes of data as read)
>> +
>> +##### `NBD_CHUNKTYPE_ZERO`
>> +
>> +S: 64 bits, offset of data
>> +S: 32 bits, number of zeroes to which this corresponds
> 
> Structure wise, our reply types look similar, even though the naming is
> different.  I also had two more types (no data, used for success, and a
> distinct error without offset type, rather than special-casing -1 as
> indeterminate offset).

Yours is better.

>> -Replies need not be sent in the same order as requests (i.e., requests
>> -may be handled by the server asynchronously).
>> +The server MUST NOT send chunks that overlap. The server
>> +MUST NOT send chunks whose data exceeds the length
>> +of data requested (for this purpose counting the data
>> +within `NBD_CHUNKTYPE_ZERO` as the number of zero bytes
>> +specified therein). The server MUST, in the case of a successesful
> 
> successful
> 
>> +read send exactly the number of bytes requested (whether
>> +represented by `NBD_CHUNKTYPE_DATA` or `NBD_CHUNKTYPE_ZERO`).
>> +The server MUST NOT, in the case of an errored read, send
>> +more than the number of bytes requested.
>> +
>> +In order to avoid the burden of reassembly, the client
>> +MAY send `NBD_CMD_FLAG_DF`, which instructs the server
>> +not to fragment the reply. If this flag is set, the server
>> +MUST send either zero or one data chunks and an `NBD_CHUNKTYPE_END`
>> +only. Under such circumstances the server MAY error the command
>> +with `ETOOBIG` if the length read exceeds [65,536 bytes | the
> 
> `ETOOBIG` is not a standard error; my v2 went with the POSIX EOVERFLOW
> and defined it to it's Linux value of 75.

Yours is better.

>> +
>> +If more than one data chunk containing an error has been transmitted
>> +prior to sending the `NBD_CHUNKTYPE_END`, the server MUST take
>> +the second option above, to avoid the client assuming that data
>> +chunks which do not contain the offset marked as errored are
>> +error-free.
> 
> My v2 lets the server send multiple error-with-offset packets in the
> reply chain.

This is also better.

--
Alex Bligh




Attachment: signature.asc
Description: Message signed with OpenPGP using GPGMail


reply via email to

[Prev in Thread] Current Thread [Next in Thread]