Re: [Qemu-devel] [Nbd] [PATCH 2/2] NBD proto: add GET_LBA

qemu-devel

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [Nbd] [PATCH 2/2] NBD proto: add GET_LBA_STATUS extensi

From:	Wouter Verhelst
Subject:	Re: [Qemu-devel] [Nbd] [PATCH 2/2] NBD proto: add GET_LBA_STATUS extension
Date:	Fri, 25 Mar 2016 09:49:29 +0100
User-agent:	Mutt/1.5.24 (2015-08-30)

On Thu, Mar 24, 2016 at 04:08:13PM -0600, Eric Blake wrote:
> On 03/23/2016 08:16 AM, Denis V. Lunev wrote:
> > From: Pavel Borzenkov <address@hidden>
> > 
> > With the availability of sparse storage formats, it is often needed to
> > query status of a particular LBA range and read only those blocks of
> > data that are actually present on the block device.
> > 
> > To provide such information, the patch adds GET_LBA_STATUS extension
> > with one new NBD_CMD_GET_LBA_STATUS command.
> > 
> 
> > +* `NBD_CMD_GET_LBA_STATUS` (7)
> > +
> > +    An LBA range status query request. Length and offset define the range
> > +    of interest. The server MUST reply with a reply header, followed
> > +    immediately by the following data:
> > +
> > +      - 32 bits, length of parameter data that follow (unsigned)
> > +      - zero or more LBA status descriptors, each having the following
> > +        structure:
> > +
> > +        * 64 bits, offset (unsigned)
> > +        * 32 bits, length (unsigned)
> > +        * 16 bits, status (unsigned)
> > +
> > +    unless an error condition has occurred.
> 
> To date, only the NBD_CMD_READ command caused the server to send data
> after the handle in its reply.  This would be the second command with a
> data field in the response, but it is returning a variable amount of
> data, not directly tied to the client's length - but at least you did
> make it structured so that the client knows how much to read.  However,
> your patch is incomplete; you'll need to edit the "Transmission" section
> of the document to mention the rules on the server sending data, as the
> existing text would now be wrong:
> 
> > The server replies with:
> > 
> > S: 32 bits, 0x67446698, magic (NBD_REPLY_MAGIC)
> > S: 32 bits, error
> > S: 64 bits, handle
> > S: (length bytes of data if the request is of type NBD_CMD_READ)
> 
> You may also want to add a rule that for all future extensions, any
> command that requires data in the server response (other than the server
> response to NBD_CMD_READ) must include a 32-bit length as the first
> field of its data payload, so that the server reply is always structured.

Right.

> Hmm, I still think it would be hard to write a wireshark dissector for
> server replies, since there is no indication whether a given reply will
> include data or not.

Well, a wireshark dissector already exists. However, it is very old; it
doesn't support the NBD_CMD_TRIM or NBD_CMD_FLUSH commands, it doesn't
support the NBD_CMD_FLAG_FUA option to NBD_CMD_WRITE, and it doesn't
deal with negotiation at all. It was written when newstyle negotiation
didn't exist yet, and scanning for INIT_PASSWD at the time was too hard,
according to the guy who wrote it (don't remember who that was).

> The _client_ knows (well, any well-written client
> that uses a different value for 'handle' for every command sent to the
> server), because it can read the returned 'handle' field to know what
> command it matches to and therefore what data to expect; but a
> third-party observer seeing _just_ the server messages has no idea which
> server responses should have payload.

It can if it knows enough about the protocol, but granted, that makes it
harder for us to extend the protocol without breaking the dissector.

> Scanning the stream and assuming that a new reply starts each time
> NBD_REPLY_MAGIC is encountered is a close approximation, but hits
> false positives if the data being returned for NBD_CMD_READ contains
> the same magic number as part of its contents.

Indeed.

> Obviously, back-compat says we can't change the response to
> NBD_CMD_READ, but that means that a wireshark dissector has to either
> maintain context, or hunt through returned data looking for potential
> magic numbers and possibly hitting false positives.
>
> That said, maybe we want to allow global flag negotiation prior to
> transmission to add a new flag on both server and client side - the
> server would advertise that it is capable of a new reply mode, and the
> client then has to reply that it wants to use the new reply mode; in

Global flag negotiation is already possible. There is a client flags
field, which is sent before option haggling, that could be used for
negotiation of such backwards-incompatible features.

> that mode, all server replies starting with magic 0x67446698
> (NBD_REPLY_MAGIC) will never have a data payload, and all commands that
> cause a reply with payload (both NBD_CMD_READ and the new
> NBD_CMD_GET_LBA_STATUS of this message, or whatever name we give it)
> will reply with a NEW magic number:
> 
> S: 32 bits, XXXX, magic (NBD_REPLY_MAGIC2)
> S: 32 bits, error
> S: 64 bits, handle
> S: 32 bits, length
> S: length bytes of data

I like this. However, before committing to it, I would like to see
Markus' feedback on that (explicit Cc added, even though he's on the
list, too).

We'd also need a way to communicate the ability to speak this protocol
from the kernel to userspace before telling the server that the client
understands something which maybe its kernel doesn't.

Markus?

> so that the server's data stream is fully structured without having to
> maintain context of the client's requests.  That is, a dissector can now
> merely scan for both magic numbers; and on a stream between a client and
> server that have negotiated the new mode, the old magic number will
> never have a payload, and the new magic number will always be
> accompanied with a payload that describes how much data to read to the
> boundary of the next reply.
> 
> For that matter, right now, NBD_CMD_READ is required to either end the
> session or return length bytes of data even when error is non-zero (but
> where those bytes MAY be invalid); but by adding a negotiated flag for
> structured length replies, it would be possible to allow for short reads
> and/or returning an error with 0 bytes of payload but still keeping the
> connection to the client open, without having to send wasted bytes over
> the wire.

Yes. This has been discussed on the nbd-general list in the past. There
is also the (significant) problem of the server having maybe already
sent out the header before discovering there is an error, at which point
it can *only* drop the connection.

> I could write up a negotiation of global flags for structured reply
> lengths as an extension proposal, if you think it is worth it.

I think it is worth it...

-- 
< ron> I mean, the main *practical* problem with C++, is there's like a dozen
       people in the world who think they really understand all of its rules,
       and pretty much all of them are just lying to themselves too.
 -- #debian-devel, OFTC, 2016-02-12

[Prev in Thread]

Current Thread

[Next in Thread]

Re: [Qemu-devel] [Nbd] [PATCH 2/2] NBD proto: add GET_LBA_STATUS extension, (continued)

Prev by Date: Re: [Qemu-devel] [PATCH 2/2] tap: vhost busy polling support
Next by Date: Re: [Qemu-devel] [Nbd] [PATCH 2/2] NBD proto: add GET_LBA_STATUS extension
Previous by thread: Re: [Qemu-devel] [PATCH 2/2] NBD proto: add GET_LBA_STATUS extension
Next by thread: Re: [Qemu-devel] [Nbd] [PATCH 2/2] NBD proto: add GET_LBA_STATUS extension
Index(es):
- Date
- Thread