Rich Jones pointed me to questionable behavior in qemu's NBD server
implementation today: qemu advertises a minimum block size of 512 to any
client that promises to honor block sizes, but when serving up a raw
file that is not aligned to a sector boundary, attempting to read that
final portion of the file results in a structured read with two chunks,
the first for the data up to the end of the actual file, and the second
reporting a hole for the rest of the sector. If a client is promising to
obey block sizes on its requests, it seems odd that the server is
allowed to send a result that is not also aligned to block sizes.
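
To make the scenario concrete, here is a minimal sketch of how such a reply gets split. It is an illustrative model, not qemu's code: the function name, the 1000-byte file size, and the assumption that the export size is the file size rounded up to a sector (implied by the hole covering "the rest of the sector") are all mine.

```python
# Illustrative model only; this is not qemu's implementation.
SECTOR = 512  # minimum block size advertised by the server


def read_chunks(file_size, offset, length):
    """Return (kind, offset, length) chunks for a structured read.

    Bytes below file_size are reported as data; bytes between file_size
    and the sector-rounded export size are reported as a hole.
    """
    export_size = -(-file_size // SECTOR) * SECTOR  # round up to a sector
    end = offset + length
    if offset < 0 or length <= 0 or end > export_size:
        raise ValueError("read outside the export")
    chunks = []
    data_end = min(end, file_size)
    if offset < data_end:
        chunks.append(("data", offset, data_end - offset))
    hole_start = max(offset, file_size)
    if hole_start < end:
        chunks.append(("hole", hole_start, end - hole_start))
    return chunks


# A hypothetical 1000-byte file, read in its final sector.
chunks = read_chunks(file_size=1000, offset=512, length=512)
print(chunks)  # [('data', 512, 488), ('hole', 1000, 24)]
print([length % SECTOR == 0 for _, _, length in chunks])  # [False, False]
```

The client's request is aligned to 512, but neither chunk in the reply is.
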

Right now, the NBD spec says that when structured replies are in use,
then for a structured read:

    The server MAY split the reply into any number of content chunks;
    each chunk MUST describe at least one byte, although to minimize
    overhead, the server SHOULD use chunks with lengths and offsets as
    an integer multiple of 512 bytes, where possible (the first and
    last chunk of an unaligned read being the most obvious places for
    an exception).

I'm wondering if we should tighten that to require that the server
partition the reply chunks to be aligned to the advertised minimum block
size (at which point, qemu should either advertise 1 instead of 512 as
the minimum size when serving up an unaligned file, or else qemu should
just send the final partial sector as a single data chunk rather than
trying to report the last few bytes as a hole).
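
As a rough sketch of the two options, the check below tests whether a reply's chunks are aligned to a given minimum block size, and a helper folds a trailing hole into the preceding data chunk. The helper names and the (kind, offset, length) tuple layout are hypothetical, chosen only for illustration.

```python
def chunks_aligned(chunks, min_block):
    """True if every chunk offset and length is a multiple of min_block."""
    return all(off % min_block == 0 and length % min_block == 0
               for _, off, length in chunks)


def merge_tail_as_data(chunks):
    """Fold a trailing hole into the adjacent data chunk before it.

    The merged chunk would be sent as data, with the former hole
    zero-filled.
    """
    if len(chunks) >= 2:
        kind_a, off_a, len_a = chunks[-2]
        kind_b, off_b, len_b = chunks[-1]
        if kind_a == "data" and kind_b == "hole" and off_a + len_a == off_b:
            return chunks[:-2] + [("data", off_a, len_a + len_b)]
    return list(chunks)


reply = [("data", 512, 488), ("hole", 1000, 24)]
print(chunks_aligned(reply, 512))                      # False: today's reply
print(chunks_aligned(reply, 1))                        # True: advertise 1
print(chunks_aligned(merge_tail_as_data(reply), 512))  # True: single data chunk
```

Either change would satisfy the tightened wording; which one is preferable is a separate question.
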

For comparison, on block status, we require:

    The server SHOULD use descriptor lengths that are an integer
    multiple of 512 bytes where possible (the first and last
    descriptor of an unaligned query being the most obvious places
    for an exception), and MUST use descriptor lengths that are an
    integer multiple of any advertised minimum block size.

And qemu as a client currently hangs up on any server that violates that
requirement on block status (that is, when qemu as the server tries to
send a block status reply that is not aligned to the advertised block
size, qemu as the client flags it as an invalid server, which means qemu
as server is currently broken). So I'm thinking we should copy that
requirement onto servers for reads as well.
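
For reference, the client-side check amounts to something like the sketch below. This is not qemu's actual code: the function name and the choice to raise ValueError are assumptions for illustration, standing in for the client dropping the connection.

```python
def check_block_status(lengths, min_block):
    """Reject a block status reply with a misaligned descriptor length.

    Raises ValueError if any length is not an integer multiple of the
    advertised minimum block size.
    """
    for i, length in enumerate(lengths):
        if length % min_block != 0:
            raise ValueError(
                f"descriptor {i}: length {length} is not a multiple "
                f"of {min_block}")


check_block_status([512, 1024], 512)  # accepted

try:
    check_block_status([488, 24], 512)
except ValueError as err:
    print("invalid server:", err)
```

Applying the same rule to reads would mean running an equivalent check over the offset and length of each content chunk.
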