qemu-block
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Libguestfs] [PATCH v3 6/6] RFC: spec: Introduce NBD_REPLY_TYPE_OFFS


From: Richard W.M. Jones
Subject: Re: [Libguestfs] [PATCH v3 6/6] RFC: spec: Introduce NBD_REPLY_TYPE_OFFSET_HOLE_EXT
Date: Tue, 18 Apr 2023 15:13:11 +0100
User-agent: Mutt/1.5.21 (2010-09-15)

On Tue, Apr 18, 2023 at 02:33:41PM +0200, Wouter Verhelst wrote:
> On Thu, Apr 13, 2023 at 05:02:41PM -0500, Eric Blake wrote:
> > Rather than requiring all servers and clients to have a 32-bit limit
> > on maximum NBD_CMD_READ/WRITE sizes, we can choose to standardize
> > support for a 64-bit single I/O transaction now.
> > NBD_REPLY_TYPE_OFFSET_DATA can already handle a large reply, but
> > NBD_REPLY_TYPE_OFFSET_HOLE needs a 64-bit counterpart.
> > 
> > By standardizing this, all clients must be prepared to support both
> > types of hole type replies, even though most server implementations of
> > extended replies are likely to only send one hole type.
> 
> I think it's probably a better idea to push this patch to a separate
> "extension-*" branch, and link to that from proto.md on master. Those
> are documented as "we standardized this, but no first implementor exists
> yet".
> 
> If someone actually comes up with a reason for 64-bit transactions, we
> can then see if the spec matches the need and merge it to master.
> 
> Otherwise this feels too much like a solution in search of a problem to
> me.

I'd like to push back a bit on this.  Firstly Eric does have two
complete implementations.  It's true however that they not upstream in
either case.

But we also need this because there are relatively serious issues
observed, particularly around trimming/zeroing, and extents.  The
trimming problem can be demonstrated very easily in fact:

    $ nbdkit -U - --filter=stats memory 1P statsfile=/dev/stdout --run ' time 
guestfish add "" protocol:nbd server:unix:$unixsocket discard:enable format:raw 
: run : mkfs xfs /dev/sda '
    
    real         4m17.531s
    user         0m0.032s
    sys  0m0.040s
    total: 1066328 ops, 257.558068 s, 1048578.04 GiB, 4071.23 GiB/s
    read: 4356 ops, 0.003335 s, 14.61 MiB, 4.28 GiB/s op, 58.08 KiB/s total
      Request size and alignment breakdown:
        12 bits: 50.8% (2215 reqs, 8.65 MiB total)
             12 bit aligned: 100.0% (2215)
             13 bit aligned:  51.6% (1143)
             14 bit aligned:  26.9% (595)
             15 bit aligned:  14.6% (323)
             16 bit aligned:   8.4% (185)
             17+ bit-aligned:  4.9% (109)
         9 bits: 47.4% (2064 reqs, 1.01 MiB total)
              9 bit aligned: 100.0% (2064)
             10+ bit-aligned:  0.6% (13)
        other sizes:  1.8% (77 reqs, 14.61 MiB total)
    
    write: 13325 ops, 0.046782 s, 31.29 MiB, 668.91 MiB/s op, 124.41 KiB/s total
      Request size and alignment breakdown:
        12 bits: 53.8% (7170 reqs, 28.01 MiB total)
             12 bit aligned: 100.0% (7170)
             13 bit aligned:  50.0% (3585)
             14 bit aligned:  25.0% (1793)
             15 bit aligned:  12.5% (896)
             16 bit aligned:   6.2% (448)
             17+ bit-aligned:  3.1% (224)
         9 bits: 46.2% (6150 reqs, 3.00 MiB total)
              9 bit aligned: 100.0% (6150)
             10 bit aligned:  33.4% (2054)
             12 bit aligned:  16.7% (1029)
             13 bit aligned:   8.4% (515)
             14+ bit-aligned:  4.2% (259)
        other sizes:  0.0% (5 reqs, 31.29 MiB total)
    
    trim: 1048576 ops, 306.059735 s, 1048576.00 GiB, 3426.05 GiB/s op, 4071.22 
GiB/s total
      Request size and alignment breakdown:
        30 bits: 100.0% (1048576 reqs, 1048576.00 GiB total)
             30 bit aligned: 100.0% (1048576)
             31 bit aligned:  50.0% (524288)
             32 bit aligned:  25.0% (262144)
             33 bit aligned:  12.5% (131072)
             34 bit aligned:   6.2% (65536)
             35+ bit-aligned:  3.1% (32768)
    
    zero: 64 ops, 0.003912 s, 1.99 GiB, 508.75 GiB/s op, 7.91 MiB/s total
      Request size and alignment breakdown:
        25 bits: 98.4% (63 reqs, 1.97 GiB total)
             13 bit aligned: 100.0% (63)
        other sizes:  1.6% (1 reqs, 1.99 GiB total)
    
    flush: 7 ops, 0.000001 s, 0 bytes, 0 bytes/s op, 0 bytes/s total

Note how trim takes a million operations and most of the time.  That
should be done in one operation.  If you stop advertising discard
support on the disk ("discard:disable") it takes only a fraction of
the time.

The extents one is harder to demonstrate, but it makes our code
considerably more complex that we cannot just grab the extent map for
a whole disk larger than 4GB in a single command.  (The complexity
won't go away, but the querying will be faster with fewer round trips
with this change.)

Nevertheless I'm not opposed to keeping this as an extension until the
implementations are upstream and bedded in.

Rich.


> With that said, for the things I didn't reply to, you can add:
> 
> Reviewed-By: Wouter Verhelst <w@uter.be>
> 
> -- 
>      w@uter.{be,co.za}
> wouter@{grep.be,fosdem.org,debian.org}
> 
> I will have a Tin-Actinium-Potassium mixture, thanks.
> 
> _______________________________________________
> Libguestfs mailing list
> Libguestfs@redhat.com
> https://listman.redhat.com/mailman/listinfo/libguestfs

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-p2v converts physical machines to virtual machines.  Boot with a
live CD or over the network (PXE) and turn machines into KVM guests.
http://libguestfs.org/virt-v2v




reply via email to

[Prev in Thread] Current Thread [Next in Thread]