From: Nir Soffer
Subject: Re: [Qemu-block] Disconnecting /dev/nbdX leaves stale partitions and device
Date: Fri, 27 Jul 2018 21:53:32 +0300

On Fri, Jul 27, 2018 at 5:07 PM Richard W.M. Jones <address@hidden> wrote:
On Fri, Jul 27, 2018 at 02:20:29PM +0100, Stefan Hajnoczi wrote:
> On Thu, Jul 19, 2018 at 01:56:34PM +0300, Nir Soffer wrote:
> > Having an HTTP API makes it easy to integrate with. This is also the protocol
> > KubeVirt CDI[1] uses for importing images:
> > https://github.com/kubevirt/containerized-data-importer
>
> 2 options:
>
> 1. Write a server that does HTTP<->NBD.  Less efficient but fairly easy
>    to do in Go, Python, etc.
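
(For illustration only, a toy sketch of option 1 in Python; this is not an
existing server, and a real implementation would speak the NBD protocol to
qemu-nbd rather than opening the device directly:

import os
import re
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

DEVICE = "/dev/nbd0"  # placeholder: a device exported with qemu-nbd -c

class Handler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # enable keep-alive

    def do_PUT(self):
        # Expect "Content-Range: bytes START-END/*" on every request.
        m = re.match(r"bytes (\d+)-(\d+)",
                     self.headers.get("Content-Range", ""))
        if not m:
            self.send_error(400, "Missing or bad Content-Range")
            return
        start = int(m.group(1))
        data = self.rfile.read(int(self.headers["Content-Length"]))
        # Toy version: open per request; a real server would keep the
        # device (or an NBD connection) open for the whole session.
        with open(DEVICE, "rb+") as f:
            f.seek(start)
            f.write(data)
            os.fsync(f.fileno())
        self.send_response(200)
        self.send_header("Content-Length", "0")
        self.end_headers()

ThreadingHTTPServer(("", 8080), Handler).serve_forever()
)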

We did this one in reverse already:
https://github.com/libguestfs/libguestfs/blob/master/v2v/rhv-upload-plugin.py

The problem is that HTTP sucks as a block device transport protocol.

We have over time ended up adding an ad-hoc, informally-specified,
etc. implementation of the NBD protocol, except over HTTPS.  For
example we've reimplemented zeroing and trimming as special
non-standard HTTP methods.

PATCH is not well known, but it is a standard method, and our use is close
to what the RFC describes:
https://tools.ietf.org/html/rfc5789
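
For example, zeroing a range with PATCH looks roughly like this (a sketch:
the field names follow my reading of the imageio API, and the host and
ticket id are placeholders):

import json
from http.client import HTTPSConnection

conn = HTTPSConnection("server.example.com", 54322)  # placeholder host
msg = json.dumps({"op": "zero", "offset": 0, "size": 1024**3,
                  "flush": False}).encode("utf-8")
conn.request("PATCH", "/images/my-ticket-id", body=msg,
             headers={"Content-Type": "application/json"})
res = conn.getresponse()
assert res.status == 200, res.read()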
 
Then we discovered that performance still
sucked

It sucked because we used Python's builtin wsgiref server, which did not
support keep-alive connections, so we were doing a full HTTPS handshake per
request. This was fixed in 1.4.2.
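
To illustrate why this mattered: with keep-alive, one TLS handshake is
amortized over many requests (hypothetical host and ticket):

from http.client import HTTPSConnection

# One handshake per request (what wsgiref forced before 1.4.2):
for i in range(64):
    conn = HTTPSConnection("server.example.com", 54322)  # placeholder
    conn.request("PUT", "/images/my-ticket-id", body=b"\0" * 4096,
                 headers={"Content-Range":
                          "bytes %d-%d/*" % (i * 4096, i * 4096 + 4095)})
    conn.getresponse().read()
    conn.close()

# One handshake for all requests, reusing the same connection:
conn = HTTPSConnection("server.example.com", 54322)
for i in range(64):
    conn.request("PUT", "/images/my-ticket-id", body=b"\0" * 4096,
                 headers={"Content-Range":
                          "bytes %d-%d/*" % (i * 4096, i * 4096 + 4095)})
    conn.getresponse().read()
conn.close()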
 
so we implemented a (non-standard) HTTP-over-Unix-domain-socket
transport, which of course nothing is able to connect to, so we had to
modify the Python HTTP client to use it.

This is fairly common (e.g. Docker), as is running a Python HTTP server
behind another server (e.g. nginx).
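
For reference, a minimal sketch of the client side in Python (imageio's
actual code may differ; the socket path is a placeholder):

import socket
from http.client import HTTPConnection

class UnixHTTPConnection(HTTPConnection):
    """http.client connection over a Unix domain socket."""

    def __init__(self, sock_path, timeout=None):
        # "localhost" is only used for the Host header.
        super().__init__("localhost", timeout=timeout)
        self.sock_path = sock_path

    def connect(self):
        self.sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        if self.timeout is not None:
            self.sock.settimeout(self.timeout)
        self.sock.connect(self.sock_path)

conn = UnixHTTPConnection("/run/imageio.sock")  # placeholder path
conn.request("OPTIONS", "/images/my-ticket-id")
print(conn.getresponse().status)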
 
Performance still isn't
great, so the next step is to enable pipelining.

I think performance is pretty good now, after I added efficient zeroing:
https://gerrit.ovirt.org/c/92901/
This should be available in 4.2.6.
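
The efficient zeroing amounts to asking the kernel to zero the range
instead of the application writing zeroes. A minimal sketch for a block
device, assuming the BLKZEROOUT ioctl (the actual change is in the gerrit
link above):

import fcntl
import os
import struct

BLKZEROOUT = 0x127f  # _IO(0x12, 127) from <linux/fs.h>

def zeroout(path, start, length):
    # The kernel zeroes [start, start + length), possibly offloading
    # to the storage, instead of us writing buffers of zeroes.
    fd = os.open(path, os.O_RDWR)
    try:
        fcntl.ioctl(fd, BLKZEROOUT, struct.pack("QQ", start, length))
    finally:
        os.close(fd)

# zeroout("/dev/nbd0", 0, 1024**3)  # e.g. zero the first 1G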

Here are initial results for uploading a 6G fedora-27 image, created using
virt-builder, to fast FC storage:

qemu-img convert with unordered writes (-W): 1.8 seconds
imageio example upload[1]: 3.2 seconds
virt-v2v import: 9.6 seconds
qemu-img convert: 10.6 seconds

I also tested uploading to a raw disk using qemu-nbd on my dev setup, with
LIO storage over a 1G NIC.
For reference, blkdiscard --zeroout takes 38.5 seconds on this setup.

imageio example upload to /dev/nbd0[1]: 40.3 seconds
qemu-img convert to /dev/nbd0: 61.5 seconds
qemu-img convert to nbd+unix socket with unordered writes (-W): 87.3 seconds
qemu-img convert to nbd+unix socket: 96.1 seconds

See more details below.

[1] imageio example upload script:
https://github.com/oVirt/ovirt-imageio/blob/master/examples/upload

Nir

## Tests with fast FC storage

# blkdiscard --zeroout /dev/e30bfac2-8e13-479d-8cd6-c6da5e306c4e/c9864222-bc52-4359-80d7-76e47d619b15
(install a ticket exposing the lv at https://xxxx:54322/images/fedora-27-01)
# time ./upload /var/tmp/fedora-27.img https://xxxx:54322/images/fedora-27-01

real 0m3.263s
user 0m0.262s
sys 0m0.414s

address@hidden upload]# blkdiscard --zeroout /dev/e30bfac2-8e13-479d-8cd6-c6da5e306c4e/c9864222-bc52-4359-80d7-76e47d619b15
(tweaked upload script to disable unix socket)
address@hidden upload]# time ./upload /var/tmp/fedora-27.img https://xxxx:54322/images/fedora-27-01

real 0m3.744s
user 0m0.938s
sys 0m0.710s

# blkdiscard --zeroout /dev/e30bfac2-8e13-479d-8cd6-c6da5e306c4e/c9864222-bc52-4359-80d7-76e47d619b15
# time qemu-img convert -p -f raw -O raw -t none /var/tmp/fedora-27.img /dev/e30bfac2-8e13-479d-8cd6-c6da5e306c4e/c9864222-bc52-4359-80d7-76e47d619b15
    (100.00/100%)

real 0m10.695s
user 0m0.514s
sys 0m1.459s

# blkdiscard --zeroout /dev/e30bfac2-8e13-479d-8cd6-c6da5e306c4e/c9864222-bc52-4359-80d7-76e47d619b15
# time qemu-img convert -p -f raw -O raw -t none -W /var/tmp/fedora-27.img /dev/e30bfac2-8e13-479d-8cd6-c6da5e306c4e/c9864222-bc52-4359-80d7-76e47d619b15
    (100.00/100%)

real 0m1.802s
user 0m0.284s
sys 0m1.119s

# virt-v2v \
    -i disk /var/tmp/fedora-27.img \
    -o rhv-upload \
    -oc https://yyyy/ovirt-engine/api \
    -os nsoffer-fc1 \
    -on v2v-$1 \
    -op /var/tmp/password \
    -of raw \
    -oa preallocated \
    -oo rhv-cafile=ca.pem \
    -oo rhv-cluster=nsoffer-fc-el7 \
    -oo rhv-direct=true

[   0.2] Opening the source -i disk /var/tmp/fedora-27.img
[   0.3] Creating an overlay to protect the source from being modified
[   0.6] Initializing the target -o rhv-upload -oa preallocated -oc https://yyyy/ovirt-engine/api -op /var/tmp/password -os nsoffer-fc1
[   1.9] Opening the overlay
[   3.2] Inspecting the overlay
[   7.0] Checking for sufficient free disk space in the guest
[   7.0] Estimating space required on target for each disk
[   7.0] Converting Fedora 27 (Twenty Seven) to run on KVM
virt-v2v: warning: /files/boot/grub2/device.map/hd0 references unknown 
device "vda".  You may have to fix this entry manually after conversion.
virt-v2v: This guest has virtio drivers installed.
[  28.4] Mapping filesystem data to avoid copying unused and blank areas
[  28.5] Closing the overlay
[  28.6] Checking if the guest needs BIOS or UEFI to boot
[  28.6] Assigning disks to buses
[  28.6] Copying disk 1/1 to qemu URI json:{ "file.driver": "nbd", "file.path": "/var/tmp/rhvupload.Uf7x99/nbdkit0.sock", "file.export": "/" } (raw)
    (100.00/100%)
[  49.4] Creating output metadata
[  74.2] Finishing off

The copy phase took 20.8 seconds, but that includes creating the disk and other
metadata operations, which are very inefficient in oVirt. Looking at the imageio
logs, we can see that the upload itself took only 9.6 seconds
(from 19:33:56.372 to 19:34:05.980):

# grep OPTIONS /var/log/ovirt-imageio-daemon/daemon.log | head -1
2018-07-27 19:33:56,372 INFO    (Thread-23) [web] START [xxxx] OPTIONS /images/431bbd17-2b3a-40d1-a42e-fc43b1a76d48

# grep PATCH /var/log/ovirt-imageio-daemon/daemon.log | tail -1
2018-07-27 19:34:05,980 INFO    (Thread-24) [web] FINISH [local] PATCH /images/431bbd17-2b3a-40d1-a42e-fc43b1a76d48 [200] 0 [request=0.000682, operation=0.000166, flush=0.000115]


## Tests with LIO storage over 1G NIC

# time blkdiscard --zeroout /dev/27837a03-64f9-4f2b-abb0-daa2195b01ae/f8dd86f5-cc91-48ea-b192-c4e6af826f8a

real 0m38.595s
user 0m0.001s
sys 0m0.002s

# qemu-nbd -c /dev/nbd0 -f raw -t -n --detect-zeroes=on --aio=native /dev/27837a03-64f9-4f2b-abb0-daa2195b01ae/f8dd86f5-cc91-48ea-b192-c4e6af826f8a
# chown vdsm:kvm /dev/nbd0

# blkdiscard --zeroout /dev/nbd0
(install a ticket exposing /dev/nbd0 at https://xxxx:54322/images/test)
# time ./upload /var/tmp/fedora-27.img https://xxxx:54322/images/test

real 0m40.347s
user 0m0.286s
sys 0m0.520s

# blkdiscard --zeroout /dev/nbd0
# time qemu-img convert -p -n -f raw -O raw -t none /var/tmp/fedora-27.img /dev/nbd0
    (100.00/100%)

real 1m1.514s
user 0m0.393s
sys 0m1.065s

# qemu-nbd -k /tmp/nbd.sock -f raw -t -n --detect-zeroes=on --aio=native /dev/27837a03-64f9-4f2b-abb0-daa2195b01ae/f8dd86f5-cc91-48ea-b192-c4e6af826f8a

# blkdiscard --zeroout /dev/27837a03-64f9-4f2b-abb0-daa2195b01ae/f8dd86f5-cc91-48ea-b192-c4e6af826f8a
# time qemu-img convert -p -n -f raw -O raw -t none /var/tmp/fedora-27.img nbd+unix://?socket=/tmp/nbd.sock
    (100.00/100%)

real 1m36.120s
user 0m0.176s
sys 0m0.640s
 
# blkdiscard --zeroout /dev/27837a03-64f9-4f2b-abb0-daa2195b01ae/f8dd86f5-cc91-48ea-b192-c4e6af826f8a
# time qemu-img convert -p -n -f raw -O raw -t none -W /var/tmp/fedora-27.img nbd+unix://?socket=/tmp/nbd.sock
    (100.00/100%)

real 1m27.341s
user 0m0.157s
sys 0m0.635s
