
Re: [Qemu-devel] [Qemu-block] [RFC PATCH V4] qemu-img: make convert async


From: Peter Lieven
Subject: Re: [Qemu-devel] [Qemu-block] [RFC PATCH V4] qemu-img: make convert async
Date: Mon, 20 Feb 2017 15:59:42 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.5.1

On 20.02.2017 at 15:50, Stefan Hajnoczi wrote:
On Fri, Feb 17, 2017 at 05:00:24PM +0100, Peter Lieven wrote:
this is something I have been thinking about for almost two years now.
We heavily rely on the following two use cases of qemu-img convert:

a) reading from NFS and writing to iSCSI for deploying templates
b) reading from iSCSI and writing to NFS for backups

In both cases we use libiscsi and libnfs, so there is no kernel page cache.
Since qemu-img convert is implemented with synchronous operations, we read
one buffer and then write it: there is no parallelism, and each request
blocks until it has completed.

This is version 4 of the approach using coroutine worker "threads".
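
The worker model is roughly the following (a simplified sketch of the idea
with assumed ImgConvertState field names, not the literal patch code):

/* simplified sketch: spawn a fixed number of worker coroutines that each
 * grab the next chunk of sectors, read it and write it, so several
 * requests are in flight at once; the main loop runs until all workers
 * have finished */
static int convert_do_copy(ImgConvertState *s)
{
    int i;

    s->ret = -EINPROGRESS;
    for (i = 0; i < s->num_coroutines; i++) {
        s->co[i] = qemu_coroutine_create(convert_co_do_copy, s);
        qemu_coroutine_enter(s->co[i]);
    }

    while (s->ret == -EINPROGRESS) {
        main_loop_wait(false);
    }

    return s->ret;
}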

So far I have the following runtimes when reading an uncompressed QCOW2 from
NFS and writing it to iSCSI (raw):

qemu-img (master)
  nfs -> iscsi 22.8 secs
  nfs -> ram   11.7 secs
  ram -> iscsi 12.3 secs

qemu-img-async (8 coroutines, in-order write disabled)
  nfs -> iscsi 11.0 secs
  nfs -> ram   10.4 secs
  ram -> iscsi  9.0 secs

The following are the runtimes measured with different settings for V3 and V4.
This is always the best runtime out of 10 runs when converting from nfs to iscsi.
Please note that the in-order write scenarios in V4 show very high jitter; I think
this is because the get_block_status call on the NFS share is delayed by
concurrent read requests.

                      in-order        out-of-order
V3 - 16 coroutines    12.4 seconds    11.1 seconds
   -  8 coroutines    12.2 seconds    11.3 seconds
   -  4 coroutines    12.5 seconds    11.1 seconds
   -  2 coroutines    14.8 seconds    14.9 seconds

V4 - 32 coroutines    15.9 seconds    11.5 seconds
   - 16 coroutines    12.5 seconds    11.0 seconds
   -  8 coroutines    12.9 seconds    11.0 seconds
   -  4 coroutines    14.1 seconds    11.5 seconds
   -  2 coroutines    16.9 seconds    13.2 seconds
Does this patch work with compressed images?  Especially the
out-of-order write mode may be problematic with a compressed qcow2 image.

It does, but you are right, out-of-order writes and compression should
be mutually exclusive.
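
Something along these lines in img_convert()'s option handling would do
(sketch; the local variable names are assumed):

/* sketch: reject the combination up front (variable names assumed) */
if (!wr_in_order && compress) {
    error_report("out of order write and compress are mutually exclusive");
    ret = -1;
    goto fail_getopt;
}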


How should a user decide between in-order and out-of-order?

In general, out-of-order is okay when we write to a preallocated device
(e.g. all raw devices or qcow2 with preallocation=full). The default is
in-order write; the user has to enable out-of-order explicitly.
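
For example (sketch; '-m' is the coroutine count option from this series as
quoted below, and '-W' is assumed here as the switch that enables
out-of-order writes):

# default: in-order writes
qemu-img convert -m 8 -O raw src.qcow2 nfs://server/export/dst.raw

# explicitly enable out-of-order writes for a preallocated target
qemu-img convert -m 8 -W -O raw nfs://server/export/template.qcow2 \
    iscsi://portal/iqn.2017-02.example:target/0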


@@ -1651,12 +1680,117 @@ static int convert_write(ImgConvertState *s, int64_t sector_num, int nb_sectors,
      return 0;
  }
-static int convert_do_copy(ImgConvertState *s)
+static void convert_co_do_copy(void *opaque)
Missing coroutine_fn here and for convert_co_read()/convert_co_write().
Functions that must be called from coroutine context (because they
yield, use coroutine mutexes, etc) need to be marked as such.

okay.


+        if (s->wr_in_order) {
+            /* reenter the coroutine that might have waited
+             * for this write to complete */
+            s->wr_offs = sector_num + n;
+            for (i = 0; i < s->num_coroutines; i++) {
+                if (s->co[i] && s->wait_sector_num[i] == s->wr_offs) {
+                    qemu_coroutine_enter(s->co[i]);
+                    break;
This qemu_coroutine_enter() call relies on the yield pattern between
sibling coroutines having no recursive qemu_coroutine_enter() calls.
QEMU aborts if there is a code path where coroutine A enters B and then
B enters A again before yielding.

Paolo's new aio_co_wake() API solves this issue by deferring the
qemu_coroutine_enter() to the event loop.  It's similar to CoQueue
wakeup.  aio_co_wake() is part of my latest block pull request (should
be merged into qemu.git soon).

I *think* this patch has no A -> B -> A situation thanks to yields in
the code path, but it would be nicer to use aio_co_wake() where this can
never happen.

I also believe this can't happen. Would it also be okay to use
qemu_coroutine_enter_if_inactive() here?
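
Concretely, the wake-up above would then become something like (sketch):

            /* sketch: only enter the waiting coroutine if it is not
             * currently running, so a recursive enter cannot trigger
             * the abort */
            s->wr_offs = sector_num + n;
            for (i = 0; i < s->num_coroutines; i++) {
                if (s->co[i] && s->wait_sector_num[i] == s->wr_offs) {
                    qemu_coroutine_enter_if_inactive(s->co[i]);
                    break;
                }
            }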


+        case 'm':
+            num_coroutines = atoi(optarg);
+            if (num_coroutines > MAX_COROUTINES) {
+                error_report("Maximum allowed number of coroutines is %d",
+                             MAX_COROUTINES);
+                ret = -1;
+                goto fail_getopt;
+            }
Missing input validation for the < 1 case.

Right.
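
I.e. a version with the lower bound added would look like this (sketch):

        case 'm':
            num_coroutines = atoi(optarg);
            if (num_coroutines < 1 || num_coroutines > MAX_COROUTINES) {
                error_report("Invalid number of coroutines. Allowed range: 1-%d",
                             MAX_COROUTINES);
                ret = -1;
                goto fail_getopt;
            }
            break;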

Thank you,
Peter


