qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[Qemu-devel] [PATCH 3/3] linux-aio: process completions from ioq_submit(


From: Roman Pen
Subject: [Qemu-devel] [PATCH 3/3] linux-aio: process completions from ioq_submit()
Date: Tue, 19 Jul 2016 12:25:21 +0200

In order to reduce completion latency it makes sense to harvest completed
requests ASAP.  Very fast backend device can complete requests just after
submission, so it is worth trying to check ring buffer in order to peek
completed requests directly after io_submit() has been called.

Indeed, this patch reduces the completions latencies and increases the
overall throughput, e.g. the following is the percentiles of number of
completed requests at once:

        1th 10th  20th  30th  40th  50th  60th  70th  80th  90th  99.99th
Before    2    4    42   112   128   128   128   128   128   128    128
 After    1    1     4    14    33    45    47    48    50    51    108

That means, that before the current patch is applied the ring buffer is
observed as full (128 requests were consumed at once) in 60% of calls.

After patch is applied the distribution of number of completed requests
is "smoother" and the queue (requests in-flight) is almost never full.

The fio read results are the following (write results are almost the
same and are not showed here):

  Before
  ------
job: (groupid=0, jobs=8): err= 0: pid=2227: Tue Jul 19 11:29:50 2016
  Description  : [Emulation of Storage Server Access Pattern]
  read : io=54681MB, bw=1822.7MB/s, iops=179779, runt= 30001msec
    slat (usec): min=172, max=16883, avg=338.35, stdev=109.66
    clat (usec): min=1, max=21977, avg=1051.45, stdev=299.29
     lat (usec): min=317, max=22521, avg=1389.83, stdev=300.73
    clat percentiles (usec):
     |  1.00th=[  346],  5.00th=[  596], 10.00th=[  708], 20.00th=[  852],
     | 30.00th=[  932], 40.00th=[  996], 50.00th=[ 1048], 60.00th=[ 1112],
     | 70.00th=[ 1176], 80.00th=[ 1256], 90.00th=[ 1384], 95.00th=[ 1496],
     | 99.00th=[ 1800], 99.50th=[ 1928], 99.90th=[ 2320], 99.95th=[ 2672],
     | 99.99th=[ 4704]
    bw (KB  /s): min=205229, max=553181, per=12.50%, avg=233278.26, 
stdev=18383.51

  After
  ------
job: (groupid=0, jobs=8): err= 0: pid=2220: Tue Jul 19 11:31:51 2016
  Description  : [Emulation of Storage Server Access Pattern]
  read : io=57637MB, bw=1921.2MB/s, iops=189529, runt= 30002msec
    slat (usec): min=169, max=20636, avg=329.61, stdev=124.18
    clat (usec): min=2, max=19592, avg=988.78, stdev=251.04
     lat (usec): min=381, max=21067, avg=1318.42, stdev=243.58
    clat percentiles (usec):
     |  1.00th=[  310],  5.00th=[  580], 10.00th=[  748], 20.00th=[  876],
     | 30.00th=[  908], 40.00th=[  948], 50.00th=[ 1012], 60.00th=[ 1064],
     | 70.00th=[ 1080], 80.00th=[ 1128], 90.00th=[ 1224], 95.00th=[ 1288],
     | 99.00th=[ 1496], 99.50th=[ 1608], 99.90th=[ 1960], 99.95th=[ 2256],
     | 99.99th=[ 5408]
    bw (KB  /s): min=212149, max=390160, per=12.49%, avg=245746.04, 
stdev=11606.75

Throughput increased from 1822MB/s to 1921MB/s, average completion latencies
decreased from 1051us to 988us.

Signed-off-by: Roman Pen <address@hidden>
Cc: Stefan Hajnoczi <address@hidden>
Cc: Paolo Bonzini <address@hidden>
Cc: address@hidden
---
 block/linux-aio.c | 17 +++++++++++++++--
 1 file changed, 15 insertions(+), 2 deletions(-)

diff --git a/block/linux-aio.c b/block/linux-aio.c
index fae5c82..a02e692 100644
--- a/block/linux-aio.c
+++ b/block/linux-aio.c
@@ -34,6 +34,7 @@ struct qemu_laiocb {
     LinuxAioState *ctx;
     struct iocb iocb;
     ssize_t ret;
+    bool self_completed;
     size_t nbytes;
     QEMUIOVector *qiov;
     bool is_read;
@@ -92,7 +93,11 @@ static void qemu_laio_process_completion(struct qemu_laiocb 
*laiocb)
 
     laiocb->ret = ret;
     if (laiocb->co) {
-        qemu_coroutine_enter(laiocb->co, NULL);
+        if (laiocb->co == qemu_coroutine_self()) {
+            laiocb->self_completed = true;
+        } else {
+            qemu_coroutine_enter(laiocb->co, NULL);
+        }
     } else {
         laiocb->common.cb(laiocb->common.opaque, ret);
         qemu_aio_unref(laiocb);
@@ -312,6 +317,12 @@ static void ioq_submit(LinuxAioState *s)
         QSIMPLEQ_SPLIT_AFTER(&s->io_q.pending, aiocb, next, &completed);
     } while (ret == len && !QSIMPLEQ_EMPTY(&s->io_q.pending));
     s->io_q.blocked = (s->io_q.in_queue > 0);
+
+    if (s->io_q.in_flight) {
+        /* We can try to complete something just right away if there are
+         * still requests in-flight. */
+        qemu_laio_process_completions(s);
+    }
 }
 
 void laio_io_plug(BlockDriverState *bs, LinuxAioState *s)
@@ -379,7 +390,9 @@ int coroutine_fn laio_co_submit(BlockDriverState *bs, 
LinuxAioState *s, int fd,
         return ret;
     }
 
-    qemu_coroutine_yield();
+    if (!laiocb.self_completed) {
+        qemu_coroutine_yield();
+    }
     return laiocb.ret;
 }
 
-- 
2.8.2




reply via email to

[Prev in Thread] Current Thread [Next in Thread]