[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Qemu-devel] [PATCH 4/4] virtio-blk: introduce multiread
From: |
Kevin Wolf |
Subject: |
Re: [Qemu-devel] [PATCH 4/4] virtio-blk: introduce multiread |
Date: |
Mon, 15 Dec 2014 17:00:51 +0100 |
User-agent: |
Mutt/1.5.21 (2010-09-15) |
Am 15.12.2014 um 16:52 hat Peter Lieven geschrieben:
> On 15.12.2014 16:43, Peter Lieven wrote:
> >On 15.12.2014 16:01, Kevin Wolf wrote:
> >>Am 09.12.2014 um 17:26 hat Peter Lieven geschrieben:
> >>>this patch finally introduces multiread support to virtio-blk. While
> >>>multiwrite support was there for a long time, read support was missing.
> >>>
> >>>To achieve this the patch does several things which might need further
> >>>explanation:
> >>>
> >>> - the whole merge and multireq logic is moved from block.c into
> >>> virtio-blk. This is move is a preparation for directly creating a
> >>> coroutine out of virtio-blk.
> >>>
> >>> - requests are only merged if they are strictly sequential, and no
> >>> longer sorted. This simplification decreases overhead and reduces
> >>> latency. It will also merge some requests which were unmergable before.
> >>>
> >>> The old algorithm took up to 32 requests, sorted them and tried to
> >>> merge
> >>> them. The outcome was anything between 1 and 32 requests. In case of
> >>> 32 requests there were 31 requests unnecessarily delayed.
> >>>
> >>> On the other hand let's imagine e.g. 16 unmergeable requests followed
> >>> by 32 mergable requests. The latter 32 requests would have been split
> >>> into two 16 byte requests.
> >>>
> >>> Last the simplified logic allows for a fast path if we have only a
> >>> single request in the multirequest. In this case the request is sent as
> >>> ordinary request without multireq callbacks.
> >>>
> >>>As a first benchmark I installed Ubuntu 14.04.1 on a local SSD. The number
> >>>of
> >>>merged requests is in the same order while the write latency is obviously
> >>>decreased by several percent.
> >>>
> >>>cmdline:
> >>>qemu-system-x86_64 -m 1024 -smp 2 -enable-kvm -cdrom
> >>>ubuntu-14.04.1-server-amd64.iso \
> >>> -drive if=virtio,file=/dev/ssd/ubuntu1404,aio=native,cache=none -monitor
> >>> stdio
> >>>
> >>>Before:
> >>>virtio0:
> >>> rd_bytes=151056896 wr_bytes=2683947008 rd_operations=18614
> >>> wr_operations=67979
> >>> flush_operations=15335 wr_total_time_ns=540428034217
> >>> rd_total_time_ns=11110520068
> >>> flush_total_time_ns=40673685006 rd_merged=0 wr_merged=15531
> >>>
> >>>After:
> >>>virtio0:
> >>> rd_bytes=149487104 wr_bytes=2701344768 rd_operations=18148
> >>> wr_operations=68578
> >>> flush_operations=15368 wr_total_time_ns=437030089565
> >>> rd_total_time_ns=9836288815
> >>> flush_total_time_ns=40597981121 rd_merged=690 wr_merged=14615
> >>>
> >>>Some first numbers of improved read performance while booting:
> >>>
> >>>The Ubuntu 14.04.1 vServer from above:
> >>>virtio0:
> >>> rd_bytes=97545216 wr_bytes=119808 rd_operations=5071 wr_operations=26
> >>> flush_operations=2 wr_total_time_ns=8847669 rd_total_time_ns=13952575478
> >>> flush_total_time_ns=3075496 rd_merged=742 wr_merged=0
> >>>
> >>>Windows 2012R2 (booted from iSCSI):
> >>>virtio0: rd_bytes=176559104 wr_bytes=61859840 rd_operations=7200
> >>>wr_operations=360
> >>> flush_operations=68 wr_total_time_ns=34344992718
> >>> rd_total_time_ns=134386844669
> >>> flush_total_time_ns=18115517 rd_merged=641 wr_merged=216
> >>>
> >>>Signed-off-by: Peter Lieven <address@hidden>
> >>Looks pretty good. The only thing I'm still unsure about are possible
> >>integer overflows in the merging logic. Maybe you can have another look
> >>there (ideally not only the places I commented on below, but the whole
> >>function).
> >>
> >>>@@ -414,14 +402,81 @@ void virtio_blk_handle_request(VirtIOBlockReq *req,
> >>>MultiReqBuffer *mrb)
> >>> iov_from_buf(in_iov, in_num, 0, serial, size);
> >>> virtio_blk_req_complete(req, VIRTIO_BLK_S_OK);
> >>> virtio_blk_free_request(req);
> >>>- } else if (type & VIRTIO_BLK_T_OUT) {
> >>>- qemu_iovec_init_external(&req->qiov, iov, out_num);
> >>>- virtio_blk_handle_write(req, mrb);
> >>>- } else if (type == VIRTIO_BLK_T_IN || type == VIRTIO_BLK_T_BARRIER) {
> >>>- /* VIRTIO_BLK_T_IN is 0, so we can't just & it. */
> >>>- qemu_iovec_init_external(&req->qiov, in_iov, in_num);
> >>>- virtio_blk_handle_read(req);
> >>>- } else {
> >>>+ break;
> >>>+ }
> >>>+ case VIRTIO_BLK_T_IN:
> >>>+ case VIRTIO_BLK_T_OUT:
> >>>+ {
> >>>+ bool is_write = type & VIRTIO_BLK_T_OUT;
> >>>+ int64_t sector_num = virtio_ldq_p(VIRTIO_DEVICE(req->dev),
> >>>+ &req->out.sector);
> >>>+ int max_transfer_length =
> >>>blk_get_max_transfer_length(req->dev->blk);
> >>>+ int nb_sectors = 0;
> >>>+ bool merge = true;
> >>>+
> >>>+ if (!virtio_blk_sect_range_ok(req->dev, sector_num,
> >>>req->qiov.size)) {
> >>>+ virtio_blk_req_complete(req, VIRTIO_BLK_S_IOERR);
> >>>+ virtio_blk_free_request(req);
> >>>+ return;
> >>>+ }
> >>>+
> >>>+ if (is_write) {
> >>>+ qemu_iovec_init_external(&req->qiov, iov, out_num);
> >>>+ trace_virtio_blk_handle_write(req, sector_num,
> >>>+ req->qiov.size /
> >>>BDRV_SECTOR_SIZE);
> >>>+ } else {
> >>>+ qemu_iovec_init_external(&req->qiov, in_iov, in_num);
> >>>+ trace_virtio_blk_handle_read(req, sector_num,
> >>>+ req->qiov.size /
> >>>BDRV_SECTOR_SIZE);
> >>>+ }
> >>>+
> >>>+ nb_sectors = req->qiov.size / BDRV_SECTOR_SIZE;
> >>qiov.size is controlled by the guest, and nb_sectors is only an int. Are
> >>you sure that this can't overflow?
> >
> >In theory, yes. For this to happen in_iov or iov needs to contain
> >2TB of data on 32-bit systems. But theoretically there could
> >also be already an overflow in qemu_iovec_init_external where
> >multiple size_t are summed up in a size_t.
> >
> >There has been no overflow checking in the merge routine in
> >the past, but if you feel better, we could add sth like this:
> >
> >diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
> >index cc0076a..e9236da 100644
> >--- a/hw/block/virtio-blk.c
> >+++ b/hw/block/virtio-blk.c
> >@@ -410,8 +410,8 @@ void virtio_blk_handle_request(VirtIOBlockReq *req,
> >MultiReqBuffer *mrb)
> > bool is_write = type & VIRTIO_BLK_T_OUT;
> > int64_t sector_num = virtio_ldq_p(VIRTIO_DEVICE(req->dev),
> >&req->out.sector);
> >- int max_transfer_length =
> >blk_get_max_transfer_length(req->dev->blk);
> >- int nb_sectors = 0;
> >+ int64_t max_transfer_length =
> >blk_get_max_transfer_length(req->dev->blk);
> >+ int64_t nb_sectors = 0;
> > bool merge = true;
> >
> > if (!virtio_blk_sect_range_ok(req->dev, sector_num,
> > req->qiov.size)) {
> >@@ -431,6 +431,7 @@ void virtio_blk_handle_request(VirtIOBlockReq *req,
> >MultiReqBuffer *mrb)
> > }
> >
> > nb_sectors = req->qiov.size / BDRV_SECTOR_SIZE;
> >+ max_transfer_length = MIN_NON_ZERO(max_transfer_length, INT_MAX);
> >
> > block_acct_start(blk_get_stats(req->dev->blk),
> > &req->acct, req->qiov.size,
> >@@ -443,8 +444,7 @@ void virtio_blk_handle_request(VirtIOBlockReq *req,
> >MultiReqBuffer *mrb)
> > }
> >
> > /* merge would exceed maximum transfer length of backend device */
> >- if (max_transfer_length &&
> >- mrb->nb_sectors + nb_sectors > max_transfer_length) {
> >+ if (nb_sectors + mrb->nb_sectors > max_transfer_length) {
> > merge = false;
> > }
> >
>
> May also this here:
>
> diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
> index cc0076a..fa647b6 100644
> --- a/hw/block/virtio-blk.c
> +++ b/hw/block/virtio-blk.c
> @@ -333,6 +333,9 @@ static bool virtio_blk_sect_range_ok(VirtIOBlock *dev,
> uint64_t nb_sectors = size >> BDRV_SECTOR_BITS;
> uint64_t total_sectors;
>
> + if (nb_sectors > INT_MAX) {
> + return false;
> + }
> if (sector & dev->sector_mask) {
> return false;
> }
>
>
> Thats something that has not been checked for ages as well.
Adding checks can never hurt, so go for it. ;-)
Kevin
[Qemu-devel] [PATCH 3/4] block-backend: expose bs->bl.max_transfer_length, Peter Lieven, 2014/12/09