qemu-devel

Re: [PATCH v4] migration/rdma: Fix out of order wrid


From: Juan Quintela
Subject: Re: [PATCH v4] migration/rdma: Fix out of order wrid
Date: Fri, 29 Oct 2021 12:16:02 +0200
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/27.2 (gnu/linux)

Li Zhijian <lizhijian@cn.fujitsu.com> wrote:
> destination:
> ../qemu/build/qemu-system-x86_64 -enable-kvm -netdev 
> tap,id=hn0,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown -device 
> e1000,netdev=hn0,mac=50:52:54:00:11:22 -boot c -drive 
> if=none,file=./Fedora-rdma-server-migration.qcow2,id=drive-virtio-disk0 
> -device 
> virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0 -m 
> 2048 -smp 2 -device piix3-usb-uhci -device usb-tablet -monitor stdio -vga qxl 
> -spice streaming-video=filter,port=5902,disable-ticketing -incoming 
> rdma:192.168.22.23:8888
> qemu-system-x86_64: -spice 
> streaming-video=filter,port=5902,disable-ticketing: warning: short-form 
> boolean option 'disable-ticketing' deprecated
> Please use disable-ticketing=on instead
> QEMU 6.0.50 monitor - type 'help' for more information
> (qemu) trace-event qemu_rdma_block_for_wrid_miss on
> (qemu) dest_init RDMA Device opened: kernel name rxe_eth0 uverbs device name 
> uverbs2, infiniband_verbs class device path 
> /sys/class/infiniband_verbs/uverbs2, infiniband class device path 
> /sys/class/infiniband/rxe_eth0, transport: (2) Ethernet
> qemu_rdma_block_for_wrid_miss A Wanted wrid CONTROL SEND (2000) but got 
> CONTROL RECV (4000)
>
> source:
> ../qemu/build/qemu-system-x86_64 -enable-kvm -netdev 
> tap,id=hn0,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown -device 
> e1000,netdev=hn0,mac=50:52:54:00:11:22 -boot c -drive 
> if=none,file=./Fedora-rdma-server.qcow2,id=drive-virtio-disk0 -device 
> virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0 -m 
> 2048 -smp 2 -device piix3-usb-uhci -device usb-tablet -monitor stdio -vga qxl 
> -spice streaming-video=filter,port=5901,disable-ticketing -S
> qemu-system-x86_64: -spice 
> streaming-video=filter,port=5901,disable-ticketing: warning: short-form 
> boolean option 'disable-ticketing' deprecated
> Please use disable-ticketing=on instead
> QEMU 6.0.50 monitor - type 'help' for more information
> (qemu)
> (qemu) trace-event qemu_rdma_block_for_wrid_miss on
> (qemu) migrate -d rdma:192.168.22.23:8888
> source_resolve_host RDMA Device opened: kernel name rxe_eth0 uverbs device 
> name uverbs2, infiniband_verbs class device path 
> /sys/class/infiniband_verbs/uverbs2, infiniband class device path 
> /sys/class/infiniband/rxe_eth0, transport: (2) Ethernet
> (qemu) qemu_rdma_block_for_wrid_miss A Wanted wrid WRITE RDMA (1) but got 
> CONTROL RECV (4000)
>
> NOTE: we use soft RoCE as the rdma device.
> [root@iaas-rpma images]# rdma link show rxe_eth0/1
> link rxe_eth0/1 state ACTIVE physical_state LINK_UP netdev eth0
>
> This migration cannot complete when an out-of-order (OOO) CQ event occurs.
> The send queue and receive queue share the same completion queue, and
> qemu_rdma_block_for_wrid() drops the CQ events it is not interested in. But
> the events dropped by qemu_rdma_block_for_wrid() may be ones it wants later.
> In that case, qemu_rdma_block_for_wrid() blocks forever.
>
> OOO cases occur on both the source side and the destination side. Blocking
> forever happens only when SEND and RECV are out of order. OOO between
> 'WRITE RDMA' and 'RECV' doesn't matter.
>
> Below is the OOO sequence:
>        source                             destination
>       rdma_write_one()                   qemu_rdma_registration_handle()
> 1.    S1: post_recv X                    D1: post_recv Y
> 2.    wait for recv CQ event X
> 3.                                       D2: post_send X     ---------------+
> 4.                                       wait for send CQ send event X (D2) |
> 5.    recv CQ event X reaches (D2)                                          |
> 6.  +-S2: post_send Y                                                       |
> 7.  | wait for send CQ event Y                                              |
> 8.  |                                    recv CQ event Y (S2) (drop it)     |
> 9.  +-send CQ event Y reaches (S2)                                          |
> 10.                                      send CQ event X reaches (D2)  -----+
> 11.                                      wait recv CQ event Y (dropped by (8))
>
> Although hardware IB works fine in about a hundred of my runs, the IB
> specification doesn't guarantee the CQ order in such a case.
>
> Here we introduce an independent send completion queue to separate
> ibv_post_send completions from the original mixed completion queue.
> It helps us poll for the specific CQE we are really interested in.
>
> Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>

Reviewed-by: Juan Quintela <quintela@redhat.com>

The change is reasonable from the migration point of view; my RDMA
knowledge is not good enough to judge the RDMA side.
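
For readers less familiar with the failure mode, the destination-side
steps 4, 8, 10 and 11 of the sequence quoted above can be modelled with a
toy queue. This is only a sketch, not QEMU code: ToyCQ, toy_poll() and
toy_block_for_wrid() are hypothetical stand-ins, the wrid values are the
ones printed in the trace output, and an empty queue stands in for
"blocks forever".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Work request IDs as printed by the trace events quoted above. */
enum {
    TOY_WRID_NONE = 0,
    TOY_WRID_SEND_CONTROL = 2000,
    TOY_WRID_RECV_CONTROL = 4000,
};

/* Hypothetical toy completion queue: a fixed array consumed from the front. */
typedef struct {
    const int *events;
    size_t len;
    size_t head;
} ToyCQ;

/* Pop the next completion, or TOY_WRID_NONE when the queue is empty. */
static int toy_poll(ToyCQ *cq)
{
    return cq->head < cq->len ? cq->events[cq->head++] : TOY_WRID_NONE;
}

/*
 * Poll until `wanted` shows up, dropping every other completion.
 * Returns false when the queue drains first; in this model that
 * represents a caller that would block forever.
 */
static bool toy_block_for_wrid(ToyCQ *cq, int wanted)
{
    int got;

    while ((got = toy_poll(cq)) != TOY_WRID_NONE) {
        if (got == wanted) {
            return true;
        }
        /* Not the completion we are waiting for: it is dropped. */
    }
    return false;
}

/* One shared CQ, RECV completion arrives before the SEND completion. */
static bool toy_shared_cq_completes(void)
{
    static const int events[] = { TOY_WRID_RECV_CONTROL, TOY_WRID_SEND_CONTROL };
    ToyCQ cq = { events, 2, 0 };

    return toy_block_for_wrid(&cq, TOY_WRID_SEND_CONTROL) &&
           toy_block_for_wrid(&cq, TOY_WRID_RECV_CONTROL);
}

/* Same completions, but send and recv each have their own CQ. */
static bool toy_split_cq_completes(void)
{
    static const int send_events[] = { TOY_WRID_SEND_CONTROL };
    static const int recv_events[] = { TOY_WRID_RECV_CONTROL };
    ToyCQ send_cq = { send_events, 1, 0 };
    ToyCQ recv_cq = { recv_events, 1, 0 };

    return toy_block_for_wrid(&send_cq, TOY_WRID_SEND_CONTROL) &&
           toy_block_for_wrid(&recv_cq, TOY_WRID_RECV_CONTROL);
}
```

With the shared queue the wait for SEND consumes and drops the RECV
completion, so the later wait for RECV never finds it. With two queues
each wait only ever sees completions of its own kind.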

> @@ -3115,10 +3160,14 @@ static void qio_channel_rdma_set_aio_fd_handler(QIOChannel *ioc,
>  {
>      QIOChannelRDMA *rioc = QIO_CHANNEL_RDMA(ioc);
>      if (io_read) {
> -        aio_set_fd_handler(ctx, rioc->rdmain->comp_channel->fd,
> +        aio_set_fd_handler(ctx, rioc->rdmain->recv_comp_channel->fd,
> +                           false, io_read, io_write, NULL, opaque);
> +        aio_set_fd_handler(ctx, rioc->rdmain->send_comp_channel->fd,
>                             false, io_read, io_write, NULL, opaque);
>      } else {
> -        aio_set_fd_handler(ctx, rioc->rdmaout->comp_channel->fd,
> +        aio_set_fd_handler(ctx, rioc->rdmaout->recv_comp_channel->fd,
> +                           false, io_read, io_write, NULL, opaque);
> +        aio_set_fd_handler(ctx, rioc->rdmaout->send_comp_channel->fd,
>                             false, io_read, io_write, NULL, opaque);
>      }
>  }

Not related to this patch, but this function is asking to be split in
two: its whole body is a single if that depends on one of the parameters.
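
One possible reading of that split, as a rough sketch only. None of
these names exist in QEMU: the Toy* types are reduced stand-ins for the
real structures, and toy_set_fd_handler() is a recording stub that takes
the place of aio_set_fd_handler() so the example compiles on its own.

```c
#include <assert.h>
#include <stddef.h>

typedef void ToyIOHandler(void *opaque);

/* Stand-ins reduced to the fields the hunk above touches. */
typedef struct { int fd; } ToyCompChannel;
typedef struct {
    ToyCompChannel *recv_comp_channel;
    ToyCompChannel *send_comp_channel;
} ToyRDMAContext;
typedef struct {
    ToyRDMAContext *rdmain;
    ToyRDMAContext *rdmaout;
} ToyChannelRDMA;

/* Recording stub: remembers which fds a handler was installed on. */
#define TOY_MAX_FDS 8
static int toy_fds[TOY_MAX_FDS];
static size_t toy_nfds;

static void toy_set_fd_handler(int fd, ToyIOHandler *io_read,
                               ToyIOHandler *io_write, void *opaque)
{
    (void)io_read;
    (void)io_write;
    (void)opaque;
    assert(toy_nfds < TOY_MAX_FDS);
    toy_fds[toy_nfds++] = fd;
}

/* Shared body: install the handlers on both completion channels. */
static void toy_set_handlers_for(ToyRDMAContext *rdma, ToyIOHandler *io_read,
                                 ToyIOHandler *io_write, void *opaque)
{
    toy_set_fd_handler(rdma->recv_comp_channel->fd, io_read, io_write, opaque);
    toy_set_fd_handler(rdma->send_comp_channel->fd, io_read, io_write, opaque);
}

/* The two halves of the original if/else, one function each. */
static void toy_set_in_handlers(ToyChannelRDMA *rioc, ToyIOHandler *io_read,
                                ToyIOHandler *io_write, void *opaque)
{
    toy_set_handlers_for(rioc->rdmain, io_read, io_write, opaque);
}

static void toy_set_out_handlers(ToyChannelRDMA *rioc, ToyIOHandler *io_read,
                                 ToyIOHandler *io_write, void *opaque)
{
    toy_set_handlers_for(rioc->rdmaout, io_read, io_write, opaque);
}

/* Thin dispatcher that keeps the original single entry point. */
static void toy_set_aio_fd_handler(ToyChannelRDMA *rioc, ToyIOHandler *io_read,
                                   ToyIOHandler *io_write, void *opaque)
{
    if (io_read) {
        toy_set_in_handlers(rioc, io_read, io_write, opaque);
    } else {
        toy_set_out_handlers(rioc, io_read, io_write, opaque);
    }
}

/* Placeholder handler used only to pass a non-NULL callback. */
static void toy_noop(void *opaque)
{
    (void)opaque;
}
```

Whether the dispatcher can go away entirely depends on the callers,
which this sketch does not model.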

> @@ -3332,7 +3381,22 @@ static size_t qemu_rdma_save_page(QEMUFile *f, void *opaque,
>       */
>      while (1) {
>          uint64_t wr_id, wr_id_in;
> -        int ret = qemu_rdma_poll(rdma, &wr_id_in, NULL);
> +        int ret = qemu_rdma_poll(rdma, rdma->recv_cq, &wr_id_in, NULL);
> +        if (ret < 0) {
> +            error_report("rdma migration: polling error! %d", ret);

Following up on what Dave said in the previous review: if you touch this
part again, could you also differentiate the recv/send channel here?

> +            goto err;
> +        }
> +
> +        wr_id = wr_id_in & RDMA_WRID_TYPE_MASK;
> +
> +        if (wr_id == RDMA_WRID_NONE) {
> +            break;
> +        }

The code was already that way, but it creates a variable just to avoid writing:

        if ((wr_id_in & RDMA_WRID_TYPE_MASK) == RDMA_WRID_NONE) {
            break;
        }
I was just checking whether wr_id is used anywhere else.
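
If the masked test is wanted in more than one place, a small predicate
is another way to avoid the temporary. A sketch under assumptions: the
TOY_ names are hypothetical, the 16-bit mask width and the value 0 for
"none" are assumptions and not taken from the patch, chosen only so that
the wrid values seen in the trace (1, 2000, 4000) fit in the type field.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout: the low 16 bits of a wrid carry its type. */
#define TOY_WRID_TYPE_MASK UINT64_C(0xFFFF)
/* Assumed value for "no completion available". */
#define TOY_WRID_NONE      UINT64_C(0)

/* True when the type field of the raw wrid says "none". */
static bool toy_wrid_is_none(uint64_t wr_id_in)
{
    return (wr_id_in & TOY_WRID_TYPE_MASK) == TOY_WRID_NONE;
}
```

The loop would then read `if (toy_wrid_is_none(wr_id_in)) { break; }`
with no extra variable.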

Later, Juan.



