qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH 3/5] migration/rdma: Allow cancelling while wait


From: Peter Xu
Subject: Re: [Qemu-devel] [PATCH 3/5] migration/rdma: Allow cancelling while waiting for wrid
Date: Wed, 12 Jul 2017 17:32:34 +0800
User-agent: Mutt/1.5.24 (2015-08-30)

On Tue, Jul 04, 2017 at 07:49:13PM +0100, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <address@hidden>
> 
> When waiting for a WRID, if the other side dies we end up waiting
> for ever with no way to cancel the migration.
> Cure this by poll()ing the fd first with a timeout and checking
> error flags and migration state.
> 
> Signed-off-by: Dr. David Alan Gilbert <address@hidden>
> ---
>  migration/rdma.c | 54 ++++++++++++++++++++++++++++++++++++++++++++++++------
>  1 file changed, 48 insertions(+), 6 deletions(-)
> 
> diff --git a/migration/rdma.c b/migration/rdma.c
> index 6111e10c70..7273ae9929 100644
> --- a/migration/rdma.c
> +++ b/migration/rdma.c
> @@ -1466,6 +1466,52 @@ static uint64_t qemu_rdma_poll(RDMAContext *rdma, 
> uint64_t *wr_id_out,
>      return  0;
>  }
>  
> +/* Wait for activity on the completion channel.
> + * Returns 0 on success, none-0 on error.
> + */
> +static int qemu_rdma_wait_comp_channel(RDMAContext *rdma)
> +{
> +    /*
> +     * Coroutine doesn't start until migration_fd_process_incoming()
> +     * so don't yield unless we know we're running inside of a coroutine.
> +     */
> +    if (rdma->migration_started_on_destination) {
> +        yield_until_fd_readable(rdma->comp_channel->fd);
> +    } else {
> +        /* This is the source side, we're in a separate thread
> +         * or destination prior to migration_fd_process_incoming()
> +         * we can't yield; so we have to poll the fd.
> +         * But we need to be able to handle 'cancel' or an error
> +         * without hanging forever.
> +         */
> +        while (!rdma->error_state && !rdma->error_reported &&
> +               !rdma->received_error) {

Should we just check rdma->error_state? Since iiuc error_reported only
means whether error is reported (set to 1 if reported), and
received_error means whether we got explicit error from peer (I think
only via a RDMA_CONTROL_ERROR message), and it's 1 if received. IIUC
either error_reported or received_error will mean that error_state be
set already.

> +            GPollFD pfds[1];
> +            pfds[0].fd = rdma->comp_channel->fd;
> +            pfds[0].events = G_IO_IN | G_IO_HUP | G_IO_ERR;
> +            /* 0.5s timeout, should be fine for a 'cancel' */
> +            switch (qemu_poll_ns(pfds, 1, 500 * 1000 * 1000)) {

(I would be more aggresive, say, 100ms, or even less, for better UI
 response with merely nothing lost. But 500ms is okay as well to me)

> +            case 1: /* fd active */
> +                return 0;
> +
> +            case 0: /* Timeout, go around again */
> +                break;
> +
> +            default: /* Error of some type */
> +                return -1;
> +            }
> +
> +            if (migrate_get_current()->state == MIGRATION_STATUS_CANCELLING) 
> {
> +                /* Bail out and let the cancellation happen */
> +                return -EPIPE;
> +            }
> +        }
> +    }
> +
> +    return rdma->error_state || rdma->error_reported ||
> +           rdma->received_error;

Similar question here. Maybe just return rdma->error_state?

Thanks,

-- 
Peter Xu



reply via email to

[Prev in Thread] Current Thread [Next in Thread]