[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [PATCH 6/6] migration/colo.c: Move colo_notify_compares_event to the
From: |
Lukas Straub |
Subject: |
Re: [PATCH 6/6] migration/colo.c: Move colo_notify_compares_event to the right place |
Date: |
Thu, 14 May 2020 16:31:05 +0200 |
On Thu, 14 May 2020 13:27:30 +0000
Zhanghailiang <address@hidden> wrote:
> Cc: Zhang Chen <address@hidden>
>
> >
> > If the secondary has to failover during checkpointing, it still is in the
> > old state
> > (i.e. different state than primary). Thus we can't expose the primary state
> > until after the checkpoint is sent.
> >
>
> Hmm, do you mean we should not flush the net packages to client connection
> until checkpointing
> Process almost success because it may fail during checkpointing ?
No.
If the primary fails/crashes during checkpointing, the secondary is still in
different state than the primary because it didn't receive the full checkpoint.
We can release the miscompared packets only after both primary and secondary
are in the same state.
Example:
1. Client opens a TCP connection, sends SYN.
2. Primary accepts the connection with SYN-ACK, but due to nondeterministic
execution the secondary is delayed.
3. Checkpoint happens, primary releases the SYN-ACK packet but then crashes
while sending the checkpoint.
4. The Secondary fails over. At this point it is still in the old state where
it hasn't sent the SYN-ACK packet.
5. The client responds with ACK to the SYN-ACK packet.
6. Because it doesn't know the connection, the secondary responds with RST,
connection reset.
Regards,
Lukas Straub
> > This fixes sporadic connection reset of client connections during failover.
> >
> > Signed-off-by: Lukas Straub <address@hidden>
> > ---
> > migration/colo.c | 12 ++++++------
> > 1 file changed, 6 insertions(+), 6 deletions(-)
> >
> > diff --git a/migration/colo.c b/migration/colo.c index
> > a69782efc5..a3fc21e86e 100644
> > --- a/migration/colo.c
> > +++ b/migration/colo.c
> > @@ -430,12 +430,6 @@ static int
> > colo_do_checkpoint_transaction(MigrationState *s,
> > goto out;
> > }
> >
> > - qemu_event_reset(&s->colo_checkpoint_event);
> > - colo_notify_compares_event(NULL, COLO_EVENT_CHECKPOINT,
> > &local_err);
> > - if (local_err) {
> > - goto out;
> > - }
> > -
> > /* Disable block migration */
> > migrate_set_block_enabled(false, &local_err);
> > qemu_mutex_lock_iothread();
> > @@ -494,6 +488,12 @@ static int
> > colo_do_checkpoint_transaction(MigrationState *s,
> > goto out;
> > }
> >
> > + qemu_event_reset(&s->colo_checkpoint_event);
> > + colo_notify_compares_event(NULL, COLO_EVENT_CHECKPOINT,
> > &local_err);
> > + if (local_err) {
> > + goto out;
> > + }
> > +
> > colo_receive_check_message(s->rp_state.from_dst_file,
> > COLO_MESSAGE_VMSTATE_LOADED, &local_err);
> > if (local_err) {
> > --
> > 2.20.1
pgpOt7IMTEtkO.pgp
Description: OpenPGP digital signature
- [PATCH 2/6] migration/colo.c: Use cpu_synchronize_all_states(), (continued)
- [PATCH 2/6] migration/colo.c: Use cpu_synchronize_all_states(), Lukas Straub, 2020/05/11
- [PATCH 3/6] migration/colo.c: Flush ram cache only after receiving device state, Lukas Straub, 2020/05/11
- [PATCH 4/6] migration/colo.c: Relaunch failover even if there was an error, Lukas Straub, 2020/05/11
- [PATCH 5/6] migration/qemu-file.c: Don't ratelimit a shutdown fd, Lukas Straub, 2020/05/11
- [PATCH 6/6] migration/colo.c: Move colo_notify_compares_event to the right place, Lukas Straub, 2020/05/11
- RE: [PATCH 6/6] migration/colo.c: Move colo_notify_compares_event to the right place, Zhanghailiang, 2020/05/14