|
From: | Hailiang Zhang |
Subject: | Re: [Qemu-devel] [PATCH COLO-Frame v12 11/38] COLO: Add a new RunState RUN_STATE_COLO |
Date: | Tue, 12 Jan 2016 20:54:14 +0800 |
User-agent: | Mozilla/5.0 (Windows NT 6.1; WOW64; rv:38.0) Gecko/20100101 Thunderbird/38.4.0 |
On 2016/1/11 21:16, Markus Armbruster wrote:
Hailiang Zhang <address@hidden> writes:On 2015/12/19 17:27, Markus Armbruster wrote:zhanghailiang <address@hidden> writes:Guest will enter this state when paused to save/restore VM state under colo checkpoint. Cc: Eric Blake <address@hidden> Cc: Markus Armbruster <address@hidden> Signed-off-by: zhanghailiang <address@hidden> Signed-off-by: Li Zhijian <address@hidden> Signed-off-by: Gonglei <address@hidden> Reviewed-by: Dr. David Alan Gilbert <address@hidden> Reviewed-by: Eric Blake <address@hidden> --- qapi-schema.json | 5 ++++- vl.c | 8 ++++++++ 2 files changed, 12 insertions(+), 1 deletion(-) diff --git a/qapi-schema.json b/qapi-schema.json index 85f7800..0423b47 100644 --- a/qapi-schema.json +++ b/qapi-schema.json @@ -154,12 +154,15 @@ # @watchdog: the watchdog action is configured to pause and has been triggered # # @guest-panicked: guest has been panicked as a result of guest OS panic +# +# @colo: guest is paused to save/restore VM state under colo checkpoint (since +# 2.6) ## { 'enum': 'RunState', 'data': [ 'debug', 'inmigrate', 'internal-error', 'io-error', 'paused', 'postmigrate', 'prelaunch', 'finish-migrate', 'restore-vm', 'running', 'save-vm', 'shutdown', 'suspended', 'watchdog', - 'guest-panicked' ] } + 'guest-panicked', 'colo' ] } ## # @StatusInfo: diff --git a/vl.c b/vl.c index f84fde8..fca630b 100644 --- a/vl.c +++ b/vl.c @@ -594,6 +594,7 @@ static const RunStateTransition runstate_transitions_def[] = { { RUN_STATE_INMIGRATE, RUN_STATE_WATCHDOG }, { RUN_STATE_INMIGRATE, RUN_STATE_GUEST_PANICKED }, { RUN_STATE_INMIGRATE, RUN_STATE_FINISH_MIGRATE }, + { RUN_STATE_INMIGRATE, RUN_STATE_COLO }, { RUN_STATE_INTERNAL_ERROR, RUN_STATE_PAUSED }, { RUN_STATE_INTERNAL_ERROR, RUN_STATE_FINISH_MIGRATE }, @@ -603,6 +604,7 @@ static const RunStateTransition runstate_transitions_def[] = { { RUN_STATE_PAUSED, RUN_STATE_RUNNING }, { RUN_STATE_PAUSED, RUN_STATE_FINISH_MIGRATE }, + { RUN_STATE_PAUSED, RUN_STATE_COLO}, { RUN_STATE_POSTMIGRATE, RUN_STATE_RUNNING }, { RUN_STATE_POSTMIGRATE, RUN_STATE_FINISH_MIGRATE }, @@ -613,9 +615,12 @@ static const RunStateTransition runstate_transitions_def[] = { { RUN_STATE_FINISH_MIGRATE, RUN_STATE_RUNNING }, { RUN_STATE_FINISH_MIGRATE, RUN_STATE_POSTMIGRATE }, + { RUN_STATE_FINISH_MIGRATE, RUN_STATE_COLO}, { RUN_STATE_RESTORE_VM, RUN_STATE_RUNNING }, + { RUN_STATE_COLO, RUN_STATE_RUNNING }, + { RUN_STATE_RUNNING, RUN_STATE_DEBUG }, { RUN_STATE_RUNNING, RUN_STATE_INTERNAL_ERROR }, { RUN_STATE_RUNNING, RUN_STATE_IO_ERROR }, @@ -626,6 +631,7 @@ static const RunStateTransition runstate_transitions_def[] = { { RUN_STATE_RUNNING, RUN_STATE_SHUTDOWN }, { RUN_STATE_RUNNING, RUN_STATE_WATCHDOG }, { RUN_STATE_RUNNING, RUN_STATE_GUEST_PANICKED }, + { RUN_STATE_RUNNING, RUN_STATE_COLO}, { RUN_STATE_SAVE_VM, RUN_STATE_RUNNING }, @@ -636,9 +642,11 @@ static const RunStateTransition runstate_transitions_def[] = { { RUN_STATE_RUNNING, RUN_STATE_SUSPENDED }, { RUN_STATE_SUSPENDED, RUN_STATE_RUNNING }, { RUN_STATE_SUSPENDED, RUN_STATE_FINISH_MIGRATE }, + { RUN_STATE_SUSPENDED, RUN_STATE_COLO}, { RUN_STATE_WATCHDOG, RUN_STATE_RUNNING }, { RUN_STATE_WATCHDOG, RUN_STATE_FINISH_MIGRATE }, + { RUN_STATE_WATCHDOG, RUN_STATE_COLO}, { RUN_STATE_GUEST_PANICKED, RUN_STATE_RUNNING }, { RUN_STATE_GUEST_PANICKED, RUN_STATE_FINISH_MIGRATE },Pardon my ignorance, but could you explain the new run state in a bit more detail for me?OK, in normally, we only need switch between COLO and RUNNING state. But we can't forbid users to issue other command while VM is COLO state. In every checkpoint, we have to pause to send VM's state to SVM, and before we pause VM, users may issue 'stop' command, which will change state to 'RUN_STATE_PAUSE', we don't want to abort VM because of this command. (Actually, we will support 'stop' VM during VM is in COLO state). So we need the state machine 'RUN_STATE_PAUSED -> RUN_STATE_COLO'.What's the next state then?
We may switch to RUN_STATE_RUNNING, actually, here, the RUN_STATE_COLO is only used to indicate that VM is stopped in COLO process.
We enter COLO state just after a full migration process which the last state will be 'RUN_STATE_FINISH_MIGRATE' or 'RUN_STATE_INMIGRATE', before we enter COLO loop, we may get 'x-colo-lost-heartbeat', and will run into 'RUN_STATE_COLO' pause, so we need state machines 'RUN_STATE_FINISH_MIGRATE -> RUN_STATE_COLO'and 'RUN_STATE_INMIGRATE, RUN_STATE_COLO'. The reason we need RUN_STATE_SUSPENDED -> RUN_STATE_COLO is, guest or users may issue standby command. We need to ensure VM not be crashed. Actually, we may need more states which can go to 'colo' state, maybe just follow the cases of 'MIGRATE' state.I believe we should fully work out the state transitions added by COLO. I like to write that down in this form: (state, trigger) -> (action, state')
I'm a little confused, for runstate_transitions_def, it seems that, the state transition is a simple way: (state1, state2). Here we only switch to RUN_STATE_COLO state when we need to do something with VM is paused.
Example: (running, checkpoint) -> (begin-checkpointing, colo)
Do you want me to add these new states into runstate_transitions_def ? What's the real status (running or stopping) of 'checkpoint' and 'colo' for VM here ?
with a suitable explanation of 'checkpoint' and 'begin-checkpointing'. For brevity, multiple (state1, trigger) -> (action, state') (state2, trigger) -> (action, state') ... (stateN, trigger) -> (action, state') can be abbreviated to ({state1, state2, stateN}, trigger) -> (action, state') Example: ({running, paused, ...}, checkpoint) -> (begin-checkpointing, colo) For clarity, chains of state transitions should be described in the order they happen. Pictures showing the states connected with transition arrows labelled with the trigger can help. Two properties to check: 1. Correctness: every state transition thus written down does the right thing. 2. Completeness: for every pair (state, trigger), we got a state transition, or an explanation why it cannot happen.Thanks, zhanghailiangYour additions to runstate_transitions_def[] show we can go *from* state 'colo' only to state 'running', but we can go *to* state 'colo' from various other states. This may well be sane, but it's not *obviously* sane :).
[Prev in Thread] | Current Thread | [Next in Thread] |