qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [Qemu-block] [PATCH COLO-Frame v12 25/38] qmp event: Ad


From: Hailiang Zhang
Subject: Re: [Qemu-devel] [Qemu-block] [PATCH COLO-Frame v12 25/38] qmp event: Add event notification for COLO error
Date: Wed, 23 Dec 2015 11:14:01 +0800
User-agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:38.0) Gecko/20100101 Thunderbird/38.4.0

On 2015/12/22 5:14, John Snow wrote:


On 12/19/2015 05:02 AM, Markus Armbruster wrote:
Copying qemu-block because this seems related to generalising block jobs
to background jobs.

zhanghailiang <address@hidden> writes:

If some errors happen during VM's COLO FT stage, it's important to notify the 
users
of this event. Together with 'colo_lost_heartbeat', users can intervene in 
COLO's
failover work immediately.
If users don't want to get involved in COLO's failover verdict,
it is still necessary to notify users that we exited COLO mode.

Cc: Markus Armbruster <address@hidden>
Cc: Michael Roth <address@hidden>
Signed-off-by: zhanghailiang <address@hidden>
Signed-off-by: Li Zhijian <address@hidden>
---
v11:
- Fix several typos found by Eric

Signed-off-by: zhanghailiang <address@hidden>
---
  docs/qmp-events.txt | 17 +++++++++++++++++
  migration/colo.c    | 11 +++++++++++
  qapi-schema.json    | 16 ++++++++++++++++
  qapi/event.json     | 17 +++++++++++++++++
  4 files changed, 61 insertions(+)

diff --git a/docs/qmp-events.txt b/docs/qmp-events.txt
index d2f1ce4..19f68fc 100644
--- a/docs/qmp-events.txt
+++ b/docs/qmp-events.txt
@@ -184,6 +184,23 @@ Example:
  Note: The "ready to complete" status is always reset by a BLOCK_JOB_ERROR
  event.

+COLO_EXIT
+---------
+
+Emitted when VM finishes COLO mode due to some errors happening or
+at the request of users.

How would the event's recipient distinguish between "due to error" and
"at the user's request"?

+
+Data:
+
+ - "mode": COLO mode, primary or secondary side (json-string)
+ - "reason":  the exit reason, internal error or external request. 
(json-string)
+ - "error": error message (json-string, operation)
+
+Example:
+
+{"timestamp": {"seconds": 2032141960, "microseconds": 417172},
+ "event": "COLO_EXIT", "data": {"mode": "primary", "reason": "request" } }
+

Pardon my ignorance again...  Does "VM finishes COLO mode" means have
some kind of COLO background job, and it just finished for whatever
reason?

If yes, this COLO job could be an instance of the general background job
concept we're trying to grow from the existing block job concept.

I'm not asking you to rebase your work onto the background job
infrastructure, not least for the simple reason that it doesn't exist,
yet.  But I think it would be fruitful to compare your COLO job
management QMP interface with the one we have for block jobs.  Not only
may that avoid unnecessary inconsistency, it could also help shape the
general background job interface.


Yes. The "background job" concept doesn't exist in a formal way outside
of the block layer yet, but we're looking to expand it as we re-tool the
block jobs themselves.

It may be the case that the COLO commands and events need to go in as
they are now, but later we can bring them back into the generalized job
infrastructure.


Agreed. ;)

Quick overview of the block job QMP interface:

* Commands to create a job: block-commit, block-stream, drive-mirror,
   drive-backup.

* Get information on jobs: query-block-jobs

* Pause a job: block-job-pause

* Resume a job: block-job-resume

* Cancel a job: block-job-cancel

* Block job completion events: BLOCK_JOB_COMPLETED, BLOCK_JOB_CANCELLED

* Block job error event: BLOCK_JOB_ERROR

* Block job synchronous completion: event BLOCK_JOB_READY and command
   block-job-complete


The block-agnostic version of these commands would likely be:

query-jobs
job-pause
job-resume
job-cancel
job-complete

Events: JOB_COMPLETED, JOB_CANCELLED, JOB_ERROR, JOB_READY.


It looks like COLO_EXIT would be an instance of JOB_COMPLETED, and if it
occurred due to an error, we'd also see JOB_ERROR emitted.


Yes, if we use this job frame for COLO, the COLO_EXIT will be like that.

  DEVICE_DELETED
  --------------

diff --git a/migration/colo.c b/migration/colo.c
index d1dd4e1..d06c14f 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -18,6 +18,7 @@
  #include "qemu/error-report.h"
  #include "qemu/sockets.h"
  #include "migration/failover.h"
+#include "qapi-event.h"

  /* colo buffer */
  #define COLO_BUFFER_BASE_SIZE (4 * 1024 * 1024)
@@ -349,6 +350,11 @@ static void colo_process_checkpoint(MigrationState *s)
  out:
      if (ret < 0) {
          error_report("%s: %s", __func__, strerror(-ret));
+        qapi_event_send_colo_exit(COLO_MODE_PRIMARY, COLO_EXIT_REASON_ERROR,
+                                  true, strerror(-ret), NULL);
+    } else {
+        qapi_event_send_colo_exit(COLO_MODE_PRIMARY, COLO_EXIT_REASON_REQUEST,
+                                  false, NULL, NULL);
      }

      qsb_free(buffer);
@@ -516,6 +522,11 @@ out:
      if (ret < 0) {
          error_report("colo incoming thread will exit, detect error: %s",
                       strerror(-ret));
+        qapi_event_send_colo_exit(COLO_MODE_SECONDARY, COLO_EXIT_REASON_ERROR,
+                                  true, strerror(-ret), NULL);
+    } else {
+        qapi_event_send_colo_exit(COLO_MODE_SECONDARY, 
COLO_EXIT_REASON_REQUEST,
+                                  false, NULL, NULL);
      }

      if (fb) {
diff --git a/qapi-schema.json b/qapi-schema.json
index feb7d53..f6ecb88 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -778,6 +778,22 @@
    'data': [ 'unknown', 'primary', 'secondary'] }

  ##
+# @COLOExitReason
+#
+# The reason for a COLO exit
+#
+# @unknown: unknown reason

How can @unknown happen?

+#
+# @request: COLO exit is due to an external request
+#
+# @error: COLO exit is due to an internal error
+#
+# Since: 2.6
+##
+{ 'enum': 'COLOExitReason',
+  'data': [ 'unknown', 'request', 'error'] }
+
+##
  # @x-colo-lost-heartbeat
  #
  # Tell qemu that heartbeat is lost, request it to do takeover procedures.
diff --git a/qapi/event.json b/qapi/event.json
index f0cef01..f63d456 100644
--- a/qapi/event.json
+++ b/qapi/event.json
@@ -255,6 +255,23 @@
    'data': {'status': 'MigrationStatus'}}

  ##
+# @COLO_EXIT
+#
+# Emitted when VM finishes COLO mode due to some errors happening or
+# at the request of users.
+#
+# @mode: which COLO mode the VM was in when it exited.

Can we get 'unknown' here?

+#
+# @reason: describes the reason for the COLO exit.

Can we get 'unknown' here?

+#
+# @error: #optional, error message. Only present on error happening.
+#
+# Since: 2.6
+##
+{ 'event': 'COLO_EXIT',
+  'data': {'mode': 'COLOMode', 'reason': 'COLOExitReason', '*error': 'str' } }
+
+##
  # @ACPI_DEVICE_OST
  #
  # Emitted when guest executes ACPI _OST method.







reply via email to

[Prev in Thread] Current Thread [Next in Thread]