From: Daniel P. Berrange
Subject: Re: [Qemu-devel] [PATCH v3 09/27] migration: add reporting of errors for outgoing migration
Date: Fri, 4 Mar 2016 10:49:46 +0000
User-agent: Mutt/1.5.24 (2015-08-30)

On Fri, Mar 04, 2016 at 10:49:35AM +0100, Markus Armbruster wrote:
> "Daniel P. Berrange" <address@hidden> writes:
> 
> > Currently if an app initiates an outgoing migration, it
> 
> application
> 
> > may or may not get an error reported back on failure. If
> > the error occurs synchronously to the 'migrate' command
> > execution, the client app will see the error message. This
> > is the case for DNS lookup failures. If the error occurs
> > asynchronously to the monitor command though, the error
> > will be thrown away and the client left guessing about
> > what went wrong. This is the case for failure to connect
> > to the TCP server (eg due to wrong port, or firewall
> > rules, or other similar errors).
> >
> > In the future we'll be adding more scope for errors to
> > happen asynchronously with the TLS protocol handshake.
> > TLS errors are hard to diagnose even when they are well
> > reported, so discarding errors entirely will make it
> > impossible to debug TLS connection problems.
> >
> > Management apps which do migration are already using
> > 'query-migrate' / 'info migrate' to check up on progress
> > of background migration operations and to see their end
> > status. This is a fine place to also include the error
> > message when things go wrong.
> >
> > This patch thus adds an 'error-desc' field to the
> > MigrationInfo struct, which will be populated when
> > the 'status' is set to 'failed':
> >
> > (qemu) migrate -d tcp:localhost:9001
> > (qemu) info migrate
> > capabilities: xbzrle: off rdma-pin-all: off auto-converge: off zero-blocks: off compress: off events: off x-postcopy-ram: off
> > Migration status: failed
> > total time: 0 milliseconds
> > error description: Error connecting to socket: Connection refused
> 
> Perhaps:
> 
>   Migration status: failed (Error connecting to socket: Connection refused)
> 
> Just an idea, use whatever you like better.

Yeah, that is nicer for the HMP.
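
For illustration only, a minimal sketch in C of the one-line format
suggested above: the error description is appended in parentheses only
when one is available. The helper name demo_format_status() and its
signature are hypothetical and are not taken from the patch or from
QEMU's HMP code.

```c
#include <stdio.h>

/*
 * Hypothetical helper, not QEMU code: build the one-line status text,
 * appending the error description in parentheses only when one exists.
 * Returns the snprintf() result, i.e. the length the full text needs.
 */
static int demo_format_status(char *buf, size_t len,
                              const char *status, const char *error_desc)
{
    if (error_desc && *error_desc) {
        return snprintf(buf, len, "Migration status: %s (%s)",
                        status, error_desc);
    }
    return snprintf(buf, len, "Migration status: %s", status);
}
```

With status "failed" and the description from the example above, this
yields the line Markus proposed; with no description it falls back to
the plain status line.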

> > In the HMP, when doing non-detached migration, it is
> > also possible to display this error message directly
> > to the app.
> >
> > (qemu) migrate tcp:localhost:9001
> > Error connecting to socket: Connection refused
> 
> You could include a QMP example if you like.

Sure, will add it.


> > @@ -853,12 +857,14 @@ static void migrate_fd_cleanup(void *opaque)
> >      notifier_list_notify(&migration_state_notifiers, s);
> >  }
> >  
> > -void migrate_fd_error(MigrationState *s)
> > +void migrate_fd_error(MigrationState *s, const Error *error)
> >  {
> > -    trace_migrate_fd_error();
> > +    trace_migrate_fd_error(error ? error_get_pretty(error) : "");
> >      assert(s->to_dst_file == NULL);
> >      migrate_set_state(&s->state, MIGRATION_STATUS_SETUP,
> >                        MIGRATION_STATUS_FAILED);
> > +    error_free(s->error);
> > +    s->error = error_copy(error);
> 
> Can migrate_fd_error() be called more than once per migration attempt?

I don't think so, but I felt it worth being paranoid against mistakes,
so I chose to call error_free() just in case.
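
The "free the old value, then store a copy of the new one" pattern can
be sketched in isolation as below. This is an illustration under stated
assumptions, not the QEMU implementation: DemoError, DemoState and the
demo_*() helpers are hypothetical stand-ins, and their NULL handling is
a choice made for this sketch rather than a statement about how
error_copy() or error_free() behave.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for an error object; it only carries a message. */
typedef struct DemoError {
    char *msg;
} DemoError;

/* Hypothetical stand-in for the migration state that owns the error. */
typedef struct DemoState {
    DemoError *error;   /* last recorded failure, or NULL */
} DemoState;

static DemoError *demo_error_new(const char *msg)
{
    size_t len = strlen(msg) + 1;
    DemoError *e = malloc(sizeof(*e));

    if (!e) {
        abort();
    }
    e->msg = malloc(len);
    if (!e->msg) {
        abort();
    }
    memcpy(e->msg, msg, len);
    return e;
}

/* Sketch-only assumption: copying NULL yields NULL. */
static DemoError *demo_error_copy(const DemoError *src)
{
    return src ? demo_error_new(src->msg) : NULL;
}

/* Sketch-only assumption: freeing NULL is a no-op. */
static void demo_error_free(DemoError *e)
{
    if (e) {
        free(e->msg);
        free(e);
    }
}

/*
 * Record a failure. Freeing any previously stored error first means a
 * second call replaces the old error instead of leaking it.
 */
static void demo_record_error(DemoState *s, const DemoError *error)
{
    assert(s != NULL);
    demo_error_free(s->error);
    s->error = demo_error_copy(error);
}
```

Because the state keeps its own copy, the caller remains responsible
for freeing the error it passed in.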

> > diff --git a/qapi-schema.json b/qapi-schema.json
> > index 7b8f2a1..ff89747 100644
> > --- a/qapi-schema.json
> > +++ b/qapi-schema.json
> > @@ -484,6 +484,8 @@
> >  #       throttled during auto-converge. This is only present when auto-converge
> >  #       has started throttling guest cpus. (Since 2.5)
> >  #
> > +# @error-desc: the error description string, when @status == 'failed' (Since 2.6)
> > +#
> 
> "when @status == 'failed'" is a semantic constraint, not visible in
> query-qmp-schema.  Several existing members have similar constraints.
> 
> Making this a flat union tagged by @status would turn semantic
> constraints into syntax, visible in query-qmp-schema.  Not sure it's
> worth the churn now.
> 
> Note that qmp_query_migrate() may not actually set @error-desc when
> @status is 'failed'.  Bug in either the code or the documentation.

Bug in docs :-)

> Further note that code uses it without also checking status.  You could
> assert the constraint holds.  Your choice.
> 
> Let me take a step back and examine the bigger picture.
> 
> Migration is a long-running task with a non-trivial life cycle (see enum
> MigrationStatus).  Similar tasks exist in the block layer ("block
> jobs"), and perhaps we can have a generic "jobs" abstraction some day.
> 
> Originally, QMP was designed to permit doing long-running tasks as
> asynchronous commands.  We ended up doing them as synchronous commands +
> events + status queries.  Two reasons, one accidental, one fundamental.
> 
> The accidental one is asynchronous commands never quite worked, so
> nobody used them, so nobody bothered to fix them.
> 
> The fundamental one is complex life cycles.  A job that starts, runs
> silently for a while, then either completes or fails can be done nicely
> as asynchronous command.  But when the job's life cycle is more complex,
> a single command reply isn't enough.  Thus commands + events + status
> queries.
> 
> In this model, the reply to a status query in a "failed" state takes the
> role of the asynchronous command's error reply.  It therefore makes
> sense to compare the two.
> 
> The QMP error reply is documented as follows in qmp-spec.txt:
> 
>     The format of an error response is:
> 
>     { "error": { "class": json-string, "desc": json-string }, "id": json-value }
> 
>      Where,
> 
>     - The "class" member contains the error class name (eg. "GenericError")
>     - The "desc" member is a human-readable error message. Clients should
>       not attempt to parse this message.
>     - The "id" member contains the transaction identification associated with
>       the command execution if issued by the Client
> 
> MigrationInfo in a "failed" state contains @status and @error-desc, and may
> contain additional members like @total-time (not entirely clear from
> documentation, which means the documentation is too vague).
> 
> MigrationInfo covers @desc, but not @class and @id.
> 
> Use of @class is discouraged nowadays, and omitting it here at least
> until we have a compelling use for it makes sense.
> 
> @id ties the asynchronous reply to the command.  Not necessary as long
> as only one migration task can exist.
> 
> The only thing I'd like you to add to MigrationInfo is the "Clients
> should not attempt to parse this" admonition.  Suggest to describe
> @error-desc as "human-readable" while there.

Yes, makes sense.
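
As a rough sketch of what this means for a client: it should branch on
the status value only, treat the description as opaque human-readable
text, and cope with the description being absent even in the failed
state. The types and the function below are hypothetical, written in C
for illustration; they are not QEMU's generated QAPI types, and the
"unknown error" fallback text is an assumption of this sketch.

```c
#include <stddef.h>

/* Hypothetical, reduced view of a migration status. */
typedef enum DemoStatus {
    DEMO_STATUS_ACTIVE,
    DEMO_STATUS_COMPLETED,
    DEMO_STATUS_FAILED
} DemoStatus;

/* Hypothetical, reduced view of a status query reply. */
typedef struct DemoMigrationInfo {
    DemoStatus status;
    const char *error_desc;   /* may be NULL even when status is FAILED */
} DemoMigrationInfo;

/*
 * Return the text to show to a human when the migration failed, or NULL
 * when it did not fail. The decision is made on the status alone; the
 * description is passed through verbatim and never parsed.
 */
static const char *demo_failure_text(const DemoMigrationInfo *info)
{
    if (info->status != DEMO_STATUS_FAILED) {
        return NULL;
    }
    return info->error_desc ? info->error_desc : "unknown error";
}
```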


Regards,
Daniel
-- 
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|


