[Top][All Lists]

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH v3 10/18] vmstate: Use new JSON output visitor

From: Dr. David Alan Gilbert
Subject: Re: [Qemu-devel] [PATCH v3 10/18] vmstate: Use new JSON output visitor
Date: Wed, 4 May 2016 09:54:41 +0100
User-agent: Mutt/1.6.0 (2016-04-01)

* Markus Armbruster (address@hidden) wrote:
> "Dr. David Alan Gilbert" <address@hidden> writes:
> > * Eric Blake (address@hidden) wrote:
> >> On 05/03/2016 06:26 AM, Markus Armbruster wrote:
> >> 
> >> >>> +        visit_type_int(vmdesc, "size", &size, &error_abort);
> >> >>> +        visit_start_list(vmdesc, "fields", NULL, 0, &error_abort);
> >> >>> +        visit_start_struct(vmdesc, NULL, NULL, 0, &error_abort);
> >> >>
> >> >> Please avoid error_abort in migration code, especially on the source 
> >> >> side.
> >> >> You've got an apparently happily working VM, we must never kill it 
> >> >> while attempting migration.
> >> > 
> >> > These functions cannot fail, and &error_abort is a concise way to
> >> > express that.  It's the same as
> >> > 
> >> >             visit_type_int(vmdesc, "size", &size, &err);
> >> >             assert(!err);
> >> 
> >> &error_abort is ONLY supposed to be used to flag programming errors (ie.
> >> they should never be reachable).  I'm asserting that the errors don't
> >> happen, and therefore this cannot make the migration fail - in other
> >> words, this is NOT going to kill a VM that attempts migration.
> >
> > OK, but remember that I work on the basis that there are programming errors
> > in both the migration code and the VMState descriptions for devices.
> > If those break it still shouldn't kill the source.
> > (Note this isn't just true of migration - we need to be careful about
> > it in all cases where we're doing stuff to an otherwise happy VM).
> While you can safely recover from certain programming errors, you can't
> do it in general.  Worse, deciding whether recovery from a certain
> programming error is safe can be intractable.
> Example: visit_type_enum(v, name, &enum_val, enum_str, &err), where v is
> an output visitor.  This can fail when enum_val is not a valid subscript
> of enum_str[].  Can we recover safely?  Assume that we can cleanly fail
> the task at hand at this point of its execution.
> Perhaps enum_str[] doesn't match the actual enum.  This is a programming
> error.  Failing the task is graceful degradation, and safe enough.
> But what if enum_str[] is fine, but enum_val got corrupted?  Then
> failing the task is still safe as long as enum_val isn't visible outside
> the task.  But if it is visible, all bets are off.  The corruption can
> spread, and do real damage.  Can be the difference between a crash that
> forces a reboot with a filesystem journal replay, and massive data
> corruption.
> So, should we try to recover here?  Assuming we want to, badly.  If
> analysis shows the possible causes of this error are safely isolated by
> the recovery, yes.  Without such analysis, the only prudent answer is
> no.
> Real world examples typically deal with state more complex than just an
> enum (all too often a thicket of pointers), and the safety argument gets
> much hairier.
> If you want more tractable arguments, try Erlang.

And so my argument here is very simple; if we believe we have a corruption
in migration data then we fail migration - I don't try and do anything
clever about trying to bound what's broken.
This isn't about getting formal/tractable arguments, it's about making
a practical system.

> >> > * Conditions where the JSON output visitor itself sets an error:
> >> > 
> >> >   - None.
> >> 
> >> The JSON output visitor itself may be adding an error for an attempt to
> >> output Inf or NaN for a floating point number - but since vmstate
> >> doesn't use visit_type_number(), this is not possible.  And if we are
> >> really worried about it, then in my next spin of the patch I may make it
> >> user-configurable whether we stick to strict JSON or whether we relax
> >> things and output Inf/NaN anyways.
> >
> > If that's the only case, and you're already saying it doesn't use it, then
> > I don't see there's a point in making that bit any more configurable.
> I listed all possible failures of the JSON output visitor upthread.
> This is an additional failure we've considered.  I'm wary of adding it
> precisely because I do worry about upsetting apple carts like this one.

Dr. David Alan Gilbert / address@hidden / Manchester, UK

reply via email to

[Prev in Thread] Current Thread [Next in Thread]