[Top][All Lists]

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH v3 10/18] vmstate: Use new JSON output visitor

From: Markus Armbruster
Subject: Re: [Qemu-devel] [PATCH v3 10/18] vmstate: Use new JSON output visitor
Date: Wed, 04 May 2016 10:39:10 +0200
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/24.5 (gnu/linux)

"Dr. David Alan Gilbert" <address@hidden> writes:

> * Eric Blake (address@hidden) wrote:
>> On 05/03/2016 06:26 AM, Markus Armbruster wrote:
>> >>> +        visit_type_int(vmdesc, "size", &size, &error_abort);
>> >>> +        visit_start_list(vmdesc, "fields", NULL, 0, &error_abort);
>> >>> +        visit_start_struct(vmdesc, NULL, NULL, 0, &error_abort);
>> >>
>> >> Please avoid error_abort in migration code, especially on the source side.
>> >> You've got an apparently happily working VM, we must never kill it 
>> >> while attempting migration.
>> > 
>> > These functions cannot fail, and &error_abort is a concise way to
>> > express that.  It's the same as
>> > 
>> >             visit_type_int(vmdesc, "size", &size, &err);
>> >             assert(!err);
>> &error_abort is ONLY supposed to be used to flag programming errors (ie.
>> they should never be reachable).  I'm asserting that the errors don't
>> happen, and therefore this cannot make the migration fail - in other
>> words, this is NOT going to kill a VM that attempts migration.
> OK, but remember that I work on the basis that there are programming errors
> in both the migration code and the VMState descriptions for devices.
> If those break it still shouldn't kill the source.
> (Note this isn't just true of migration - we need to be careful about
> it in all cases where we're doing stuff to an otherwise happy VM).

While you can safely recover from certain programming errors, you can't
do it in general.  Worse, deciding whether recovery from a certain
programming error is safe can be intractable.

Example: visit_type_enum(v, name, &enum_val, enum_str, &err), where v is
an output visitor.  This can fail when enum_val is not a valid subscript
of enum_str[].  Can we recover safely?  Assume that we can cleanly fail
the task at hand at this point of its execution.

Perhaps enum_str[] doesn't match the actual enum.  This is a programming
error.  Failing the task is graceful degradation, and safe enough.

But what if enum_str[] is fine, but enum_val got corrupted?  Then
failing the task is still safe as long as enum_val isn't visible outside
the task.  But if it is visible, all bets are off.  The corruption can
spread, and do real damage.  Can be the difference between a crash that
forces a reboot with a filesystem journal replay, and massive data

So, should we try to recover here?  Assuming we want to, badly.  If
analysis shows the possible causes of this error are safely isolated by
the recovery, yes.  Without such analysis, the only prudent answer is

Real world examples typically deal with state more complex than just an
enum (all too often a thicket of pointers), and the safety argument gets
much hairier.

If you want more tractable arguments, try Erlang.

>> > * Conditions where the JSON output visitor itself sets an error:
>> > 
>> >   - None.
>> The JSON output visitor itself may be adding an error for an attempt to
>> output Inf or NaN for a floating point number - but since vmstate
>> doesn't use visit_type_number(), this is not possible.  And if we are
>> really worried about it, then in my next spin of the patch I may make it
>> user-configurable whether we stick to strict JSON or whether we relax
>> things and output Inf/NaN anyways.
> If that's the only case, and you're already saying it doesn't use it, then
> I don't see there's a point in making that bit any more configurable.

I listed all possible failures of the JSON output visitor upthread.
This is an additional failure we've considered.  I'm wary of adding it
precisely because I do worry about upsetting apple carts like this one.

reply via email to

[Prev in Thread] Current Thread [Next in Thread]