Re: [Qemu-devel] [PATCH v3 10/18] vmstate: Use new JSON output visitor


From: Markus Armbruster
Subject: Re: [Qemu-devel] [PATCH v3 10/18] vmstate: Use new JSON output visitor
Date: Wed, 04 May 2016 11:11:30 +0200
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/24.5 (gnu/linux)

"Dr. David Alan Gilbert" <address@hidden> writes:

> * Markus Armbruster (address@hidden) wrote:
>> "Dr. David Alan Gilbert" <address@hidden> writes:
>> 
>> > * Eric Blake (address@hidden) wrote:
>> >
>> >> -static void vmstate_save_old_style(QEMUFile *f, SaveStateEntry *se, QJSON *vmdesc)
>> >> +static void vmstate_save_old_style(QEMUFile *f, SaveStateEntry *se,
>> >> +                                   Visitor *vmdesc)
>> >>  {
>> >>      int64_t old_offset, size;
>> >> +    const char *tmp;
>> >> 
>> >>      old_offset = qemu_ftell_fast(f);
>> >>      se->ops->save_state(f, se->opaque);
>> >>      size = qemu_ftell_fast(f) - old_offset;
>> >> 
>> >>      if (vmdesc) {
>> >> -        json_prop_int(vmdesc, "size", size);
>> >> -        json_start_array(vmdesc, "fields");
>> >> -        json_start_object(vmdesc, NULL);
>> >> -        json_prop_str(vmdesc, "name", "data");
>> >> -        json_prop_int(vmdesc, "size", size);
>> >> -        json_prop_str(vmdesc, "type", "buffer");
>> >> -        json_end_object(vmdesc);
>> >> -        json_end_array(vmdesc);
>> >> +        visit_type_int(vmdesc, "size", &size, &error_abort);
>> >> +        visit_start_list(vmdesc, "fields", NULL, 0, &error_abort);
>> >> +        visit_start_struct(vmdesc, NULL, NULL, 0, &error_abort);
>> >
>> > Please avoid error_abort in migration code, especially on the source side.
>> > You've got an apparently happily working VM, we must never kill it 
>> > while attempting migration.
>> 
>> These functions cannot fail,
>
> Hang on though - this takes a Visitor* - that could be any visitor and that
> could fail.

vmdesc is either NULL or it's the JSON output visitor created by
qemu_savevm_state_complete_precopy():

    vmdesc_jov = json_output_visitor_new(false);
    vmdesc = json_output_get_visitor(vmdesc_jov);

This is by design: the purpose of this code is *writing* a *JSON*
description of the migration stream.  See commit 8118f09.

>> and &error_abort is a concise way to
>> express that.  It's the same as
>> 
>>             visit_type_int(vmdesc, "size", &size, &err);
>>             assert(!err);
>> 
>> An alternative would be ignoring errors:
>> 
>>             visit_type_int(vmdesc, "size", &size, NULL);
>> 
>> Ignoring violations of design invariants is hardly ever a good idea,
>> though.
>> 
>> Another alternative would be trying to recover from the violation, like
>> this:
>> 
>>             visit_type_int(vmdesc, "size", &size, &err);
>>             if (err) {
>>                 /* report we're fscked... */
>>                 /* do whatever needs to be done to recover... */
>>                 goto out;
>>             }
>> 
>> Fancy untestable error paths are hardly ever good ideas, either.
>
> For an outgoing migration we must never kill the source unless we think the
> data structures the source is using are themselves corrupt.
> We get programming errors both in our migration code and the migration
> structures on devices.
> If our migration code is broken/failing an invariant that still doesn't mean
> you should kill the source - it should kill the migration only.

"git-grep assert migration" suggests you do kill the source on certain
programming errors.

I reiterate my point that fancy, untestable error recovery is unlikely
to actually recover.  "Fancy" can work, "untestable" might work (but
color me skeptical), but once you've got both, you're a dead man walking.
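
For illustration, here is a minimal sketch of the three call-site styles
quoted above (abort, ignore, recover).  It is a toy model, not QEMU's
Error API: ToyError, toy_error_abort, toy_error_set() and
toy_visit_int() are hypothetical names, and the "negative value fails"
rule exists only so the toy has something to fail on.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-ins for illustration only, not QEMU's Error API. */
typedef struct ToyError { const char *msg; } ToyError;

/* Sentinel: only its address matters, it is never dereferenced. */
static ToyError *toy_error_abort;

static void toy_error_set(ToyError **errp, const char *msg)
{
    if (errp == &toy_error_abort) {      /* caller declared "cannot fail" */
        fprintf(stderr, "unexpected error: %s\n", msg);
        abort();
    }
    if (!errp) {                         /* caller chose to ignore errors */
        return;
    }
    ToyError *e = malloc(sizeof(*e));
    if (!e) {
        abort();
    }
    e->msg = msg;
    *errp = e;
}

/* Toy operation: fails on negative input (an arbitrary rule). */
static void toy_visit_int(long value, ToyError **errp)
{
    if (value < 0) {
        toy_error_set(errp, "value out of range");
    }
}

/* Style 1: treat failure as a violated invariant and die. */
static void save_asserting(long v)
{
    toy_visit_int(v, &toy_error_abort);
}

/* Style 2: silently ignore failure. */
static void save_ignoring(long v)
{
    toy_visit_int(v, NULL);
}

/* Style 3: attempt recovery; returns 0 on success, -1 on failure. */
static int save_recovering(long v)
{
    ToyError *err = NULL;

    toy_visit_int(v, &err);
    if (err) {
        fprintf(stderr, "save failed: %s\n", err->msg);
        free(err);
        return -1;
    }
    return 0;
}
```

Only the third style adds a path that needs its own test, which is the
cost being weighed in this thread.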

>> Complete list of conditions where the JSON output visitor sets an error:
>> 
>> * Conditions where the visitor core sets an error:
>> 
>>   - visit_type_uintN() when one of the visit_type_uint{8,16,32}() passes
>>     a value out of bounds.  This is a serious programming error in
>>     qapi-visit-core.c.  We're almost certainly screwed, and attempting
>>     to continue is unsafe.
>> 
>>   - visit_type_int(): likewise.
>> 
>>   - output_type_enum() when the numeric value is out of bounds.  This is
>>     either a serious programming error in qapi-visit-core.c, or
>>     corrupted state.  Either way, we're almost certainly screwed, and
>>     attempting to continue is unsafe.
>> 
>>   - input_type_enum() when the string value is unknown.  This is either
>>     a serious programming error in qapi-visit-core.c, or bad input.
>>     However, the JSON output visitor isn't supposed to ever call
>>     input_type_enum(), so it's the former.  Once again, we're almost
>>     certainly screwed, and attempting to continue is unsafe.
>> 
>> * Conditions where the JSON output visitor itself sets an error:
>> 
>>   - None.
>> 
>> Do you still object to &error_abort?
>
> So at the very least it should be commented as to why it can't happen.
> My worry about it is that you've got a fairly long comment about why
> it can't happen, and I worry that in 6 months someone adds a feature
> to either the visitors or the migration code that means there's now
> a case where it can happen.

Here's why I don't think new failure modes are likely.

What does this helper module do, and how could it possibly fail?  By
"possibly", I mean any conceivable reasonable implementation, not just
the two we have (this patch gets rid of one).

This helper module builds JSON text and returns it as a string.  Its
interface mirrors JSON abstract syntax: start object, end object, start
array, end array, string, ...  Additionally, initialize, finalize, get
the result as a string.
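
To make the shape of such an interface concrete, here is a minimal
sketch of a builder that mirrors JSON abstract syntax.  It is neither
QJSON nor the JSON output visitor: ToyJSON and all toy_*() names are
hypothetical, the buffer is fixed-size, strings are not escaped, and the
enclosing top-level object is an assumption made so the example yields a
complete document.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical toy builder, not QEMU's QJSON or visitor API. */
typedef struct ToyJSON {
    char buf[256];
    size_t len;
    int need_comma;    /* nonzero if the next member needs a leading ',' */
} ToyJSON;

static void toy_append(ToyJSON *j, const char *s)
{
    size_t n = strlen(s);

    assert(j->len + n < sizeof(j->buf));   /* toy policy: die on overflow */
    memcpy(j->buf + j->len, s, n + 1);
    j->len += n;
}

/* Emit the separator and, inside objects, the member name. */
static void toy_member(ToyJSON *j, const char *name)
{
    if (j->need_comma) {
        toy_append(j, ",");
    }
    if (name) {
        toy_append(j, "\"");
        toy_append(j, name);
        toy_append(j, "\":");
    }
    j->need_comma = 1;
}

static void toy_start(ToyJSON *j, const char *name, const char *open)
{
    toy_member(j, name);
    toy_append(j, open);
    j->need_comma = 0;     /* first member of a container takes no comma */
}

static void toy_end(ToyJSON *j, const char *close)
{
    toy_append(j, close);
    j->need_comma = 1;
}

static void toy_int(ToyJSON *j, const char *name, long long v)
{
    char tmp[32];

    toy_member(j, name);
    snprintf(tmp, sizeof(tmp), "%lld", v);
    toy_append(j, tmp);
}

static void toy_str(ToyJSON *j, const char *name, const char *v)
{
    toy_member(j, name);
    toy_append(j, "\"");
    toy_append(j, v);      /* no escaping in this toy */
    toy_append(j, "\"");
}

/* Build a description shaped like the one in the quoted hunk. */
static const char *toy_describe(ToyJSON *j, long long size)
{
    memset(j, 0, sizeof(*j));
    toy_start(j, NULL, "{");
    toy_int(j, "size", size);
    toy_start(j, "fields", "[");
    toy_start(j, NULL, "{");
    toy_str(j, "name", "data");
    toy_int(j, "size", size);
    toy_str(j, "type", "buffer");
    toy_end(j, "}");
    toy_end(j, "]");
    toy_end(j, "}");
    return j->buf;
}
```

Note that nothing in this toy checks that each toy_end() matches the
container that is actually open.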

Conceivable failure modes:

* Out of memory.  We die, like we generally do for smallish allocations.

* Data not representable in JSON.  This is basically non-finite numbers,
  and we already chose to extend JSON instead of making this an error.
  Such a decision will not be revised without a thorough analysis of
  impact on existing users.

* Interface misused, e.g. invalid nesting.  Clearly a programming error.
  We can either silently produce garbage output, fail, or die.  Before
  the patch: garbage output.  After the patch: die by assertion failure
  (*not* via &error_abort).

* Anything else?

"Not via &error_abort" leads me to another point.  The &error_abort are
the assertions you can see in the patch.  The ones you can't see are in
the visitor core and the JSON output visitor.  They're all about misuse
of the interface.

The old code is different: it doesn't detect misuse, and produces
invalid JSON instead.  "Never check for an error you don't know how to
handle."

With the new code, misuse should be caught in general migration testing,
or even by "make check", if it's any good.

With the old code, it could more easily escape testing, because you have
to parse the resulting JSON to detect it.
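
A minimal sketch of what "detect misuse and die" can look like for
nesting, assuming a small stack of open containers.  ToyNest and the
toy_*() names are hypothetical; this is not how the visitor core or the
JSON output visitor is implemented.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_MAX_DEPTH 16

/* Hypothetical nesting tracker, for illustration only. */
typedef struct ToyNest {
    char stack[TOY_MAX_DEPTH];   /* '{' or '[' for each open container */
    int depth;
} ToyNest;

/* Record a newly opened container; false if the toy stack is full. */
static bool toy_open(ToyNest *n, char kind)
{
    if (n->depth == TOY_MAX_DEPTH) {
        return false;
    }
    n->stack[n->depth++] = kind;
    return true;
}

/* True if closing 'kind' matches the innermost open container. */
static bool toy_close_matches(const ToyNest *n, char kind)
{
    return n->depth > 0 && n->stack[n->depth - 1] == kind;
}

/* "Die" policy: a mismatched close is a programming error. */
static void toy_close(ToyNest *n, char kind)
{
    assert(toy_close_matches(n, kind));
    n->depth--;
}
```

With a check like this, any test run that reaches the misuse trips the
assertion at once; without it, the only symptom is malformed output
that somebody has to parse to notice.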


