
Re: [Qemu-devel] [RFC] New Migration Protocol using Visitor Interface


From: Michael S. Tsirkin
Subject: Re: [Qemu-devel] [RFC] New Migration Protocol using Visitor Interface
Date: Mon, 3 Oct 2011 16:41:09 +0200
User-agent: Mutt/1.5.21 (2010-09-15)

On Mon, Oct 03, 2011 at 08:51:10AM -0500, Anthony Liguori wrote:
> On 10/03/2011 08:38 AM, Michael S. Tsirkin wrote:
> >On Mon, Oct 03, 2011 at 07:55:48AM -0500, Anthony Liguori wrote:
> >>On 10/02/2011 04:08 PM, Michael S. Tsirkin wrote:
> >>>On Sun, Oct 02, 2011 at 04:21:47PM -0400, Stefan Berger wrote:
> >>>>
> >>>>>4) Implement the BERVisitor and make this the default migration protocol.
> >>>>>
> >>>>>Most of the work will be in 1), though with the implementation in this 
> >>>>>series we should be able to do it incrementally. I'm not sure if the 
> >>>>>best approach is doing the mechanical phase 1 conversion, then doing 
> >>>>>phase 2 sometime after 4), doing phase 1 + 2 as part of 1), or just 
> >>>>>doing VMState conversions, which give basically the same
> >>>>>capabilities as phase 1 + 2.
> >>>>>
> >>>>>Thoughts?
> >>>>Is anyone working on this? If not I may give it a shot (tomorrow++)
> >>>>for at least some of the primitives... for enabling vNVRAM metadata
> >>>>of course. Indefinite-length encoding of constructed data types I
> >>>>suppose won't be used; otherwise the visitor interface seems wrong
> >>>>for parsing and skipping extra data towards the end of a structure:
> >>>>if version n wrote the stream and appended some of its version-n
> >>>>data, and now version m < n is trying to read the struct, it needs
> >>>>to skip the version [m+1, n] data fields... in that case the
> >>>>de-serialization of the stream should probably be stream-driven
> >>>>rather than structure-driven.
> >>>>
> >>>>    Stefan
> >>>
> >>>Yes I've been struggling with that exactly.
> >>>Anthony, any thoughts?
> >>
> >>It just depends on how you write your visitor.  If you used
> >>sequences, you'd probably do something like this:
> >>
> >>start_struct ->
> >>   check for sequence tag, push starting offset and size onto stack
> >>   increment offset to next tag
> >>
> >>type_int (et al) ->
> >>   check for explicit type, parse data
> >>   increment offset to next tag
> >>
> >>end_struct ->
> >>   pop starting offset and size to temp variables
> >>   set offset to starting offset + size
> >>
> >>This is roughly how the QMP input marshaller works FWIW.
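Concretely, the skipping logic might look something like this (a rough
sketch with made-up names; it assumes BER-style definite-length
sequences, in line with Stefan's point that indefinite-length encoding
won't be used):

#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BER_TAG_SEQUENCE 0x30   /* ASN.1 universal SEQUENCE tag */
#define MAX_DEPTH 16

typedef struct {
    const uint8_t *buf;        /* encoded stream */
    size_t offset;             /* current read position */
    size_t start[MAX_DEPTH];   /* stack: payload start of open sequences */
    size_t len[MAX_DEPTH];     /* stack: payload length of open sequences */
    int depth;
} SkipVisitor;

/* Decode a definite-length BER length field and advance *offset past it. */
static size_t parse_ber_length(const uint8_t *buf, size_t *offset)
{
    uint8_t first = buf[(*offset)++];
    size_t len = 0;
    int nbytes;

    if (!(first & 0x80)) {
        return first;            /* short form: length is 0..127 */
    }
    nbytes = first & 0x7f;       /* long form: next n bytes, big-endian */
    while (nbytes--) {
        len = (len << 8) | buf[(*offset)++];
    }
    return len;
}

/* start_struct: check for the sequence tag, push starting offset and size */
static void skip_start_struct(SkipVisitor *v)
{
    size_t len;

    assert(v->buf[v->offset] == BER_TAG_SEQUENCE);
    assert(v->depth < MAX_DEPTH);
    v->offset++;
    len = parse_ber_length(v->buf, &v->offset);
    v->start[v->depth] = v->offset;
    v->len[v->depth] = len;
    v->depth++;
}

/* end_struct: pop starting offset and size, then seek to start + size.
 * Any [m+1, n] fields appended by a newer version are skipped here
 * without the reader ever having to name them. */
static void skip_end_struct(SkipVisitor *v)
{
    v->depth--;
    v->offset = v->start[v->depth] + v->len[v->depth];
}
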
> >>
> >>Regards,
> >>
> >>Anthony Liguori
> >
> >One thing I worry about is enabling zero copy for
> >large string types (e.g. memory migration).
> 
> Memory shouldn't be done through Visitors.  It should be handled as a special 
> case.

OK, that's fine then.
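(For reference, the zero-copy read described below would amount to
something like this sketch, with illustrative names: once a tag naming
the guest page has been decoded, the payload is read from the socket
straight into that page's host mapping, with no bounce buffer in
between.)

#include <unistd.h>

/* Sketch only: pull one page's payload off the migration socket
 * directly into host_page, the host mapping of the guest page that
 * the preceding tag identified.  No intermediate visitor buffer,
 * hence no extra userspace copy. */
static ssize_t recv_page_zero_copy(int sockfd, void *host_page,
                                   size_t page_size)
{
    size_t done = 0;

    while (done < page_size) {
        ssize_t n = read(sockfd, (char *)host_page + done,
                         page_size - done);
        if (n <= 0) {
            return -1;   /* I/O error, or EOF in mid-page */
        }
        done += n;
    }
    return done;
}
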

> >So we need to be able to see a tag for memory page + address,
> >read that from socket directly at the correct virtual address.
> >
> >Probably, we can avoid using visitors for memory, and hope
> >everything else can stand an extra copy since it's small.
> >
> >But then, why do we worry about the size of
> >encoded device state as Anthony seems to do?
> 
> There's a significant difference between the cost of something on
> the wire and the cost of doing a memcpy.  The cost of the data on
> the wire is directly proportional to downtime.  So if we increase
> the size of the device state by a factor of 10, we increase the
> minimum downtime by a factor of 10.
> 
> Of course, *if* the size of device state is already negligible with
> respect to the minimum downtime, then it doesn't matter.  This is
> easy to quantify though.  For a normal migration session today,
> what's the total size of the device state relative to what the
> calculated bandwidth can transfer within the minimum downtime?
> 
> If it's very small, then we can add names and not worry about it.
> 
> Regards,
> 
> Anthony Liguori

Yes, it's easy to quantify. I think the following gives us
the offset before and after, so the difference is the size
we seek, right?


diff --git a/savevm.c b/savevm.c
index 1feaa70..dbbbcc6 100644
--- a/savevm.c
+++ b/savevm.c
@@ -1543,6 +1543,7 @@ int qemu_savevm_state_iterate(Monitor *mon, QEMUFile *f)
 int qemu_savevm_state_complete(Monitor *mon, QEMUFile *f)
 {
     SaveStateEntry *se;
+    long long vm_state_size;
 
     cpu_synchronize_all_states();
 
@@ -1557,6 +1558,8 @@ int qemu_savevm_state_complete(Monitor *mon, QEMUFile *f)
         se->save_live_state(mon, f, QEMU_VM_SECTION_END, se->opaque);
     }
 
+    vm_state_size = qemu_ftell(f);
+    fprintf(stderr, "start size: %lld\n", vm_state_size);
     QTAILQ_FOREACH(se, &savevm_handlers, entry) {
         int len;
 
@@ -1577,6 +1580,8 @@ int qemu_savevm_state_complete(Monitor *mon, QEMUFile *f)
 
         vmstate_save(f, se);
     }
+    vm_state_size = qemu_ftell(f);
+    fprintf(stderr, "end size: %lld\n", vm_state_size);
 
     qemu_put_byte(f, QEMU_VM_EOF);
 
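For scale, with illustrative numbers rather than measurements: if the
two prints differ by 200 KB, a 1 Gbit/s link (~125 MB/s) moves that in
about 1.6 ms.  Against, say, a 30 ms downtime target that is roughly 5%
of the budget, so a 10x growth in encoded device state, as in Anthony's
example, would already eat about half of it.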

-- 
MST


