From: Michael R. Hines
Subject: Re: [Qemu-devel] [RFC PATCH RDMA support v5: 03/12] comprehensive protocol documentation
Date: Mon, 15 Apr 2013 09:24:19 -0400
User-agent: Mozilla/5.0 (X11; Linux i686; rv:17.0) Gecko/20130106 Thunderbird/17.0.2

On 04/15/2013 04:34 AM, Paolo Bonzini wrote:
> On 15/04/2013 03:10, Michael R. Hines wrote:
>>> And when someone writes them one day, we'll have to carry the old code
>>> around for interoperability as well.  Not pretty.  To avoid that, you
>>> need to explicitly say in the documentation that it's experimental and
>>> unsupported.
>>
>> That's what protocols are for.
>>
>> As I've already said, I've incorporated this into the design of the
>> protocol already.
>>
>> The protocol already has a field called "repeat" which allows a user to
>> request multiple chunk registrations at the same time.
>>
>> If you insist, I can add a capability / command to the protocol called
>> "unregister chunk", but I'm not volunteering to implement that command,
>> as I don't have any data showing it to be of any value.
>
> Implementing it on the destination side would be of value because it
> would make the implementation interoperable.
>
> A very basic implementation would be "during the bulk phase, unregister
> the previous chunk every time you register a chunk".  It would work
> great when migrating an idle guest, for example.  It would probably be
> faster than TCP (which is now at 4.2 Gbps).
>
> On one hand this should not block merging the patches; on the other
> hand, "agreeing to disagree" without having done any test is not very
> fruitful.  You can disagree on the priorities (and I agree with you on
> this), but what mst is proposing is absolutely reasonable.
>
> Paolo

OK, I think I understand the disconnect here. Let's continue with the
example you described above, and let me ask another question.

Let's say the above-mentioned idle VM is chosen, for whatever reason,
*not* to use TCP migration, but RDMA instead. (I recommend against
choosing RDMA in the current docs, but let's stick with this example
for the sake of argument.)

Now, in this example, let's say the migration starts up and the hypervisor
runs out of physical memory and starts swapping during the migration
(also for the sake of argument).

The next IB verbs function call, ibv_reg_mr(), would then likely fail:
with nothing left to pin, it would return an error.

So my question is: is it not sufficient to send a message back to the
primary-VM side of the connection which says:

"Your migration cannot proceed anymore; please resume the VM and try again somewhere else."

In this case, both the system administrator and the virtual machine are
safe: nothing has been killed, nothing has crashed, and the management
software can proceed to make a new management decision.

Is there something wrong with this sequence of events?

- Michael




