qemu-devel

Re: [Qemu-devel] [RFC PATCH RDMA support v5: 03/12] comprehensive protoc


From: Michael R. Hines
Subject: Re: [Qemu-devel] [RFC PATCH RDMA support v5: 03/12] comprehensive protocol documentation
Date: Wed, 10 Apr 2013 16:05:34 -0400
User-agent: Mozilla/5.0 (X11; Linux i686; rv:17.0) Gecko/20130106 Thunderbird/17.0.2

On 04/10/2013 01:41 PM, Michael S. Tsirkin wrote:

Thanks.

However, IMHO restricting the policy to use only chunk-based registration
is really not an acceptable choice:

Here's the reason: using my 10 Gbps RDMA hardware, throughput takes a
dive from 10 Gbps to 6 Gbps.
Who cares about the throughput really? What we do care about
is how long the whole process takes.

Low latency and high throughput are very important =)

Without these properties of RDMA, many workloads either
take too long to finish migrating or never converge to a stopping
point at all.
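
To make the convergence point concrete, here is a rough back-of-the-envelope model of pre-copy rounds. It is only a sketch: the function name, the 8 GiB guest, the 7 Gbps dirty rate and the 64 MiB stop threshold are all made-up illustration values, not measurements from this patch series.

```python
def precopy_rounds(ram_bytes, bandwidth_bps, dirty_rate_bps,
                   stop_threshold_bytes, max_rounds=30):
    """Toy pre-copy model (hypothetical, not QEMU code).

    Returns (rounds, seconds) once the remaining dirty memory drops
    below stop_threshold_bytes, or None if it never does.
    """
    remaining = ram_bytes
    elapsed = 0.0
    for round_no in range(1, max_rounds + 1):
        # Time needed to push everything that is currently dirty.
        send_time = remaining * 8 / bandwidth_bps
        elapsed += send_time
        # Whatever the guest dirtied meanwhile must be resent next round
        # (capped at the size of guest RAM).
        remaining = min(ram_bytes, dirty_rate_bps / 8 * send_time)
        if remaining <= stop_threshold_bytes:
            return round_no, elapsed
    return None


GIB = 2 ** 30
MIB = 2 ** 20
# Assumed workload: 8 GiB guest dirtying memory at 7 Gbps.
fast = precopy_rounds(8 * GIB, 10e9, 7e9, 64 * MIB)  # link at 10 Gbps
slow = precopy_rounds(8 * GIB, 6e9, 7e9, 64 * MIB)   # link at 6 Gbps
```

Under these assumed numbers the 10 Gbps case reaches the stop threshold, while the 6 Gbps case returns None because the guest dirties memory faster than the link can drain it.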

Not making this a configurable option would defeat the purpose of
using RDMA altogether.

Otherwise, you're no better off than just using TCP.
So we have two protocols implemented: one is slow the other pins all
memory on destination indefinitely.

I see two options here:
- improve the slow version so it's fast, drop the pin all version
- give up and declare RDMA requires pinning all memory on destination

But giving management a way to do RDMA at the speed of TCP? Why is this
useful?

This is "useful" because of the overcommit concerns you brought up
before, which is the reason why I volunteered to write dynamic
server registration in the first place. We never required that overcommit
and performance had

From prior experience, I don't believe overcommit and good performance
are compatible with each other in general (i.e. using compression,
page sharing, etc, etc.), but that's a debate for another day =)

I would like to propose a compromise:

How about we *keep* the registration capability and leave it enabled by default?

This gives management tools the ability to get performance if they want it,
and also satisfies your requirements in case management doesn't know the
feature exists: they will simply get the default, with registration enabled.
But the problem is more complicated than that: there is no coordination
between the migration_thread and RDMA right now because Paolo is
trying to maintain a very clean separation of function.

However we *can* do what you described in a future patch like this:

1. Migration thread says "iteration starts, how much memory is dirty?"
2. RDMA protocol says "Is there a lot of dirty memory?"
         OK, yes? Then batch all the registration messages into a
         single request, but do not write the memory until all the
         registrations have completed.

         OK, no?  Then just issue registrations with very little
         batching so that we can quickly move on to the next
         iteration round.
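
The two-step decision above could be sketched roughly as follows. This is only an illustration of the idea: `plan_registrations`, `busy_fraction` and `small_batch` are hypothetical names and thresholds, and nothing like this exists in the current patches.

```python
def plan_registrations(dirty_chunks, total_chunks,
                       busy_fraction=0.25, small_batch=4):
    """Split the dirty chunk ids of one iteration into registration batches.

    Hypothetical policy: if a large share of memory is dirty, send a single
    batched registration request (the RDMA writes would wait until the whole
    batch is registered). Otherwise use small batches so the iteration
    finishes quickly.
    """
    dirty = list(dirty_chunks)
    if not dirty:
        return []
    if len(dirty) / total_chunks >= busy_fraction:
        # "OK, yes?": lots of dirty memory, one big request.
        return [dirty]
    # "OK, no?": little dirty memory, very little batching.
    return [dirty[i:i + small_batch]
            for i in range(0, len(dirty), small_batch)]


heavy = plan_registrations(range(600), total_chunks=1000)
light = plan_registrations(range(10), total_chunks=1000)
```

Here `heavy` is one batch of 600 chunks, while `light` is split into batches of at most 4 chunks each.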

Make sense?
Actually, I think you just need to get a page from migration core and
give it to the FSM above.  Then let it give you another page, until you
have N pages in flight in the FSM all at different stages in the
pipeline.  That's the theory.
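
As a rough sketch of that pipelining idea (not the FSM from the documentation patch itself; the stage names and `run_pipeline` below are assumptions made up for illustration), pages can be admitted one per tick so that up to N of them sit at different stages at once:

```python
from collections import deque
from enum import Enum


class Stage(Enum):
    # Hypothetical stages; the real protocol states may differ.
    REGISTER_SENT = 1
    REGISTERED = 2
    WRITE_POSTED = 3
    DONE = 4


NEXT_STAGE = {
    Stage.REGISTER_SENT: Stage.REGISTERED,
    Stage.REGISTERED: Stage.WRITE_POSTED,
    Stage.WRITE_POSTED: Stage.DONE,
}


def run_pipeline(pages, max_in_flight):
    """Push pages through the stages with at most max_in_flight active.

    Returns (completion_order, peak_in_flight).
    """
    pending = deque(pages)   # pages still held by the migration core
    in_flight = {}           # page -> current Stage
    done = []
    peak = 0
    while pending or in_flight:
        # Every in-flight page advances one stage per tick.
        for page in list(in_flight):
            in_flight[page] = NEXT_STAGE[in_flight[page]]
            if in_flight[page] is Stage.DONE:
                done.append(page)
                del in_flight[page]
        # Take one more page from the core if there is room, so the
        # pages in flight end up at different stages.
        if pending and len(in_flight) < max_in_flight:
            in_flight[pending.popleft()] = Stage.REGISTER_SENT
        peak = max(peak, len(in_flight))
    return done, peak


order, peak = run_pipeline(list(range(6)), max_in_flight=3)
```

In a real implementation the stage transitions would be driven by completion events rather than a fixed tick, so this only shows the bookkeeping.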

But if you want to try changing management core, go wild.  Very little
is written in stone here.

The FSM and what I described are basically the same thing, I just
described it more abstractly than you did.

Either way, I agree that the optimization would be very useful,
but I disagree that it is possible for an optimized registration algorithm
to perform *as well as* the case when there is no dynamic registration at all.

The point is that dynamic registration *only* helps overcommitment.

It does nothing for performance. Since that's true, any optimization
of dynamic registration will always be sub-optimal compared to turning
off dynamic registration in the first place.

- Michael



