From: Michael S. Tsirkin
Subject: Re: [Qemu-devel] [RFC PATCH RDMA support v5: 03/12] comprehensive protocol documentation
Date: Thu, 11 Apr 2013 10:19:27 +0300

On Wed, Apr 10, 2013 at 04:05:34PM -0400, Michael R. Hines wrote:
> On 04/10/2013 01:41 PM, Michael S. Tsirkin wrote:
> >>>>
> >>>>Thanks.
> >>>>
> >>>>However, IMHO restricting the policy to chunk-based registration only is
> >>>>really not an acceptable choice:
> >>>>
> >>>>Here's the reason: using my 10gbps RDMA hardware, throughput takes a
> >>>>dive from 10gbps to 6gbps.
> >>>Who cares about the throughput really? What we do care about
> >>>is how long the whole process takes.
> >>>
> >>Low latency and high throughput are very important =)
> >>
> >>Without these properties of RDMA, many workloads simply either
> >>take too long to finish migrating or do not converge to a stopping
> >>point at all.
> >>
> >>*Not making this a configurable option would defeat the purpose of
> >>using RDMA altogether.
> >>
> >>Otherwise, you're no better off than just using TCP.
> >So we have two protocols implemented: one is slow, the other pins all
> >memory on destination indefinitely.
> >
> >I see two options here:
> >- improve the slow version so it's fast, drop the pin all version
> >- give up and declare RDMA requires pinning all memory on destination
> >
> >But giving management a way to do RDMA at the speed of TCP? Why is this
> >useful?
> 
> This is "useful" because of the overcommit concerns you brought up
> before, which is the reason why I volunteered to write dynamic
> server registration in the first place. We never required that overcommit
> and performance had to coexist.
> 
> From prior experience, I don't believe overcommit and good performance
> are compatible with each other in general (e.g. when using compression,
> page sharing, and so on), but that's a debate for another day =)

Maybe we should just say "RDMA is incompatible with memory overcommit"
and be done with it then. But see below.

> I would like to propose a compromise:
> 
> How about we *keep* the registration capability and leave it enabled
> by default?
> 
> This gives management tools the ability to get performance if they want it,
> but also satisfies your requirements in case management doesn't know the
> feature exists - they will just get the default, with registration enabled.

Well unfortunately the "overcommit" feature as implemented seems useless
really.  Someone wants to migrate with RDMA but with low performance?
Why not migrate with TCP then?

> >>But the problem is more complicated than that: there is no coordination
> >>between the migration_thread and RDMA right now because Paolo is
> >>trying to maintain a very clean separation of function.
> >>
> >>However we *can* do what you described in a future patch like this:
> >>
> >>1. Migration thread says "iteration starts, how much memory is dirty?"
> >>2. RDMA protocol says "Is there a lot of dirty memory?"
> >>   OK, yes? Then batch all the registration messages into a single
> >>   request, but do not write the memory until all the registrations
> >>   have completed.
> >>
> >>   OK, no? Then just issue registrations with very little batching
> >>   so that we can quickly move on to the next iteration round.
> >>
> >>Make sense?
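
(For illustration only - none of this is code from the patches, and the
threshold and function names are invented.) A minimal C sketch of the
batching policy described above could look like this: when a lot of memory
is dirty, the registration messages go out as one batch and the writes wait
for all replies; when little is dirty, each registration is issued
immediately so the next iteration can start sooner.

#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical cutoff: above this much dirty memory, batch everything. */
#define DIRTY_BATCH_THRESHOLD (256ULL * 1024 * 1024)

/* Hypothetical handle for one chunk registration request. */
struct reg_request {
    uint64_t chunk_addr;
    uint64_t chunk_len;
};

/* Stand-in for posting requests on the RDMA control channel. */
static void send_registrations(const struct reg_request *reqs, size_t n,
                               bool wait_for_all)
{
    (void)reqs;
    printf("sending %zu registration(s), %s\n", n,
           wait_for_all ? "writing only after all replies arrive"
                        : "writing as soon as each reply arrives");
}

/* Step 2 of the scheme: pick a batching policy from the dirty-memory size. */
static void registration_policy(uint64_t dirty_bytes,
                                struct reg_request *pending, size_t n_pending)
{
    if (dirty_bytes > DIRTY_BATCH_THRESHOLD) {
        /* Lots of dirty memory: one big batch of registration messages. */
        send_registrations(pending, n_pending, true);
    } else {
        /* Little dirty memory: issue registrations almost unbatched so the
         * migration thread can move on to the next iteration quickly. */
        for (size_t i = 0; i < n_pending; i++) {
            send_registrations(&pending[i], 1, false);
        }
    }
}

int main(void)
{
    struct reg_request reqs[4] = { { 0x1000, 4096 }, { 0x2000, 4096 },
                                   { 0x3000, 4096 }, { 0x4000, 4096 } };
    registration_policy(512ULL * 1024 * 1024, reqs, 4); /* "a lot dirty" */
    registration_policy(4096, reqs, 4);                 /* "little dirty" */
    return 0;
}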
> >Actually, I think you just need to get a page from migration core and
> >give it to the FSM above.  Then let it give you another page, until you
> >have N pages in flight in the FSM all at different stages in the
> >pipeline.  That's the theory.
> >
> >But if you want to try changing the migration core, go wild.  Very little
> >is written in stone here.
> 
> The FSM and what I described are basically the same thing, I just
> described it more abstractly than you did.

Yes, but I'm saying it can be part of the RDMA code - no strict need to
change anything else.
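
To make the "N pages in flight" idea concrete, here is a rough,
self-contained C sketch (again with invented names and structure - it is
not the actual patch): the RDMA code keeps a fixed number of slots, each
page advances through the registration and write stages independently, and
the migration core is only asked for another page when a slot frees up.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define PIPELINE_DEPTH 16   /* hypothetical "N pages in flight" */

enum slot_stage { SLOT_FREE = 0, SLOT_REG_SENT, SLOT_WRITE_POSTED };

struct slot {
    enum slot_stage stage;
    uint64_t page_addr;
};

static struct slot pipeline[PIPELINE_DEPTH];

/* Stand-ins for the real verbs: posting a registration request on the
 * control channel, posting the RDMA write, and checking completions. */
static void post_registration(struct slot *s) { s->stage = SLOT_REG_SENT; }
static bool registration_done(struct slot *s) { (void)s; return true; }
static void post_rdma_write(struct slot *s)   { s->stage = SLOT_WRITE_POSTED; }
static bool write_done(struct slot *s)        { (void)s; return true; }

/* Called with each page the migration core hands over; returns false when
 * all N slots are busy and the caller should reap completions first. */
static bool pipeline_push(uint64_t page_addr)
{
    for (int i = 0; i < PIPELINE_DEPTH; i++) {
        if (pipeline[i].stage == SLOT_FREE) {
            pipeline[i].page_addr = page_addr;
            post_registration(&pipeline[i]);
            return true;
        }
    }
    return false;
}

/* Advance whichever slots have made progress; no slot waits on another, so
 * in a real implementation the control-channel round trips of one page can
 * overlap with the registrations and writes of the others. */
static void pipeline_poll(void)
{
    for (int i = 0; i < PIPELINE_DEPTH; i++) {
        struct slot *s = &pipeline[i];
        if (s->stage == SLOT_REG_SENT && registration_done(s)) {
            post_rdma_write(s);
        } else if (s->stage == SLOT_WRITE_POSTED && write_done(s)) {
            s->stage = SLOT_FREE;   /* slot is ready for the next page */
        }
    }
}

int main(void)
{
    for (uint64_t page = 0; page < 64; page++) {
        while (!pipeline_push(page * 4096)) {
            pipeline_poll();        /* all slots busy: reap completions */
        }
    }
    pipeline_poll();                /* drain: registrations -> writes */
    pipeline_poll();                /* drain: writes -> free */
    printf("done\n");
    return 0;
}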

> Either way, I agree that the optimization would be very useful,
> but I disagree that it is possible for an optimized registration algorithm
> to perform *as well as* the case when there is no dynamic
> registration at all.
> 
> The point is that dynamic registration *only* helps overcommitment.
> 
> It does nothing for performance - and since that's true, any optimization
> of dynamic registration will always be sub-optimal compared to turning
> off dynamic registration in the first place.
> 
> - Michael

So you've given up on it.  Question is, sub-optimal by how much?  And
where's the bottleneck?

Let's do some math. Assume you send a 16-byte registration request and
get back a 16-byte response for each 4KByte page (are 16 bytes enough?).  That's
32/4096 < 1% transport overhead. Negligible.

Is it the source CPU then? But the CPU on the source is basically doing the
same things as with pre-registration: you do not pin all memory on the source.

So it must be the destination CPU that does not keep up then?
But it has to do even less than the source CPU.

I suggest one explanation: the protocol you proposed is inefficient.
It seems to basically do everything in a single thread:
get a chunk, pin, wait for control credit, request, response, rdma, unpin.
There are two round-trips of send/receive here where you are not
doing anything useful. Why not let migration proceed?
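
Spelled out with made-up helper names (this is only an illustration of the
criticism, not code from the patches), the serialized flow looks roughly
like this - every chunk pays both blocking round trips before the next one
can even start:

#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

struct chunk { uint64_t addr; uint64_t len; };

/* Stand-ins for the real steps on the RDMA control channel. */
static void pin_chunk(struct chunk *c)             { (void)c; puts("pin"); }
static void wait_for_control_credit(void)          { puts("wait (round trip 1)"); }
static void send_register_request(struct chunk *c) { (void)c; puts("register request"); }
static void wait_for_register_response(void)       { puts("wait (round trip 2)"); }
static void post_rdma_write(struct chunk *c)       { (void)c; puts("rdma write"); }
static void unpin_chunk(struct chunk *c)           { (void)c; puts("unpin"); }

/* Single-threaded and fully serialized: during both wait_*() calls nothing
 * else (no other chunk, no other migration work) makes progress. */
static void transfer_chunks_serialized(struct chunk *chunks, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        pin_chunk(&chunks[i]);
        wait_for_control_credit();
        send_register_request(&chunks[i]);
        wait_for_register_response();
        post_rdma_write(&chunks[i]);
        unpin_chunk(&chunks[i]);
    }
}

int main(void)
{
    struct chunk chunks[2] = { { 0x1000, 1 << 20 }, { 0x2000, 1 << 20 } };
    transfer_chunks_serialized(chunks, 2);
    return 0;
}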

Doesn't all of this sound worth checking before we give up?

-- 
MST


