qemu-devel
Re: Time to introduce a migration protocol negotiation (Re: [PATCH v2 00


From: Dr. David Alan Gilbert
Subject: Re: Time to introduce a migration protocol negotiation (Re: [PATCH v2 00/25] migration: Postcopy Preemption)
Date: Tue, 15 Mar 2022 10:43:45 +0000
User-agent: Mutt/2.1.5 (2021-12-30)

* Daniel P. Berrangé (berrange@redhat.com) wrote:
> On Tue, Mar 01, 2022 at 04:51:09PM +0000, Dr. David Alan Gilbert wrote:
> > * Peter Xu (peterx@redhat.com) wrote:
> > > On Tue, Mar 01, 2022 at 10:27:10AM +0000, Daniel P. Berrangé wrote:
> > > > > I also didn't know whether there are other limitations of it.  For
> > > > > example, will a new socket pair be a problem for any VM environment
> > > > > (either a limitation from the management app, container, and so on)?
> > > > > I think it's the same as multifd in that aspect, but I never explored.
> > > > 
> > > > If it needs extra sockets that is something apps will need to be aware
> > > > of unfortunately and explicitly opt-in to :-( Migration is often
> > > > tunnelled/proxied over other channels, so whatever does that needs to
> > > > be aware of possibility of seeing extra sockets.
> > > 
> > > Ah, then probably it can never be the default.  But for sure it could be
> > > nice that higher level can opt-in and make it a default at some point as
> > > long as it knows the network topology is safe to do so.
> > > 
> > > > 
> > > > > > > TODO List
> > > > > > > =========
> > > > > > > 
> > > > > > > TLS support
> > > > > > > -----------
> > > > > > > 
> > > > > > > I only noticed it's missing very recently.  Since soft freeze is
> > > > > > > coming, and obviously I'm still growing this series, I tend to
> > > > > > > have the existing material discussed. Let's see if it can still
> > > > > > > catch the train for the QEMU 7.0 release (soft freeze on
> > > > > > > 2022-03-08)..
> > > > > > 
> > > > > > I don't like the idea of shipping something that is only half
> > > > > > finished. It means that when apps probe for the feature, they'll
> > > > > > see preempt capability present, but have no idea whether they're
> > > > > > using a QEMU that is broken when combined with TLS or not. We
> > > > > > shouldn't merge something just to meet the soft freeze deadline
> > > > > > if we know key features are broken.
> > > > > 
> > > > > IMHO merging and declaring support are two problems.
> > > > > 
> > > > > To me, it's always fine to merge the code that implemented the
> > > > > foundation of a feature.  The feature can be worked upon in the future.
> > > > > 
> > > > > Requiring a feature to be "complete" sometimes can cause burden to
> > > > > not only the author of the series but also reviewers.  It's IMHO not
> > > > > necessary to bind these two ideas.
> > > > > 
> > > > > It's sometimes also hard to define "complete": take TLS as an
> > > > > example, no one probably even noticed that it won't work with TLS
> > > > > and I just noticed it merely these two days..  We obviously can't
> > > > > merge a partial patchset, but if the patchset is well isolated, then
> > > > > it's not a blocker for merging, imho.
> > > > > 
> > > > > Per my understanding, what you worry about is that we declare it
> > > > > supported but then never know when TLS will be ready for it.  One
> > > > > solution is I can rename the capability as x-, then after the TLS
> > > > > side is ready I drop the x- prefix.  Then Libvirt or any mgmt
> > > > > software doesn't need to support this until we drop the x-, so
> > > > > there's no risk of compatibility.
> > > > > 
> > > > > Would that sound okay to you?
> > > > 
> > > > If it has an x- prefix then we can basically ignore it from a mgmt app
> > > > POV until it is actually finished.
> > > > 
> > > > > I can always step back and work on TLS first before it's merged, but
> > > > > again I don't think it's required.
> > > > 
> > > > Apps increasingly consider use of TLS to be a mandatory feature for
> > > > migration, so until that works, this preempt has to be considered
> > > > unsupported & unfinished IMHO. So either TLS should be ready when
> > > > it merges, or it should be clearly marked unsupported at the QAPI
> > > > level.
> > > 
> > > Yes, I fully agree with it, and for huge vm migrations I think TLS is in
> > > many cases mandatory.
> > > 
> > > I do plan to work on it right afterwards if this series lands, but as the
> > > series grows I just noticed maybe we should start landing some code
> > > that's already solid.  Landing the code has another benefit: I want to
> > > make sure the code merged at least won't affect the existing features.
> > > 
> > > So what I'm curious about is why TLS has been getting quite some
> > > attention in the past few years but I didn't even see any selftests
> > > included in migration-test on tls.  That's something I wanted to look
> > > into, maybe even before adding the preempt+tls support. But maybe I
> > > just missed something, as I didn't use tls a lot in the past.
> > 
> > Hmm, I think it's worth getting TLS working before putting the full
> > series in, because it might impact the way you wire the channels up -
> > it's going to take some care; but let's see which parts we can/should
> > take.
> 
> Taking a step back here and looking at the bigger picture of
> migration protocol configuration....

OK, but let's keep that as a separate discussion and not bog down Peter's
improvement in a different overhaul.

> Almost every time we add a new feature to migration, we end up
> having to define at least one new migration parameter, then wire
> it up in libvirt, and then the mgmt app too, often needing to
> ensure it is turned on for both client and server at the same time.
> 
> 
> For some features, requiring an explicit opt-in could make sense,
> because we don't know for sure that the feature is always a benefit.
> These are things that can be thought of as workload sensitive
> tunables.
> 
> 
> For other features though, it feels like we would be better off if
> we could turn it on by default with no config. These are things
> that can be thought of as migration infrastructure / transport
> architectural designs.
> 
> 
> eg it would be nice to be able to use multifd by default for
> migration. We would still want a tunable to control the number
> of channels, but we ought to be able to just start with a default
> number of channels automatically, so the tunable is only needed
> for special cases.

Right, I agree in part - but we do need those tunables to exist; we rely
on being able to turn things on or off, or play with the tunables
to debug and get performance.  We need libvirt to enumerate the tunables
from qemu rather than having to add code to libvirt every time.
They're all in QAPI definitions anyway - libvirt really shouldn't be
adding code each time.  Then we could have a virsh migrate --tunable
rather than having loads of extra options which all have different names
from qemu's name for the same feature.
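
A minimal sketch of that passthrough idea, not existing libvirt or virsh
behaviour. It turns generic name=value tunables into a QMP
migrate-set-parameters command and rejects names QEMU did not report.
The --tunable option, the helper names and the value coercion rules are
all assumptions for illustration; the "known" set is assumed to come from
the keys returned by query-migrate-parameters.

```python
import json


def parse_tunable(spec):
    """Split 'name=value' and coerce obvious bool/int values."""
    name, sep, raw = spec.partition("=")
    if not sep or not name:
        raise ValueError(f"expected name=value, got {spec!r}")
    if raw in ("on", "true"):
        return name, True
    if raw in ("off", "false"):
        return name, False
    try:
        return name, int(raw)
    except ValueError:
        # Leave anything else as a string and let QEMU validate it.
        return name, raw


def build_set_parameters(specs, known):
    """Build a QMP migrate-set-parameters command from generic tunables.

    known: parameter names QEMU reported (assumed to be the keys of the
    query-migrate-parameters reply), so no per-tunable code is needed.
    """
    args = {}
    for spec in specs:
        name, value = parse_tunable(spec)
        if name not in known:
            raise KeyError(f"QEMU does not report parameter {name!r}")
        args[name] = value
    return json.dumps({"execute": "migrate-set-parameters",
                       "arguments": args})


if __name__ == "__main__":
    known = {"multifd-channels", "max-bandwidth"}
    print(build_set_parameters(["multifd-channels=4"], known))
```

The point of the design is that the management layer only forwards names
it was told about, so a new QEMU parameter needs no new option upstack.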

> This post-copy is another case.  We should start off knowing
> we can switch to post-copy at any time. We should further be
> able to add pre-emption if we find it available. IOW, we should
> not have required anything more than 'switch to post-copy' to
> be exposed to mgmt apps.

Some of these things are tricky; for example knowing whether or not you
can do postcopy depends on your exact memory configuration; some of that
is tricky to probe.

> Or enabling zero copy on either send or receive side.
> 
> Or enabling kernel-TLS offload

Will kernel-TLS be something you'd want to automatically turn on?
We don't know yet whether it's a good idea if you don't have hardware
support.

> Or ..insert other interesting protocol feature...
> 
> 
> 
> All this stems from our current migration protocol that started
> as a single unidirectional channel, which goes straight into
> the migration data stream, with no protocol handshake  and
> thus no feature negotiation either.
> 
> We've offloaded feature negotiation to libvirt and in turn to
> the mgmt app and this is awful, for the layers above, but
> also awful for QEMU. Because multifd requires mgmt app opt-in,
> we can wait 10 years and there will still be countless apps
> using single-fd mode because they've not been updated to
> opt-in.  If we negotiated features at QEMU level we could
> have everything using multifd in a few years, and have dropped
> single-fd mode a few years later.
> 
> 
> So rather than following our historical practice, and adding
> yet another migration parameter for a specific feature, I'd
> really encourage us to put a stop to it and future proof
> ourselves.
> 
> 
> Introduce one *final-no-more-never-again-after-this* migration
> capability called "protocol-negotiation".
> 
> 
> When that capability is set, first declare that henceforth the
> migration transport is REQUIRED to support **multiple**,
> **bi-directional** channels. We might only use 1 TCP channel
> in some cases, but it declares our intent that we expect to be
> able to use as many channels as we see fit henceforth.
> 
> Now define a protocol handshake. A 5 minute thought experiment
> starts off with something simple:
> 
>    dst -> src:  Greeting Message:
>                   Magic: "QEMU-MIGRATE"  12 bytes
>                   Num Versions: 1 byte
>                   Version list: 1 byte * num versions
>                   Num features: 4 bytes
>                   Feature list: string * num features
> 
>    src -> dst:  Greeting Reply:
>                   Magic: "QEMU-MIGRATE" 12 bytes
>                   Select version: 1 byte
>                   Num select features: 4 bytes
>                   Selected features: string * num features   
> 
>    .... possibly more src <-> dst messages depending on
>         features negotiated....
> 
>    src -> dst:  start migration
>  
>     ...traditional migration stream runs now for the remainder
>        of this connection ...

Don't worry about designing the bytes; we already have a command
structure; we just need to add a MIG_CMD_FEATURES and a MIG_RP_MSG_FEATURES
(I'm not sure what we need to do for RDMA; or what we do for exec: or
savevm)
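
For reference, the negotiation step in the quoted greeting can be sketched
independently of how it ends up framed on the wire. This is only an
illustration of the quoted layout (12-byte magic, 1-byte version count,
4-byte feature count), not QEMU code. The quoted proposal does not say how
a feature "string" is encoded or which byte order is used; the 1-byte
length prefix and network byte order below are assumptions, as are the
function names and the example feature names.

```python
import struct

MAGIC = b"QEMU-MIGRATE"  # 12 bytes, as in the quoted proposal


def encode_greeting(versions, features):
    """dst -> src greeting: magic, version list, feature list."""
    out = bytearray(MAGIC)
    out += struct.pack("!B", len(versions))   # Num Versions: 1 byte
    out += bytes(versions)                    # 1 byte per version
    out += struct.pack("!I", len(features))   # Num features: 4 bytes
    for name in features:
        raw = name.encode("utf-8")
        # Assumption: each string carries a 1-byte length prefix.
        out += struct.pack("!B", len(raw)) + raw
    return bytes(out)


def decode_greeting(buf):
    """Parse a greeting back into (versions, features)."""
    if buf[:len(MAGIC)] != MAGIC:
        raise ValueError("bad magic")
    pos = len(MAGIC)
    (nver,) = struct.unpack_from("!B", buf, pos)
    pos += 1
    versions = list(buf[pos:pos + nver])
    pos += nver
    (nfeat,) = struct.unpack_from("!I", buf, pos)
    pos += 4
    features = []
    for _ in range(nfeat):
        (n,) = struct.unpack_from("!B", buf, pos)
        pos += 1
        features.append(buf[pos:pos + n].decode("utf-8"))
        pos += n
    return versions, features


def choose_reply(offered_versions, offered_features,
                 my_versions, my_features):
    """src side: pick the highest common version and the common features."""
    common = set(offered_versions) & set(my_versions)
    if not common:
        raise ValueError("no common protocol version")
    selected = [f for f in offered_features if f in my_features]
    return max(common), selected


if __name__ == "__main__":
    wire = encode_greeting([1, 2], ["multifd", "postcopy-preempt"])
    versions, features = decode_greeting(wire)
    print(choose_reply(versions, features, [1], {"multifd"}))
```

The same selection logic would apply if the feature lists were carried in
MIG_CMD_FEATURES / MIG_RP_MSG_FEATURES payloads instead of a new greeting.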

> I suggest "dst" starts first, so that connecting to a dst lets you
> easily debug whether QEMU is speaking v2 or just waiting for the
> client to send something as is traditionally the case.
> 
> This shouldn't need very much code, and it gives us flexibility
> to do all sorts of interesting things going forward with less
> overhead for everyone involved.
> 
> We can layer in a real authentication system like SASL after
> the greeting without any libvirt / mgmt app support
> 
> We can enable zero-copy at will. We can enable kernel-TLS at
> will. We can add new TCP connections for clever feature XYZ.
> 
> We get a back channel every time, so dst can pass info back
> to the src to optimize behaviour.
> 
> We can experiment with features and throw them away again
> later without involving the mgmt app, since we negotiate
> their use.
> 
> 
> With regards,
> Daniel
> -- 
> |: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org         -o-            https://fstop138.berrange.com :|
> |: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
> 
-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK



