Re: [Qemu-devel] [RFC 00/13] Multiple fd migration support


From: Dr. David Alan Gilbert
Subject: Re: [Qemu-devel] [RFC 00/13] Multiple fd migration support
Date: Tue, 26 Apr 2016 13:38:04 +0100
User-agent: Mutt/1.5.24 (2015-08-30)

* Juan Quintela (address@hidden) wrote:
> "Dr. David Alan Gilbert" <address@hidden> wrote:
> > * Juan Quintela (address@hidden) wrote:
> >> Hi
> >> 
> >> This patch series is "an" initial implementation of multiple fd migration.
> >> This is to get something out for others to comment on; it is not finished
> >> at all.
> >
> > I've had a quick skim:
> >   a) I think mst is right about the risk of getting stale pages out of
> >      order.
> 
> I have been thinking about this.  We just need to send a "we have
> finished this round" packet, and reception has to wait for all threads
> to finish before continuing.  It is easy and not expensive.  We never
> resend the same page during the same round.

Yes.
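As a rough illustration of that "round finished" idea, here is a minimal
sketch of a per-round barrier on the receive side, assuming one pthread per
incoming fd; every name below (MultiFDSync and friends) is hypothetical and
nothing in it comes from the RFC itself:

    /* Hypothetical sketch of a per-round sync on the receive side; one
     * pthread per incoming fd is assumed.  No name here is from the RFC. */
    #include <pthread.h>

    typedef struct {
        pthread_mutex_t lock;
        pthread_cond_t cond;
        int pending;                  /* channels still busy with this round */
    } MultiFDSync;

    static void multifd_sync_init(MultiFDSync *s, int nchannels)
    {
        pthread_mutex_init(&s->lock, NULL);
        pthread_cond_init(&s->cond, NULL);
        s->pending = nchannels;
    }

    /* Each receive thread calls this when it sees the "round finished"
     * packet on its own fd. */
    static void multifd_recv_channel_done(MultiFDSync *s)
    {
        pthread_mutex_lock(&s->lock);
        if (--s->pending == 0) {
            pthread_cond_broadcast(&s->cond);
        }
        pthread_mutex_unlock(&s->lock);
    }

    /* The main receive thread blocks here before accepting the next round,
     * so a stale page from the previous round can never overwrite a newer
     * copy of the same page. */
    static void multifd_recv_wait_round(MultiFDSync *s, int nchannels)
    {
        pthread_mutex_lock(&s->lock);
        while (s->pending > 0) {
            pthread_cond_wait(&s->cond, &s->lock);
        }
        s->pending = nchannels;       /* re-arm for the next round */
        pthread_mutex_unlock(&s->lock);
    }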

> >   b) Since you don't change the URI at all, it's a bit restricted; for
> >      example, it means I can't run separate sessions over different NICs
> >      unless I've done something clever at the routing or bonded them.
> >      One thing I liked the sound of multi-fd for is NUMA; get a BIG box,
> >      give each numa node a separate NIC, and run a separate thread on
> >      each node.
> 
> If we want this, the question is _how_ we want to configure it.  This
> was part of the reason to post the patch.  It only works for tcp; I
> didn't even try the others.  I just want to see what people want.

I was thinking this would work even for TCP; you'd just need a way to pass
different URIs (with address/port) for each connection.
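One way to picture that: keep the URI a plain string, but allow a
comma-separated list so each channel (say, one per NIC or NUMA node) gets its
own address and port.  The sketch below is purely illustrative parsing code,
not anything in the RFC or in QEMU's migration URI handling:

    /* Purely illustrative: split a comma-separated list of tcp URIs so each
     * channel gets its own address and port.  The RFC itself only opens the
     * same URI several times. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    static char **split_channel_uris(const char *spec, int *count)
    {
        char *copy = strdup(spec);
        char **uris = NULL;
        int n = 0;

        for (char *tok = strtok(copy, ","); tok; tok = strtok(NULL, ",")) {
            uris = realloc(uris, (n + 1) * sizeof(*uris));
            uris[n] = strdup(tok);
            n++;
        }
        free(copy);
        *count = n;
        return uris;
    }

    int main(void)
    {
        int n;
        char **uris = split_channel_uris("tcp:10.0.1.1:4444,tcp:10.0.2.1:4444",
                                         &n);

        for (int i = 0; i < n; i++) {
            printf("channel %d -> %s\n", i, uris[i]);  /* one connection each */
        }
        return 0;
    }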

> >   c) Hmm, we do still have a single thread doing all the bitmap syncing
> >      and scanning; we'll have to watch out for whether that becomes the
> >      bottleneck.
> 
> Yeap.  My idea here was to still maintain the bitmap scanning on the
> main thread, but send work to the "worker threads" in batches, not as
> single pages.  But I haven't really profiled how long we spend there.

Yeh, it would be interesting to see what this profile looked like; if we
suddenly found that the main thread had spare cycles, perhaps we could do
some more interesting types of scanning.
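For the batching idea, a tiny sketch of what the hand-off could look like
(BATCH_PAGES, PageBatch and the flush callback are all made-up names, just to
show the shape): the scanning thread keeps walking the dirty bitmap but only
wakes a worker once it has a full batch of page offsets:

    /* Made-up sketch of batched hand-off: the main thread keeps scanning
     * the dirty bitmap, but only passes work on once a batch is full. */
    #include <stdint.h>

    #define BATCH_PAGES 64

    typedef struct {
        uint64_t offsets[BATCH_PAGES]; /* dirty page offsets found by the scan */
        int count;
    } PageBatch;

    /* Called by the scanning thread for every dirty page it finds; 'flush'
     * is whatever mechanism hands a full batch to the first free worker. */
    static void batch_add(PageBatch *b, uint64_t offset,
                          void (*flush)(PageBatch *b))
    {
        b->offsets[b->count++] = offset;
        if (b->count == BATCH_PAGES) {
            flush(b);                  /* one synchronisation per 64 pages */
            b->count = 0;              /* instead of one per page          */
        }
    }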

> >   d) All the zero testing is still done in the main thread which we know is
> >      expensive.
> 
> Not trivial if we don't want to send control information over the
> "other" channels.  One solution would be to split the main memory
> between different "main" threads.  I have no performance profiles yet.

Yes, and it's tricky because the order is:
   1) Send control information
   2) Farm it out to an individual thread

  By the time we're at '2' it's too late to say 'it's zero'.
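A sketch of that ordering constraint, with every helper name invented for
illustration: because the flags for a page go out on the main channel before
the data is handed to a worker, the zero test has to run on the main thread,
up front:

    /* Sketch of the ordering problem; every name below is invented. */
    #include <stdbool.h>
    #include <stdint.h>

    enum { FLAG_ZERO = 1, FLAG_MULTIFD = 2 };

    bool page_is_zero(const void *hostptr);                    /* expensive check */
    void send_control(uint64_t offset, int flags);             /* main channel    */
    void hand_to_worker(const void *hostptr, uint64_t offset); /* extra fd        */

    static void send_one_page(const void *hostptr, uint64_t offset)
    {
        if (page_is_zero(hostptr)) {
            /* 1) The control information can still say "this page is zero". */
            send_control(offset, FLAG_ZERO);
            return;
        }
        /* 1) The control information is committed here ...                  */
        send_control(offset, FLAG_MULTIFD);
        /* 2) ... so by the time a worker looks at the data it is too late
         *    for it to report "actually, it's zero".                        */
        hand_to_worker(hostptr, offset);
    }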

> >   e) Do we need to do something for security with having multiple ports?
> >      How do we check that nothing snuck in on one of our extra ports?
> >      Have we got sanity checks to make sure it's actually the right stream?
> 
> 
> We only have a single port.  We opened it several times.  It shouldn't
> require changes in either libvirt or the firewall.  (Famous last words)

True I guess.

> 
> >   f) You're handing out pages to the sending threads on the basis of which
> >      one is free (in the same way as the multi-threaded compression); but I
> >      think it needs some sanity adding to only hand out whole host pages -
> >      it feels like receiving all the chunks of one host page down separate
> >      FDs would be horrible.
> 
> A trivial optimization would be to send _whole_ huge pages in one go.  I
> wanted comments about what people want here.  My idea was really to add
> multipage support, sending several pages in one go.  That would reduce
> synchronization a lot.  I hand pages to the 1st thread that becomes free
> because ...... I don't know how long a specific transmission is going to
> take.  TCP for you :-(

Sending huge pages would be very nice; the tricky thing is you don't want
to send a huge page unless it's all marked dirty.
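A hedged sketch of that "whole host page" check (the bitmap helper and the
constants are hypothetical): only queue a huge page as a single unit on one
fd when every target-page-sized chunk inside it is dirty, otherwise fall back
to per-page sending:

    /* Hypothetical check before queueing a whole host page on one fd. */
    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>

    #define TARGET_PAGE_SIZE 4096

    bool dirty_bitmap_test(uint64_t page_index);  /* one bit per target page */

    static bool host_page_all_dirty(uint64_t first_page, size_t host_page_size)
    {
        size_t chunks = host_page_size / TARGET_PAGE_SIZE;

        for (size_t i = 0; i < chunks; i++) {
            if (!dirty_bitmap_test(first_page + i)) {
                return false;  /* partly clean: fall back to per-page sending */
            }
        }
        return true;           /* whole host page can go to one fd as one unit */
    }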

> >   g) I think you might be able to combine the compression into the same
> >      threads; so that if multi-fd + multi-threaded-compression is set you
> >      don't end up with 2 sets of threads, and it might be the simplest way
> >      to make them work together.
> 
> Yeap, I thought about that.  But I didn't want to merge them in a first
> stage.  It makes much more sense to _not_ send the compressed data
> through the main channel.  But that would be v2 (or 3, or 4 ...)

Right.

> >   h) You've used the last free RAM_SAVE_FLAG!  And the person who takes the
> >      last slice^Wbit has to get some more.
> >      Since arm, ppc, and 68k have variants that have TARGET_PAGE_BITS 10,
> >      that means we're full; I suggest you use that flag to mean that we
> >      send another 64bit word, and in that word you use the bottom 7 bits
> >      for the fd index and set bit 7 to indicate it's an fd.  The other bits
> >      are sent as zero and stay available for the next use.
> >      Either that or start combining with some other flags.
> >      (I may have a use for some more bits in mind!)
> 
> Ok.  I can look at that.
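For reference, the encoding Dave describes could look roughly like this (the
constant names are made up; the layout is the one suggested above: bits 0-6
carry the fd index, bit 7 marks the word as describing an fd, everything else
stays zero for future use):

    /* Sketch of the suggested flag-extension word; names are invented. */
    #include <stdint.h>

    #define EXT_FD_INDEX_MASK   0x7fULL      /* bits 0-6: which fd           */
    #define EXT_IS_FD           (1ULL << 7)  /* bit 7: this word names an fd */

    static uint64_t ext_word_encode_fd(unsigned fd_index)
    {
        return (fd_index & EXT_FD_INDEX_MASK) | EXT_IS_FD;
    }

    static int ext_word_decode_fd(uint64_t word)
    {
        if (!(word & EXT_IS_FD)) {
            return -1;                       /* reserved for some future use */
        }
        return (int)(word & EXT_FD_INDEX_MASK);
    }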
> 
> >   i) Is this safe for xbzrle - what happens to the cache (or is it all
> >      still the main thread?)
> 
> Nope.  The only way to use xbzrle is:
> 
> if (zero(page)) {
>    ...
> } else if (xbzrle(page)) {
> 
> } else {
>     multifd(page)
> }
> 
> Otherwise we would have to make xbzrle multithreaded, or split memory
> between fd's.  The problem with splitting memory between fd's is that we
> need to know where the hot spots are.

OK, that makes sense.  So does that mean that some pages can still get sent
with xbzrle?

> >   j) For postcopy I could do with a separate fd for the requested pages
> >      (but again that comes back to needing an easy solution to the ordering)
> 
> The ordering was easy, as said.  You can just use that command with each
> postcopy-requested page.  Or something similar, no?
> 
> I think that just forgetting about those pages, and each time we receive
> a requested page first waiting for the main thread to finish its pages,
> should be enough, no?

Actually, I realised it's simpler;  once we're in postcopy mode we never
send the same page again; so we never have any ordering problems as long
as we perform a sync across the fd's at postcopy entry.
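A minimal sketch of that one-off sync at postcopy entry, assuming a
hypothetical per-channel flush helper; the point is simply that after this
single barrier no page is ever resent, so per-fd ordering stops mattering:

    /* Hypothetical helpers, just to show the shape of the one-off barrier. */
    void multifd_flush_channel(int fd_index);  /* blocks until channel drained */
    extern int multifd_channel_count;

    static void postcopy_entry_multifd_sync(void)
    {
        for (int i = 0; i < multifd_channel_count; i++) {
            multifd_flush_channel(i);
        }
        /* From here on, precopy never resends these pages; each page arrives
         * at most once more, on demand, so per-fd ordering cannot go wrong. */
    }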

Dave

> 
> > Dave
> 
> Thanks very much, Juan.
--
Dr. David Alan Gilbert / address@hidden / Manchester, UK


