[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Qemu-devel] [PATCH] vhost: fix a migration failed because of vhost
From: |
Michael S. Tsirkin |
Subject: |
Re: [Qemu-devel] [PATCH] vhost: fix a migration failed because of vhost region merge |
Date: |
Sat, 22 Jul 2017 00:30:17 +0300 |
On Fri, Jul 21, 2017 at 04:41:58PM +0200, Igor Mammedov wrote:
> On Wed, 19 Jul 2017 18:52:56 +0300
> "Michael S. Tsirkin" <address@hidden> wrote:
>
> > On Wed, Jul 19, 2017 at 03:24:27PM +0200, Igor Mammedov wrote:
> > > On Wed, 19 Jul 2017 12:46:13 +0100
> > > "Dr. David Alan Gilbert" <address@hidden> wrote:
> > >
> > > > * Igor Mammedov (address@hidden) wrote:
> > > > > On Wed, 19 Jul 2017 23:17:32 +0800
> > > > > Peng Hao <address@hidden> wrote:
> > > > >
> > > > > > When a guest that has several hotplugged dimms is migrated, in
> > > > > > destination host it will fail to resume. Because vhost regions of
> > > > > > several dimms in source host are merged and in the restore stage
> > > > > > in destination host it computes whether more than vhost slot limit
> > > > > > before merging vhost regions of several dimms.
> > > > > could you provide a bit more detailed description of the problem
> > > > > including command line+used device_add commands on source and
> > > > > command line on destination?
> > > >
> > > > (ccing in Marc Andre and Maxime)
> > > >
> > > > Hmm, I'd like to understade the situation where you get merging between
> > > > RAMBlocks; that complicates some stuff for postcopy.
> > > and probably inconsistent merging breaks vhost as well
> > >
> > > merging might happen if regions are adjacent or overlap
> > > but for that to happen merged regions must have equal
> > > distance between their GPA:HVA pairs, so that following
> > > translation would work:
> > >
> > > if gva in regionX[gva_start, len, hva_start]
> > > hva = hva_start + gva - gva_start
> > >
> > > while GVA of regions is under QEMU control and deterministic
> > > HVA is not, so in migration case merging might happen on source
> > > side but not on destination, resulting in different memory maps.
> > >
> > > Maybe Michael might know details why migration works in vhost usecase,
> > > but I don't see vhost sending any vmstate data.
> >
> > We aren't merging ramblocks at all.
> > When we are passing blocks A and B to vhost, if we see that
> >
> > hvaB=hvaA + lenA
> > gpaB=gpaA + lenA
> >
> > then we can improve performance a bit by passing a single
> > chunk to vhost: hvaA,gpaA,lena+lenB
> kernel used to maintain flat array map for look up where
> such optimization could give some benefit which is negligible
> as in practice merging reduces array size only by ~5 entries.
>
> In addition kernel backend has been converted to interval tree
> as flat array doesn't scale, so merging doesn't really matters
> there anymore.
In my opinion not merging slots is an obvious waste - I
think there were patches that added a cache and that
showed some promise. cache will be more effective
if regions are bigger.
> If we can get rid of merging on QEMU side, resulting memory
> map will become of the same size regardless of the order
> in which entries are added or chancy random allocation
> that could allow region merging (i.e. size will become
> deterministic).
It seems somehow wrong to avoid doing (even minor) optimizations just to
make error handling simpler.
> Looking at vhost_user_set_mem_table() it sends actual number of
> entries to backend over the wire, so it shouldn't break backend
> if it were written right (i.e. uses msg.payload.memory.nregions
> instead of VHOST_MEMORY_MAX_NREGIONS from QEMU.), if it breaks
> then it's backend's fault and it should be fixed.
>
> Another thing that could break is too low limit
> VHOST_MEMORY_MAX_NREGIONS = 8
> and QEMU started with default options takes upto 7 entries in map
> unmerged, so any configuration that consumes additional slots won't
> start after upgrade. We could counter the most of issues by rising
> VHOST_MEMORY_MAX_NREGIONS limit and/or teaching vhost-user protocol
> to fetch limit from backend similar to vhost_kernel_memslots_limit().
I absolutely agree we should fix vhost-user to raise the slot
limit, along the lines you suggest. Care looking into it?
>
> > so it does not affect migration normally.
> >
> > >
> > > >
> > > > > >
> > > > > > Signed-off-by: Peng Hao <address@hidden>
> > > > > > Signed-off-by: Wang Yechao <address@hidden>
> > > > > > ---
> > > > > > hw/mem/pc-dimm.c | 2 +-
> > > > > > 1 file changed, 1 insertion(+), 1 deletion(-)
> > > > > >
> > > > > > diff --git a/hw/mem/pc-dimm.c b/hw/mem/pc-dimm.c
> > > > > > index ea67b46..bb0fa08 100644
> > > > > > --- a/hw/mem/pc-dimm.c
> > > > > > +++ b/hw/mem/pc-dimm.c
> > > > > > @@ -101,7 +101,7 @@ void pc_dimm_memory_plug(DeviceState *dev,
> > > > > > MemoryHotplugState *hpms,
> > > > > > goto out;
> > > > > > }
> > > > > >
> > > > > > - if (!vhost_has_free_slot()) {
> > > > > > + if (!vhost_has_free_slot() && runstate_is_running()) {
> > > > > > error_setg(&local_err, "a used vhost backend has no free"
> > > > > > " memory slots left");
> > > > > > goto out;
> > > >
> > > > Even this produces the wrong error message in this case,
> > > > it also makes me think if the existing code should undo a lot of
> > > > the object_property_set's that happen.
> > > >
> > > > Dave
> > > > >
> > > > >
> > > > --
> > > > Dr. David Alan Gilbert / address@hidden / Manchester, UK
- [Qemu-devel] [PATCH] vhost: fix a migration failed because of vhost region merge, Peng Hao, 2017/07/19
- Re: [Qemu-devel] [PATCH] vhost: fix a migration failed because of vhost region merge, Igor Mammedov, 2017/07/19
- Re: [Qemu-devel] [PATCH] vhost: fix a migration failed because of vhost region merge, Dr. David Alan Gilbert, 2017/07/19
- Re: [Qemu-devel] [PATCH] vhost: fix a migration failed because of vhost region merge, Igor Mammedov, 2017/07/19
- Re: [Qemu-devel] [PATCH] vhost: fix a migration failed because of vhost region merge, Michael S. Tsirkin, 2017/07/19
- Re: [Qemu-devel] [PATCH] vhost: fix a migration failed because of vhost region merge, Dr. David Alan Gilbert, 2017/07/20
- Re: [Qemu-devel] [PATCH] vhost: fix a migration failed because of vhost region merge, Michael S. Tsirkin, 2017/07/21
- Re: [Qemu-devel] [PATCH] vhost: fix a migration failed because of vhost region merge, Dr. David Alan Gilbert, 2017/07/24
- Re: [Qemu-devel] [PATCH] vhost: fix a migration failed because of vhost region merge, Igor Mammedov, 2017/07/24
- Re: [Qemu-devel] [PATCH] vhost: fix a migration failed because of vhost region merge, Igor Mammedov, 2017/07/21
- Re: [Qemu-devel] [PATCH] vhost: fix a migration failed because of vhost region merge,
Michael S. Tsirkin <=
- Re: [Qemu-devel] [PATCH] vhost: fix a migration failed because of vhost region merge, Igor Mammedov, 2017/07/24
- Re: [Qemu-devel] [PATCH] vhost: fix a migration failed because of vhost region merge, Igor Mammedov, 2017/07/24
Re: [Qemu-devel] [PATCH] vhost: fix a migration failed because of vhost region merge, no-reply, 2017/07/19