From: Igor Mammedov
Subject: Re: [Qemu-devel] [PATCH] vhost: fix a migration failed because of vhost region merge
Date: Fri, 21 Jul 2017 16:41:58 +0200

On Wed, 19 Jul 2017 18:52:56 +0300
"Michael S. Tsirkin" <address@hidden> wrote:

> On Wed, Jul 19, 2017 at 03:24:27PM +0200, Igor Mammedov wrote:
> > On Wed, 19 Jul 2017 12:46:13 +0100
> > "Dr. David Alan Gilbert" <address@hidden> wrote:
> >   
> > > * Igor Mammedov (address@hidden) wrote:  
> > > > On Wed, 19 Jul 2017 23:17:32 +0800
> > > > Peng Hao <address@hidden> wrote:
> > > >     
> > > > > When a guest with several hotplugged DIMMs is migrated, it fails
> > > > > to resume on the destination host. On the source host the vhost
> > > > > regions of those DIMMs are merged, but during the restore stage
> > > > > the destination host checks the region count against the vhost
> > > > > slot limit before the regions are merged.
> > > > could you provide a more detailed description of the problem,
> > > > including the command line + the device_add commands used on the
> > > > source, and the command line on the destination?
> > > 
> > > (ccing in Marc Andre and Maxime)
> > > 
> > > Hmm, I'd like to understand the situation where you get merging between
> > > RAMBlocks; that complicates some stuff for postcopy.  
> > and inconsistent merging probably breaks vhost as well
> > 
> > Merging might happen if regions are adjacent or overlap,
> > but for that to happen the merged regions must have the same
> > offset between their GPA:HVA pairs, so that the following
> > translation keeps working:
> > 
> > if gpa in regionX[gpa_start, len, hva_start]
> >    hva = hva_start + gpa - gpa_start
> > 
> > While the GPA of a region is under QEMU's control and deterministic,
> > the HVA is not, so in the migration case merging might happen on the
> > source side but not on the destination, resulting in different memory maps.
> > 
> > Maybe Michael knows the details of why migration works in the vhost
> > use case, but I don't see vhost sending any vmstate data.  
> 
> We aren't merging RAMBlocks at all.
> When we pass blocks A and B to vhost, if we see that
> 
> hvaB = hvaA + lenA
> gpaB = gpaA + lenA
> 
> then we can improve performance a bit by passing a single
> chunk to vhost: hvaA, gpaA, lenA + lenB
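To spell out that condition and the translation it has to preserve,
here is a minimal C sketch (names are made up for illustration; this
is not the actual merging code in hw/virtio/vhost.c):

#include <stdbool.h>
#include <stdint.h>

/* One entry of the memory map handed to vhost (illustrative only). */
struct vhost_region {
    uint64_t gpa_start;   /* guest physical address */
    uint64_t hva_start;   /* host virtual address in QEMU */
    uint64_t len;
};

/* A and B may be folded into one chunk only if B starts exactly where
 * A ends in *both* address spaces, i.e. the GPA->HVA offset is the
 * same on either side of the boundary. */
static bool can_merge(const struct vhost_region *a,
                      const struct vhost_region *b)
{
    return b->gpa_start == a->gpa_start + a->len &&
           b->hva_start == a->hva_start + a->len;
}

/* The translation the merged map must keep intact. */
static uint64_t gpa_to_hva(const struct vhost_region *r, uint64_t gpa)
{
    return r->hva_start + (gpa - r->gpa_start);
}
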
The kernel used to maintain a flat array map for lookups, where
such an optimization could give some benefit, but it is negligible:
in practice merging reduces the array size by only ~5 entries.

In addition, the kernel backend has since been converted to an
interval tree because the flat array doesn't scale, so merging
doesn't really matter there anymore.
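
To make the scaling point concrete, the old flat-array lookup amounts
to a linear scan (sketch below, reusing struct vhost_region from above;
not the kernel's actual code), so trimming ~5 entries off n changes
little, while an interval tree gives O(log n) regardless:

/* O(n) scan over the flat map, once per GPA->HVA translation. */
static const struct vhost_region *
find_region(const struct vhost_region *map, int n, uint64_t gpa)
{
    for (int i = 0; i < n; i++) {
        if (gpa >= map[i].gpa_start &&
            gpa - map[i].gpa_start < map[i].len) {
            return &map[i];
        }
    }
    return NULL; /* no region backs this address */
}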

If we can get rid of merging on the QEMU side, the resulting memory
map will be the same size regardless of the order in which entries
are added or of a chance allocation that happens to allow region
merging (i.e. the size becomes deterministic).

Looking at vhost_user_set_mem_table(), it sends the actual number of
entries to the backend over the wire, so it shouldn't break a backend
that was written correctly (i.e. one that uses msg.payload.memory.nregions
instead of QEMU's VHOST_MEMORY_MAX_NREGIONS). If a backend breaks,
that's the backend's fault and it should be fixed.
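
The wire format makes that explicit; the struct layout below follows
the vhost-user spec, while the handler is only a sketch of the shape a
well-behaved backend takes:

#include <stdint.h>

#define VHOST_MEMORY_MAX_NREGIONS 8   /* compile-time ceiling in QEMU */

/* Region descriptor carried by VHOST_USER_SET_MEM_TABLE,
 * per the vhost-user spec. */
typedef struct VhostUserMemoryRegion {
    uint64_t guest_phys_addr;
    uint64_t memory_size;
    uint64_t userspace_addr;
    uint64_t mmap_offset;
} VhostUserMemoryRegion;

typedef struct VhostUserMemory {
    uint32_t nregions;   /* actual number of valid entries */
    uint32_t padding;
    VhostUserMemoryRegion regions[VHOST_MEMORY_MAX_NREGIONS];
} VhostUserMemory;

/* A correct backend iterates over what was actually sent... */
static void handle_set_mem_table(const VhostUserMemory *mem)
{
    for (uint32_t i = 0; i < mem->nregions; i++) {
        /* mmap the fd for regions[i], record the mapping, etc. */
    }
}
/* ...a broken one loops up to VHOST_MEMORY_MAX_NREGIONS and trips
 * over the uninitialized trailing entries. */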

Another thing that could break is the low limit
 VHOST_MEMORY_MAX_NREGIONS = 8:
QEMU started with default options already takes up to 7 entries in the
unmerged map, so any configuration that consumes additional slots won't
start after an upgrade. We could counter most of these issues by raising
the VHOST_MEMORY_MAX_NREGIONS limit and/or by teaching the vhost-user
protocol to fetch the limit from the backend, similar to
vhost_kernel_memslots_limit().
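
For reference, the kernel helper just reads the vhost module parameter
from sysfs; a rough sketch of its shape (the fallback value here is an
assumption, check the real vhost_kernel_memslots_limit()):

#include <stdio.h>

/* Rough shape of vhost_kernel_memslots_limit(): read the vhost module
 * parameter, fall back to a default when it is unavailable. */
static int memslots_limit_sketch(void)
{
    int limit = 64;  /* assumed fallback, not taken from the real code */
    FILE *f = fopen("/sys/module/vhost/parameters/max_mem_regions", "r");

    if (f) {
        if (fscanf(f, "%d", &limit) != 1) {
            limit = 64;  /* parse failure: keep the fallback */
        }
        fclose(f);
    }
    return limit;
}

A vhost-user analogue would need a new protocol message to ask the
backend for its limit; nothing like that exists in the protocol as of
this thread.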


> so it does not affect migration normally.
> 
> >   
> > >   
> > > > > 
> > > > > Signed-off-by: Peng Hao <address@hidden>
> > > > > Signed-off-by: Wang Yechao <address@hidden>
> > > > > ---
> > > > >  hw/mem/pc-dimm.c | 2 +-
> > > > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > > > 
> > > > > diff --git a/hw/mem/pc-dimm.c b/hw/mem/pc-dimm.c
> > > > > index ea67b46..bb0fa08 100644
> > > > > --- a/hw/mem/pc-dimm.c
> > > > > +++ b/hw/mem/pc-dimm.c
> > > > > @@ -101,7 +101,7 @@ void pc_dimm_memory_plug(DeviceState *dev, MemoryHotplugState *hpms,
> > > > >          goto out;
> > > > >      }
> > > > >  
> > > > > -    if (!vhost_has_free_slot()) {
> > > > > +    if (!vhost_has_free_slot() && runstate_is_running()) {
> > > > >          error_setg(&local_err, "a used vhost backend has no free"
> > > > >                                 " memory slots left");
> > > > >          goto out;    
> > > 
> > > Even so, this produces the wrong error message in this case;
> > > it also makes me wonder whether the existing code should undo a lot
> > > of the object_property_set's that happen.
> > > 
> > > Dave  
> > > > 
> > > >     
> > > --
> > > Dr. David Alan Gilbert / address@hidden / Manchester, UK  



