Re: [Qemu-devel] [PATCH] vhost: fix a migration failed because of vhost region merge


From: Igor Mammedov
Subject: Re: [Qemu-devel] [PATCH] vhost: fix a migration failed because of vhost region merge
Date: Mon, 24 Jul 2017 12:05:22 +0200

On Sat, 22 Jul 2017 00:30:17 +0300
"Michael S. Tsirkin" <address@hidden> wrote:

> On Fri, Jul 21, 2017 at 04:41:58PM +0200, Igor Mammedov wrote:
> > On Wed, 19 Jul 2017 18:52:56 +0300
> > "Michael S. Tsirkin" <address@hidden> wrote:
> >   
> > > On Wed, Jul 19, 2017 at 03:24:27PM +0200, Igor Mammedov wrote:  
> > > > On Wed, 19 Jul 2017 12:46:13 +0100
> > > > "Dr. David Alan Gilbert" <address@hidden> wrote:
> > > >     
> > > > > * Igor Mammedov (address@hidden) wrote:    
> > > > > > On Wed, 19 Jul 2017 23:17:32 +0800
> > > > > > Peng Hao <address@hidden> wrote:
> > > > > >       
> > > > > > > When a guest with several hotplugged DIMMs is migrated, it fails
> > > > > > > to resume on the destination host. The vhost regions of the DIMMs
> > > > > > > are merged on the source host, but during restore the destination
> > > > > > > host checks against the vhost slot limit before those regions are
> > > > > > > merged.
> > > > > > could you provide a bit more detailed description of the problem
> > > > > > including command line+used device_add commands on source and
> > > > > > command line on destination?      
> > > > > 
> > > > > (ccing in Marc Andre and Maxime)
> > > > > 
> > > > > Hmm, I'd like to understand the situation where you get merging
> > > > > between RAMBlocks; that complicates some stuff for postcopy.
> > > > and probably inconsistent merging breaks vhost as well
> > > > 
> > > > merging might happen if regions are adjacent or overlap,
> > > > but for that to happen the merged regions must have the same
> > > > distance between their GPA:HVA pairs, so that the following
> > > > translation still works:
> > > > 
> > > > if gpa in regionX[gpa_start, len, hva_start]
> > > >    hva = hva_start + gpa - gpa_start
> > > > 
> > > > while the GPA of a region is under QEMU's control and deterministic,
> > > > the HVA is not, so in the migration case merging might happen on the
> > > > source side but not on the destination, resulting in different memory maps.
> > > > 
> > > > Michael might know the details of why migration works in the vhost
> > > > use case, but I don't see vhost sending any vmstate data.
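
[A minimal C sketch of the translation described above; the struct and
field names are illustrative only, not QEMU's actual vhost types.]

    #include <stdbool.h>
    #include <stdint.h>

    /* Illustrative region descriptor: a contiguous GPA range mapped to an
     * HVA range of the same length. */
    struct region {
        uint64_t gpa_start;
        uint64_t len;
        uint64_t hva_start;
    };

    /* Translate a guest physical address to a host virtual address. */
    static bool region_translate(const struct region *r,
                                 uint64_t gpa, uint64_t *hva)
    {
        if (gpa < r->gpa_start || gpa - r->gpa_start >= r->len) {
            return false;   /* gpa not covered by this region */
        }
        *hva = r->hva_start + (gpa - r->gpa_start);
        return true;
    }

[The gpa_start of each region is chosen by QEMU and is deterministic,
while hva_start depends on where the host happens to map the backing
memory, which is why a merge that is possible on the source may not be
possible on the destination.]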
> > > 
> > > We aren't merging ramblocks at all.
> > > When we are passing blocks A and B to vhost, if we see that
> > > 
> > > hvaB=hvaA + lenA
> > > gpaB=gpaA + lenA
> > > 
> > > then we can improve performance a bit by passing a single
> > > chunk to vhost: hvaA,gpaA,lenA+lenB
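
[A hedged sketch of that coalescing check, using the same illustrative
struct as the previous sketch; this is not the actual merging code in
QEMU's vhost layer.]

    #include <stdint.h>

    /* Same illustrative layout as in the previous sketch. */
    struct region {
        uint64_t gpa_start;
        uint64_t len;
        uint64_t hva_start;
    };

    /* Blocks A and B can be passed to vhost as one chunk only if B starts
     * exactly where A ends in both GPA and HVA space. */
    static int can_coalesce(const struct region *a, const struct region *b)
    {
        return b->gpa_start == a->gpa_start + a->len &&
               b->hva_start == a->hva_start + a->len;
    }

    /* If the check passes, the merged entry is simply A extended by B,
     * i.e. the single chunk hvaA,gpaA,lenA+lenB mentioned above. */
    static struct region coalesce(const struct region *a, const struct region *b)
    {
        struct region merged = *a;
        merged.len = a->len + b->len;
        return merged;
    }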
> > The kernel used to maintain a flat array memory map for lookups, where
> > such an optimization could give some benefit, but it is negligible since
> > in practice merging reduces the array size by only ~5 entries.
> > 
> > In addition, the kernel backend has been converted to an interval tree
> > because the flat array doesn't scale, so merging doesn't really matter
> > there anymore.
> 
> In my opinion not merging slots is an obvious waste - I think there
> were patches that added a cache and that showed some promise. A cache
> will be more effective if the regions are bigger.
If I recall correctly, the caching patches were there to alleviate the
bad scaling of lookups in the flat array memory map; the latter has since
been replaced in the vhost kernel backend with an interval tree.


> > If we can get rid of merging on the QEMU side, the resulting memory
> > map will be the same size regardless of the order in which entries are
> > added or of chance allocations that happen to allow region merging
> > (i.e. the size becomes deterministic).
> 
> It seems somehow wrong to avoid doing (even minor) optimizations just to
> make error handling simpler.
it's not a question of simplifying error handling but more one of
making the behavior more consistent.

The backend can do the compression on its side or even use a more
suitable data structure than a flat array (like the vhost kernel
backend does).


> > Looking at vhost_user_set_mem_table(), it sends the actual number of
> > entries to the backend over the wire, so it shouldn't break a backend
> > that is written correctly (i.e. one that uses msg.payload.memory.nregions
> > instead of VHOST_MEMORY_MAX_NREGIONS from QEMU); if it breaks, then
> > it's the backend's fault and it should be fixed.
> > 
> > Another thing that could break is the low limit of
> >  VHOST_MEMORY_MAX_NREGIONS = 8:
> > a QEMU started with default options already takes up to 7 entries in
> > the map unmerged, so any configuration that consumes additional slots
> > won't start after an upgrade. We could counter most of the issues by
> > raising the VHOST_MEMORY_MAX_NREGIONS limit and/or teaching the
> > vhost-user protocol to fetch the limit from the backend, similar to
> > vhost_kernel_memslots_limit().
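
[As a sketch of the msg.payload.memory.nregions point, here is how a
well-behaved vhost-user backend might consume a SET_MEM_TABLE payload.
The struct layout roughly follows the vhost-user protocol description of
the time; the handler itself is hypothetical.]

    #include <inttypes.h>
    #include <stdint.h>
    #include <stdio.h>

    #define VHOST_MEMORY_MAX_NREGIONS 8   /* QEMU's limit at the time */

    struct vhost_user_memory_region {
        uint64_t guest_phys_addr;
        uint64_t memory_size;
        uint64_t userspace_addr;
        uint64_t mmap_offset;
    };

    struct vhost_user_memory {
        uint32_t nregions;
        uint32_t padding;
        struct vhost_user_memory_region regions[VHOST_MEMORY_MAX_NREGIONS];
    };

    static int backend_set_mem_table(const struct vhost_user_memory *mem)
    {
        /* Honour the count actually sent over the wire instead of assuming
         * the regions array is fully populated. */
        if (mem->nregions > VHOST_MEMORY_MAX_NREGIONS) {
            return -1;   /* malformed message */
        }
        for (uint32_t i = 0; i < mem->nregions; i++) {
            const struct vhost_user_memory_region *r = &mem->regions[i];
            /* A real backend would mmap the passed fd for this region here. */
            printf("region %" PRIu32 ": gpa=0x%" PRIx64 " size=0x%" PRIx64 "\n",
                   i, r->guest_phys_addr, r->memory_size);
        }
        return 0;
    }

[A backend that instead iterates over all VHOST_MEMORY_MAX_NREGIONS
entries would read uninitialized slots, which is the kind of backend bug
referred to above.]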
> 
> I absolutely agree we should fix vhost-user to raise the slot
> limit, along the lines you suggest. Care to look into it?
I'd leave fixing vhost-user to someone who actually works on it
or its maintainers.


> > > so it does not affect migration normally.
> > >   
> > > >     
> > > > >     
> > > > > > > 
> > > > > > > Signed-off-by: Peng Hao <address@hidden>
> > > > > > > Signed-off-by: Wang Yechao <address@hidden>
> > > > > > > ---
> > > > > > >  hw/mem/pc-dimm.c | 2 +-
> > > > > > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > > > > > 
> > > > > > > diff --git a/hw/mem/pc-dimm.c b/hw/mem/pc-dimm.c
> > > > > > > index ea67b46..bb0fa08 100644
> > > > > > > --- a/hw/mem/pc-dimm.c
> > > > > > > +++ b/hw/mem/pc-dimm.c
> > > > > > > @@ -101,7 +101,7 @@ void pc_dimm_memory_plug(DeviceState *dev, MemoryHotplugState *hpms,
> > > > > > >          goto out;
> > > > > > >      }
> > > > > > >  
> > > > > > > -    if (!vhost_has_free_slot()) {
> > > > > > > +    if (!vhost_has_free_slot() && runstate_is_running()) {
> > > > > > >          error_setg(&local_err, "a used vhost backend has no free"
> > > > > > >                                 " memory slots left");
> > > > > > >          goto out;      
> > > > > 
> > > > > Even this produces the wrong error message in this case, and it
> > > > > also makes me wonder whether the existing code should undo a lot of
> > > > > the object_property_set's that happen.
> > > > > 
> > > > > Dave    
> > > > > > 
> > > > > >       
> > > > > --
> > > > > Dr. David Alan Gilbert / address@hidden / Manchester, UK    



