qemu-devel

Re: [PATCH v16 QEMU 00/16] Add migration support for VFIO devices


From: Alex Williamson
Subject: Re: [PATCH v16 QEMU 00/16] Add migration support for VFIO devices
Date: Wed, 1 Apr 2020 12:34:49 -0600

On Wed, 1 Apr 2020 02:41:54 -0400
Yan Zhao <address@hidden> wrote:

> On Wed, Apr 01, 2020 at 02:34:24AM +0800, Alex Williamson wrote:
> > On Wed, 25 Mar 2020 02:38:58 +0530
> > Kirti Wankhede <address@hidden> wrote:
> >   
> > > Hi,
> > > 
> > > This Patch set adds migration support for VFIO devices in QEMU.  
> > 
> > Hi Kirti,
> > 
> > Do you have any migration data you can share to show that this solution
> > is viable and useful?  I was chatting with Dave Gilbert and there still
> > seems to be a concern about whether we actually have a real-world
> > practical solution.  We know this is inefficient with QEMU today: vendor
> > pinned memory will get copied multiple times if we're lucky.  If we're
> > not lucky we may be copying all of guest RAM repeatedly.  There are known
> > inefficiencies with vIOMMU, etc.  QEMU could learn new heuristics to
> > account for some of this and we could potentially report different
> > bitmaps in different phases through vfio, but let's make sure that
> > there are useful cases enabled by this first implementation.
> > 
> > With a reasonably sized VM, running a reasonable graphics demo or
> > workload, can we achieve reasonably live migration?  What kind of
> > downtime do we achieve and what is the working set size of the pinned
> > memory?  Intel folks, if you've been able to port to this or a similar
> > code base, please report your results as well; open source consumers
> > are arguably even more important.  Thanks,
> >   
> Hi Alex,
> We're in the process of porting to this code, and it is now able to
> migrate successfully when there are no dirty pages.
> 
> When there are dirty pages, we hit several issues.
> One of them is reported here
> (https://lists.gnu.org/archive/html/qemu-devel/2020-04/msg00004.html).
> Dirty pages for some regions are not collected correctly,
> especially for the memory range from 3G to 4G.
> 
> Even without this bug, QEMU still gets stuck partway through, before
> reaching the stop-and-copy phase, and cannot be killed by the admin.
> We are still debugging this problem.

Thanks, Yan.  So it seems we have various bugs, known limitations, and
we haven't actually proven that this implementation provides a useful
feature, at least for the open source consumer.  This doesn't give me
much confidence to consider the kernel portion ready for v5.7 given how
late we are already :-\  Thanks,

Alex



