qemu-devel

Re: [PATCH v1] docs/devel: Add VFIO device migration documentation


From: Alex Williamson
Subject: Re: [PATCH v1] docs/devel: Add VFIO device migration documentation
Date: Thu, 5 Nov 2020 12:11:50 -0700

On Fri, 6 Nov 2020 00:29:36 +0530
Kirti Wankhede <kwankhede@nvidia.com> wrote:

> On 11/4/2020 6:15 PM, Alex Williamson wrote:
> > On Wed, 4 Nov 2020 13:25:40 +0530
> > Kirti Wankhede <kwankhede@nvidia.com> wrote:
> >   
> >> On 11/4/2020 1:57 AM, Alex Williamson wrote:  
> >>> On Wed, 4 Nov 2020 01:18:12 +0530
> >>> Kirti Wankhede <kwankhede@nvidia.com> wrote:
> >>>      
> >>>> On 10/30/2020 12:35 AM, Alex Williamson wrote:  
> >>>>> On Thu, 29 Oct 2020 23:11:16 +0530
> >>>>> Kirti Wankhede <kwankhede@nvidia.com> wrote:
> >>>>>         
> >>>>
> >>>> <snip>
> >>>>     
> >>>>>>>> +System memory dirty pages tracking
> >>>>>>>> +----------------------------------
> >>>>>>>> +
> >>>>>>>> +A ``log_sync`` memory listener callback is added to mark system 
> >>>>>>>> memory pages  
> >>>>>>>
> >>>>>>> s/is added to mark/marks those/
> >>>>>>>            
> >>>>>>>> +as dirty which are used for DMA by VFIO device. Dirty pages bitmap 
> >>>>>>>> is queried  
> >>>>>>>
> >>>>>>> s/by/by the/
> >>>>>>> s/Dirty/The dirty/
> >>>>>>>            
> >>>>>>>> +per container. All pages pinned by vendor driver through 
> >>>>>>>> vfio_pin_pages()  
> >>>>>>>
> >>>>>>> s/by/by the/
> >>>>>>>            
> >>>>>>>> +external API have to be marked as dirty during migration. When 
> >>>>>>>> there are CPU
> >>>>>>>> +writes, CPU dirty page tracking can identify dirtied pages, but any 
> >>>>>>>> page pinned
> >>>>>>>> +by vendor driver can also be written by device. There is currently 
> >>>>>>>> no device  
> >>>>>>>
> >>>>>>> s/by/by the/ (x2)
> >>>>>>>            
> >>>>>>>> +which has hardware support for dirty page tracking. So all pages 
> >>>>>>>> which are
> >>>>>>>> +pinned by vendor driver are considered as dirty.
> >>>>>>>> +Dirty pages are tracked when device is in stop-and-copy phase 
> >>>>>>>> because if pages
> >>>>>>>> +are marked dirty during pre-copy phase and content is transfered 
> >>>>>>>> from source to
> >>>>>>>> +destination, there is no way to know newly dirtied pages from the 
> >>>>>>>> point they
> >>>>>>>> +were copied earlier until device stops. To avoid repeated copy of 
> >>>>>>>> same content,
> >>>>>>>> +pinned pages are marked dirty only during stop-and-copy phase.  
> >>>>>>
> >>>>>>        
> >>>>>>> Let me take a quick stab at rewriting this paragraph (not sure if I
> >>>>>>> understood it correctly):
> >>>>>>>
> >>>>>>> "Dirty pages are tracked when the device is in the stop-and-copy 
> >>>>>>> phase.
> >>>>>>> During the pre-copy phase, it is not possible to distinguish a dirty
> >>>>>>> page that has been transferred from the source to the destination from
> >>>>>>> newly dirtied pages, which would lead to repeated copying of the same
> >>>>>>> content. Therefore, pinned pages are only marked dirty during the
> >>>>>>> stop-and-copy phase." ?
> >>>>>>>            
> >>>>>>
> >>>>>> I think above rephrase only talks about repeated copying in pre-copy
> >>>>>> phase. Used "copied earlier until device stops" to indicate both
> >>>>>> pre-copy and stop-and-copy till device stops.  
> >>>>>
> >>>>>
> >>>>> Now I'm confused, I thought we had abandoned the idea that we can only
> >>>>> report pinned pages during stop-and-copy.  Doesn't the device need to
> >>>>> expose its dirty memory footprint during the iterative phase regardless
> >>>>> of whether that causes repeat copies?  If QEMU iterates and sees that
> >>>>> all memory is still dirty, it may have transferred more data, but it
> >>>>> can actually predict if it can achieve its downtime tolerances.  Which
> >>>>> is more important, less data transfer or predictability?  Thanks,
> >>>>>         
> >>>>
> >>>> Even if QEMU copies and transfers the content of all sys mem pages
> >>>> during pre-copy (the worst case, with an IOMMU backed mdev device whose
> >>>> vendor driver is not smart enough to pin pages explicitly, so all sys
> >>>> mem pages are marked dirty), its prediction about downtime tolerance
> >>>> will still not be correct, because during stop-and-copy all pages need
> >>>> to be copied again, as the device can write to any of those pinned pages.  
> >>>
> >>> I think you're only reiterating my point.  If QEMU copies all of guest
> >>> memory during the iterative phase and each time it sees that all memory
> >>> is dirty, such as if CPUs or devices (including assigned devices) are
> >>> dirtying pages as fast as it copies them (or continuously marks them
> >>> dirty), then QEMU can predict that downtime will require copying all
> >>> pages.  
> >>
> >> But as of now there is no way to know if device has dirtied pages during
> >> iterative phase.  
> > 
> > 
> > This claim doesn't make any sense; pinned pages are considered
> > persistently dirtied, both during the iterative phase and while stopped.
> > 
> >     
> >>> If instead devices don't mark dirty pages until the VM is
> >>> stopped, then QEMU might iterate through memory copy and predict a short
> >>> downtime because not much memory is dirty, only to be surprised that
> >>> all of memory is suddenly dirty.  At that point it's too late, the VM
> >>> is already stopped, the predicted short downtime takes far longer than
> >>> expected.  This is exactly why we made the kernel interface mark pinned
> >>> pages persistently dirty when it was proposed that we only report
> >>> pinned pages once.  Thanks,
> >>>      
> >>
> >> Since there is no way to know if device dirtied pages during iterative
> >> phase, QEMU should query pinned pages in stop-and-copy phase.  
> > 
> > 
> > As above, I don't believe this is true.
> > 
> >   
> >> Whenever there will be hardware support or some software mechanism to
> >> report pages dirtied by device then we will add a capability bit in
> >> migration capability and based on that capability bit qemu/user space
> >> app should decide to query dirty pages in iterative phase.  
> > 
> > 
> > Yes, we could advertise support for fine granularity dirty page
> > tracking, but I completely disagree that we should consider pinned
> > pages clean until suddenly exposing them as dirty once the VM is
> > stopped.  Thanks,
> >   
> 
> Should QEMU copy dirtied pages twice, during iterative phase and then 
> when VM is stopped?

I don't understand why this is controversial.  We cannot decide within
the vfio device to only expose device dirtied pages in the final stage
of migration.  It's not our job to minimize the number of pages copied
beyond the hardware granularity.  If core QEMU migration code asks for
dirty pages, we provide them, regardless of how many times we report a
page as dirty.  So yes, if that migration code asks for dirty pages in
the iterative stage and the stopped stage, we provide them both times.
If someone wants to skip the iterative phase altogether, I imagine
there are migration parameters that allow it, but we should not be
determining that policy at the device level.  Thanks,

Alex
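
The downtime-predictability argument in this thread can be illustrated with a
toy model. This is only a sketch under stated assumptions, not QEMU or vfio
code: the function name, the set-based "bitmap", and the assumption that the
CPU-dirtied set at stop time equals the one seen in the last iterative round
are all hypothetical simplifications. It compares what the migration code
would predict for the final copy against what it actually has to copy, under
the two reporting policies discussed above.

```python
def simulate_migration(pinned, cpu_dirty_rounds, report_pinned_in_precopy):
    """Toy model of pre-copy migration with device-pinned pages.

    pinned: set of page numbers pinned by the (hypothetical) vendor driver.
    cpu_dirty_rounds: list of sets, the pages dirtied by CPUs in each
        iterative round.
    report_pinned_in_precopy: if True, pinned pages are reported dirty on
        every query (persistently dirty); if False, they are only reported
        once the VM is stopped.

    Returns (predicted, actual): the number of pages the migration code
    expects to copy during downtime, and the number it really has to copy.
    """
    predicted = 0
    for cpu_dirty in cpu_dirty_rounds:
        dirty = set(cpu_dirty)
        if report_pinned_in_precopy:
            dirty |= pinned
        # The dirty set seen in the latest round is the best estimate of
        # what would remain to be copied if the VM were stopped now.
        predicted = len(dirty)

    # Once the VM is stopped, pinned pages are reported under both policies.
    # Assumption: the CPU-dirtied set matches the last iterative round.
    last_cpu_dirty = cpu_dirty_rounds[-1] if cpu_dirty_rounds else set()
    actual = len(set(last_cpu_dirty) | pinned)
    return predicted, actual


pinned = set(range(1000))                      # pages 0..999 pinned for DMA
rounds = [set(range(1000, 1050)), set(range(1000, 1010))]

persistent = simulate_migration(pinned, rounds, report_pinned_in_precopy=True)
deferred = simulate_migration(pinned, rounds, report_pinned_in_precopy=False)
print("persistently dirty:", persistent)       # (1010, 1010)
print("reported only at stop:", deferred)      # (10, 1010)
```

In this model, reporting pinned pages in every round copies more data during
the iterative phase, but the prediction matches the real stop-time cost.
Deferring the report makes the estimate look small until the VM is already
stopped.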



