qemu-devel

Re: [PATCH v1] docs/devel: Add VFIO device migration documentation


From: Alex Williamson
Subject: Re: [PATCH v1] docs/devel: Add VFIO device migration documentation
Date: Tue, 3 Nov 2020 13:27:58 -0700

On Wed, 4 Nov 2020 01:18:12 +0530
Kirti Wankhede <kwankhede@nvidia.com> wrote:

> On 10/30/2020 12:35 AM, Alex Williamson wrote:
> > On Thu, 29 Oct 2020 23:11:16 +0530
> > Kirti Wankhede <kwankhede@nvidia.com> wrote:
> >   
> 
> <snip>
> 
> >>>> +System memory dirty pages tracking
> >>>> +----------------------------------
> >>>> +
> >>>> +A ``log_sync`` memory listener callback is added to mark system memory pages
> >>>
> >>> s/is added to mark/marks those/
> >>>      
> >>>> +as dirty which are used for DMA by VFIO device. Dirty pages bitmap is queried
> >>>
> >>> s/by/by the/
> >>> s/Dirty/The dirty/
> >>>      
> >>>> +per container. All pages pinned by vendor driver through vfio_pin_pages()
> >>>
> >>> s/by/by the/
> >>>      
> >>>> +external API have to be marked as dirty during migration. When there are CPU
> >>>> +writes, CPU dirty page tracking can identify dirtied pages, but any page pinned
> >>>> +by vendor driver can also be written by device. There is currently no device
> >>>
> >>> s/by/by the/ (x2)
> >>>      
> >>>> +which has hardware support for dirty page tracking. So all pages which are
> >>>> +pinned by vendor driver are considered as dirty.
> >>>> +Dirty pages are tracked when device is in stop-and-copy phase because if pages
> >>>> +are marked dirty during pre-copy phase and content is transfered from source to
> >>>> +destination, there is no way to know newly dirtied pages from the point they
> >>>> +were copied earlier until device stops. To avoid repeated copy of same content,
> >>>> +pinned pages are marked dirty only during stop-and-copy phase.
> >>
> >>  
> >>> Let me take a quick stab at rewriting this paragraph (not sure if I
> >>> understood it correctly):
> >>>
> >>> "Dirty pages are tracked when the device is in the stop-and-copy phase.
> >>> During the pre-copy phase, it is not possible to distinguish a dirty
> >>> page that has been transferred from the source to the destination from
> >>> newly dirtied pages, which would lead to repeated copying of the same
> >>> content. Therefore, pinned pages are only marked dirty during the
> >>> stop-and-copy phase." ?
> >>>      
> >>
> >> I think the above rephrase only talks about repeated copying in the
> >> pre-copy phase.  I used "copied earlier until device stops" to indicate
> >> both pre-copy and stop-and-copy, until the device stops.
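
To make the log_sync flow above concrete, here is a minimal sketch of
such a listener callback, loosely modeled on QEMU's hw/vfio/common.c;
the VFIOContainer layout and the vfio_get_dirty_bitmap() helper shown
here are simplified assumptions rather than the exact upstream code:

/* Sketch: the VFIO container embeds a MemoryListener whose .log_sync
 * callback is invoked for each MemoryRegionSection when QEMU syncs the
 * dirty log during migration. */
static void vfio_listener_log_sync(MemoryListener *listener,
                                   MemoryRegionSection *section)
{
    VFIOContainer *container = container_of(listener, VFIOContainer,
                                            listener);
    hwaddr iova = section->offset_within_address_space;
    hwaddr size = int128_get64(section->size);

    /* Ask the IOMMU backend which pages in this IOVA range the device
     * dirtied and feed the bitmap into QEMU's RAM dirty log; pages
     * pinned through vfio_pin_pages() are reported as dirty here. */
    vfio_get_dirty_bitmap(container, iova, size,
                          memory_region_get_ram_addr(section->mr) +
                          section->offset_within_region);
}

static const MemoryListener vfio_memory_listener = {
    .log_sync = vfio_listener_log_sync,
    /* .region_add / .region_del handlers elided */
};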
> > 
> > 
> > Now I'm confused; I thought we had abandoned the idea that we can only
> > report pinned pages during stop-and-copy.  Doesn't the device need to
> > expose its dirty memory footprint during the iterative phase regardless
> > of whether that causes repeat copies?  If QEMU iterates and sees that
> > all memory is still dirty, it may have transferred more data, but it
> > can actually predict if it can achieve its downtime tolerances.  Which
> > is more important, less data transfer or predictability?  Thanks,
> >   
> 
> Even if QEMU copies and transfers the content of all system memory pages
> during pre-copy (the worst case with an IOMMU-backed mdev device whose
> vendor driver is not smart enough to pin pages explicitly, so all system
> memory pages are marked dirty), its prediction about downtime tolerance
> will still not be correct, because during stop-and-copy all pages need
> to be copied again, as the device can write to any of those pinned pages.

I think you're only reiterating my point.  If QEMU copies all of guest
memory during the iterative phase and each time it sees that all memory
is dirty, such as if CPUs or devices (including assigned devices) are
dirtying pages as fast as it copies them (or continuously marks them
dirty), then QEMU can predict that downtime will require copying all
pages.  If instead devices don't mark pages dirty until the VM is
stopped, then QEMU might iterate through the memory copy and predict a
short downtime because not much memory is dirty, only to be surprised
that all of memory is suddenly dirty.  At that point it's too late: the
VM is already stopped, and the predicted short downtime takes far
longer than expected.  This is exactly why we made the kernel interface
mark pinned pages persistently dirty when it was proposed that we only
report pinned pages once.  Thanks,

Alex
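
For reference, the kernel interface in question is the
VFIO_IOMMU_DIRTY_PAGES ioctl on the type1 IOMMU container.  What
follows is a minimal, hedged sketch of starting tracking and fetching
the bitmap; field names are taken from the <linux/vfio.h> of this era
and error handling is trimmed, so treat it as an outline rather than
the exact upstream usage.

#include <stdint.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <linux/vfio.h>

/* Start per-container dirty page tracking (once, when migration
 * begins).  From this point pinned pages stay persistently dirty. */
static int dirty_tracking_start(int container_fd)
{
    struct vfio_iommu_type1_dirty_bitmap req = {
        .argsz = sizeof(req),
        .flags = VFIO_IOMMU_DIRTY_PAGES_FLAG_START,
    };
    return ioctl(container_fd, VFIO_IOMMU_DIRTY_PAGES, &req);
}

/* Fetch the dirty bitmap for one IOVA range.  Pages pinned through
 * vfio_pin_pages() remain set on every query while they stay pinned,
 * which is the persistent-dirty behaviour discussed above. */
static int dirty_bitmap_get(int container_fd, uint64_t iova,
                            uint64_t size, uint64_t pgsize,
                            uint64_t *bitmap)
{
    size_t argsz = sizeof(struct vfio_iommu_type1_dirty_bitmap) +
                   sizeof(struct vfio_iommu_type1_dirty_bitmap_get);
    struct vfio_iommu_type1_dirty_bitmap *req = calloc(1, argsz);
    struct vfio_iommu_type1_dirty_bitmap_get *range;
    int ret;

    if (!req)
        return -1;
    req->argsz = argsz;
    req->flags = VFIO_IOMMU_DIRTY_PAGES_FLAG_GET_BITMAP;
    range = (struct vfio_iommu_type1_dirty_bitmap_get *)req->data;
    range->iova = iova;
    range->size = size;
    range->bitmap.pgsize = pgsize;                /* e.g. 4 KiB */
    range->bitmap.size = (size / pgsize + 7) / 8; /* one bit per page */
    range->bitmap.data = bitmap;

    ret = ioctl(container_fd, VFIO_IOMMU_DIRTY_PAGES, req);
    free(req);
    return ret;
}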