qemu-devel

Re: [PATCH v6 0/4] migration: UFFD write-tracking migration/snapshots


From: Peter Xu
Subject: Re: [PATCH v6 0/4] migration: UFFD write-tracking migration/snapshots
Date: Fri, 11 Dec 2020 10:09:40 -0500

On Fri, Dec 11, 2020 at 04:13:02PM +0300, Andrey Gruzdev wrote:
> I've also made wr-fault resolution latency measurements, for the case when
> migration stream is dumped to a file in cached mode.. Should approximately
> match saving to the file fd directly though I used 'migrate exec:<>' using a
> hand-written tool.
> 
> VM config is 6 vCPUs + 16GB RAM, qcow2 image on Seagate 7200.11 series 1.5TB
> HDD, snapshot goes to the same disk. Guest is Windows 10.
> 
> The test scenario is playing full-HD youtube video in Firefox while saving
> snapshot.
> 
> Latency measurement begin/end points are fs/userfaultfd.c:handle_userfault()
> and mm/userfaultfd.c:mwriteprotect_range(), respectively. For any faulting
> page, the oldest wr-fault timestamp is accounted.
> 
> The whole time to take snapshot was ~30secs, file size is around 3GB.
> So far seems to be not a very bad picture.. However 16-255msecs range is
> worrying me a bit, seems it causes audio backend buffer underflows sometimes.
> 
> 
>      msecs               : count     distribution
>          0 -> 1          : 111755   |****************************************|
>          2 -> 3          : 52       |                                        |
>          4 -> 7          : 105      |                                        |
>          8 -> 15         : 428      |                                        |
>         16 -> 31         : 335      |                                        |
>         32 -> 63         : 4        |                                        |
>         64 -> 127        : 8        |                                        |
>        128 -> 255        : 5        |                                        |

Great test!  Thanks for sharing this information.

Yes, it's good enough for a first version; it's already better than merely
functional. :)
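
To put the worrying 16-255 msecs range in proportion, here is a minimal Python
sketch that sums the bucket counts copied from the quoted histogram.  The
names BUCKETS and tail_share are only illustrative, and the threshold of
16 msecs is simply the lower edge of the range mentioned above:

```python
# Bucket counts copied from the quoted histogram: (low_ms, high_ms, count).
BUCKETS = [
    (0, 1, 111755),
    (2, 3, 52),
    (4, 7, 105),
    (8, 15, 428),
    (16, 31, 335),
    (32, 63, 4),
    (64, 127, 8),
    (128, 255, 5),
]


def tail_share(buckets, threshold_ms):
    """Return (tail_count, total, fraction) for buckets whose lower edge
    is at or above threshold_ms."""
    total = sum(count for _, _, count in buckets)
    tail = sum(count for low, _, count in buckets if low >= threshold_ms)
    return tail, total, tail / total


if __name__ == "__main__":
    tail, total, frac = tail_share(BUCKETS, 16)
    print(f"{tail} of {total} faults took >= 16 msecs ({frac:.2%})")
```

With the numbers above this gives 352 of 112692 faults, i.e. roughly 0.3%, so
the tail is small in count even though it is what the audio backend notices.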

Did you try your previous patch to see whether it improves things in some way?
Again, we can gradually optimize on top of your current work.

Btw, you reminded me: why don't we track all of this from the kernel? :) That's
a good idea.  So, how did you trace it yourself?  Something like the below
should work with bpftrace, but I suspect you did it some other way, so just
fyi:

        # cat latency.bpf
        kprobe:handle_userfault
        {
                @start[tid] = nsecs;
        }

        kretprobe:handle_userfault
        {
                if (@start[tid]) {
                        $delay = nsecs - @start[tid];
                        delete(@start[tid]);
                        @delay_us = hist($delay / 1000);
                }
        }
        # bpftrace latency.bpf

Tracing the return of handle_userfault() could be more accurate in that it also
covers the latency between UFFDIO_WRITEPROTECT and the vcpu being woken up
again.  It's not exact, though: since commit f9bf352224d7 ("userfaultfd:
simplify fault handling", 2020-08-03), handle_userfault() can return before the
page fault is resolved.  It should still be good enough in most cases, because
when that happens the thread simply faults into handle_userfault() again and
we just get one more count.
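
Note that the script above records microseconds, while your table is in msecs.
If you want to compare the two, a rough Python sketch like the one below could
re-bucket per-fault delays into the same power-of-two msec ranges.  The helper
names and the sample values are made up for illustration, and the bucket
layout is assumed to follow your table rather than bpftrace's own hist()
output:

```python
from collections import Counter


def msec_bucket(delay_us):
    """Map a delay in microseconds to the (low, high) msec bucket used in
    the quoted table: 0 -> 1, 2 -> 3, 4 -> 7, 8 -> 15, ..."""
    ms = delay_us // 1000
    if ms < 2:
        return (0, 1)
    # Lower edge is the largest power of two not exceeding ms.
    low = 1 << (ms.bit_length() - 1)
    return (low, 2 * low - 1)


def msec_histogram(delays_us):
    """Count delays per msec bucket, returned in ascending bucket order."""
    counts = Counter(msec_bucket(d) for d in delays_us)
    return sorted(counts.items())


if __name__ == "__main__":
    # Hypothetical per-fault delays in microseconds, not measured data.
    samples = [120, 950, 2500, 17000, 30000, 200000]
    for (low, high), count in msec_histogram(samples):
        print(f"{low:>6} -> {high:<6} : {count}")
```

Because a fault that returns early is counted again on the retry, the low
buckets may be slightly inflated compared to a per-page measurement like
yours.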

Thanks!

-- 
Peter Xu



