qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v11 5/5] migration: introduce 'userfaultfd-wrlat.py' script


From: Andrey Gruzdev
Subject: Re: [PATCH v11 5/5] migration: introduce 'userfaultfd-wrlat.py' script
Date: Thu, 21 Jan 2021 16:12:23 +0300
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.10.0

On 20.01.2021 00:01, Peter Xu wrote:
On Wed, Jan 06, 2021 at 06:21:20PM +0300, Andrey Gruzdev wrote:
Add BCC/eBPF script to analyze userfaultfd write fault latency distribution.

Signed-off-by: Andrey Gruzdev <andrey.gruzdev@virtuozzo.com>
Acked-by: Peter Xu <peterx@redhat.com>
(This seems to be the last patch that lacks a r-b ... Let's see whether I could
 convert my a-b into an r-b... :)

+BPF_HASH(ev_start, struct ev_desc, u64);
+BPF_HASH(ctx_handle_userfault, u64, u64);
IMHO we only need one hash here instead of two:

  BPF_HASH(ev_start, u32, u64);

Where we use the tid as the key (u32), and timestamp as the value (u64).  The
thing is we don't really need the address for current statistics, IMHO.
Agree, that's a more appropriate way do that. Address here is not really needed.
Thanks!

      
+/* KPROBE for handle_userfault(). */
+int probe_handle_userfault(struct pt_regs *ctx, struct vm_fault *vmf,
+        unsigned long reason)
+{
+    /* Trace only UFFD write faults. */
+    if (reason & VM_UFFD_WP) {
Better with comment:

           /* Using "(u32)" to drop group ID which is upper 32 bits */
Yep.

If even better, we'd want a get_current_tid() helper and call it here and below
(bpf_get_current_pid_tgid() will return tid|gid<<32 I think, so I'm a bit
confused why bcc people called it pid at the first place...).

+        u64 pid = (u32) bpf_get_current_pid_tgid();
+        u64 addr = vmf->address;
+
+        do_event_start(pid, addr);
+        ctx_handle_userfault.update(&pid, &addr);
+    }
+    return 0;
+}
+
+/* KRETPROBE for handle_userfault(). */
+int retprobe_handle_userfault(struct pt_regs *ctx)
+{
+    u64 pid = (u32) bpf_get_current_pid_tgid();
+    u64 *addr_p;
+
+    /*
+     * Here we just ignore the return value. In case of spurious wakeup
+     * or pending signal we'll still get (at least for v5.8.0 kernel)
+     * VM_FAULT_RETRY or (VM_FAULT_RETRY | VM_FAULT_MAJOR) here.
+     * Anyhow, handle_userfault() would be re-entered if such case happens,
+     * keeping initial timestamp unchanged for the faulting thread.
AFAIU this comment is not matching what the code does.  But I agree it's not a
big problem because we won't miss any long delays (because the one long delayed
sample will just be split into two or multiple delays, which will still be
reflected in the histogram at last).  Or am I wrong?
Mm, not really sure about comment.. I need to read kernel code again.

      
+     */
+    addr_p = ctx_handle_userfault.lookup(&pid);
+    if (addr_p) {
+        do_event_end(pid, *addr_p);
+        ctx_handle_userfault.delete(&pid);
+    }
+    return 0;
+}
+"""
Other than that, the rest looks good to me.

I'd think it's fine to even merge the current version since it actually works
nicely.  Andrey, if you agree with any of my above comments, feel free to
repost this patch (since I see Dave provided the rest r-bs).  Then I think I
can r-b this one too.  Thanks!

Thanks!
-- 
Andrey Gruzdev, Principal Engineer
Virtuozzo GmbH  +7-903-247-6397
                virtuzzo.com

reply via email to

[Prev in Thread] Current Thread [Next in Thread]