From: Paolo Bonzini
Subject: Re: [Qemu-devel] [PATCH v3 0/2] virtio-scsi: Optimizing request allocation
Date: Tue, 16 Sep 2014 10:19:19 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.0

On 16/09/2014 09:20, Fam Zheng wrote:
> v3: Small tweak on "cmd" in 1/2 and "sreq" in 2/2.
>
> Zeroing is relatively expensive since we have big request structures.
> VirtQueueElement (>48k!) and sense_buf (256 bytes) are two points to look at.
>
> This visibly reduces overhead of request handling when testing with the
> unmerged "null" driver and virtio-scsi dataplane. Before, the issue is very
> obvious with perf top:
>
> perf top -G -p `pidof qemu-system-x86_64`
> -----------------------------------------
> + 16.50% libc-2.17.so [.] __memset_sse2
> + 2.28% libc-2.17.so [.] _int_malloc
> + 2.25% [vdso] [.] 0x0000000000000cd1
> + 2.02% [kernel] [k] _raw_spin_lock_irqsave
> + 1.97% libpthread-2.17.so [.] pthread_mutex_lock
> + 1.87% libpthread-2.17.so [.] pthread_mutex_unlock
> + 1.81% [kernel] [k] fget_light
> + 1.70% libc-2.17.so [.] malloc
>
> After, the high __memset_sse2 and _int_malloc overhead is gone:
>
> perf top -G -p `pidof qemu-system-x86_64`
> -----------------------------------------
> + 4.20% [kernel] [k] vcpu_enter_guest
> + 3.97% [kernel] [k] vmx_vcpu_run
> + 2.63% [kernel] [k] _raw_spin_lock_irqsave
> + 1.72% [kernel] [k] native_read_msr_safe
> + 1.65% [kernel] [k] __srcu_read_lock
> + 1.64% [kernel] [k] _raw_spin_unlock_irqrestore
> + 1.57% [vdso] [.] 0x00000000000008d8
> + 1.49% libc-2.17.so [.] _int_malloc
> + 1.29% libpthread-2.17.so [.] pthread_mutex_unlock
> + 1.26% [kernel] [k] native_write_msr_safe
>
> See the commit message of patch 2 for some fio test data.
Thanks, applied to scsi-next.
Paolo
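
For readers of the archive, the point about zeroing cost in the quoted cover
letter can be illustrated with a minimal sketch. It is not taken from the
patches: DemoReq, its fields, the buffer sizes and the demo_req_* helpers are
all hypothetical names made up for this example. It assumes the large buffers
are always fully written before they are read, so only the small leading
fields need to start out as zero.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical sizes, loosely modelled on the figures in the cover letter. */
#define DEMO_ELEM_SIZE  (48 * 1024)
#define DEMO_SENSE_SIZE 256

/* Hypothetical request layout; this is NOT QEMU's actual request struct. */
typedef struct DemoReq {
    /* Small bookkeeping fields that must start out as zero. */
    unsigned tag;
    unsigned resp_size;
    /* Large buffers, assumed to be fully written before they are read. */
    unsigned char elem[DEMO_ELEM_SIZE];
    unsigned char sense[DEMO_SENSE_SIZE];
} DemoReq;

/* Baseline: clear the whole structure, including both large buffers. */
DemoReq *demo_req_new_zeroed(void)
{
    return calloc(1, sizeof(DemoReq));
}

/* Cheaper variant: clear only the fields that precede the large buffers. */
DemoReq *demo_req_new_partial(void)
{
    DemoReq *req = malloc(sizeof(*req));

    if (req != NULL) {
        memset(req, 0, offsetof(DemoReq, elem));
    }
    return req;
}

/* Number of bytes the partial variant actually clears. */
size_t demo_req_zeroed_bytes(void)
{
    return offsetof(DemoReq, elem);
}
```

With this layout the partial variant clears a few bytes per request instead of
roughly 48 KiB. The trade-off is that elem and sense hold indeterminate data
until they are filled in, so the approach is only safe if nothing reads them
before that point.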