Re: [Qemu-devel] [RFC v4 00/28] Base enabling patches for MTTCG
From: Alex Bennée
Subject: Re: [Qemu-devel] [RFC v4 00/28] Base enabling patches for MTTCG
Date: Thu, 11 Aug 2016 18:22:23 +0100
User-agent: mu4e 0.9.17; emacs 25.1.4
Alex Bennée <address@hidden> writes:
> This is the fourth iteration of the RFC patch set which aims to
> provide the basic framework for MTTCG. I hope this will provide a good
> base for discussion at KVM Forum later this month.
>
<snip>
>
> In practice the memory barrier problems don't show up with an x86
> host. In fact I have created a tree which merges in the Emilio's
> cmpxchg atomics which happily boots ARMv7 Debian systems without any
> additional changes. You can find that at:
>
>
> https://github.com/stsquad/qemu/tree/mttcg/base-patches-v4-with-cmpxchg-atomics-v2
>
<snip>
> Performance
> ===========
>
> You can't do full work-load testing on this tree due to the lack of
> atomic support (but I will run some numbers on
> mttcg/base-patches-v4-with-cmpxchg-atomics-v2).
So here is a more real-world workload run:
retry.py called with
['/home/alex/lsrc/qemu/qemu.git/arm-softmmu/qemu-system-arm', '-machine',
'type=virt', '-display', 'none', '-smp', '1', '-m', '4096', '-cpu',
'cortex-a15', '-serial', 'telnet:127.0.0.1:4444', '-monitor', 'stdio',
'-netdev', 'user,id=unet,hostfwd=tcp::2222-:22', '-device',
'virtio-net-device,netdev=unet', '-drive',
'file=/home/alex/lsrc/qemu/images/jessie-arm32.qcow2,id=myblock,index=0,if=none',
'-device', 'virtio-blk-device,drive=myblock', '-append', 'console=ttyAMA0
systemd.unit=benchmark-build.service root=/dev/vda1', '-kernel',
'/home/alex/lsrc/qemu/images/aarch32-current-linux-kernel-only.img', '-smp',
'4', '-name', 'debug-threads=on', '-accel', 'tcg,thread=single']
run 1: ret=0 (PASS), time=261.794911 (1/1)
run 2: ret=0 (PASS), time=257.290045 (2/2)
run 3: ret=0 (PASS), time=256.536991 (3/3)
run 4: ret=0 (PASS), time=254.036260 (4/4)
run 5: ret=0 (PASS), time=256.539165 (5/5)
Results summary:
0: 5 times (100.00%), avg time 257.239 (8.00 varience/2.83 deviation)
Ran command 5 times, 5 passes
retry.py called with
['/home/alex/lsrc/qemu/qemu.git/arm-softmmu/qemu-system-arm', '-machine',
'type=virt', '-display', 'none', '-smp', '1', '-m', '4096', '-cpu',
'cortex-a15', '-serial', 'telnet:127.0.0.1:4444', '-monitor', 'stdio',
'-netdev', 'user,id=unet,hostfwd=tcp::2222-:22', '-device',
'virtio-net-device,netdev=unet', '-drive',
'file=/home/alex/lsrc/qemu/images/jessie-arm32.qcow2,id=myblock,index=0,if=none',
'-device', 'virtio-blk-device,drive=myblock', '-append', 'console=ttyAMA0
systemd.unit=benchmark-build.service root=/dev/vda1', '-kernel',
'/home/alex/lsrc/qemu/images/aarch32-current-linux-kernel-only.img', '-smp',
'4', '-name', 'debug-threads=on', '-accel', 'tcg,thread=multi']
run 1: ret=0 (PASS), time=86.597459 (1/1)
run 2: ret=0 (PASS), time=82.843904 (2/2)
run 3: ret=0 (PASS), time=84.095910 (3/3)
run 4: ret=0 (PASS), time=83.844595 (4/4)
run 5: ret=0 (PASS), time=83.594768 (5/5)
Results summary:
0: 5 times (100.00%), avg time 84.195 (2.02 varience/1.42 deviation)
Ran command 5 times, 5 passes
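This shows roughly a 30% overhead over the ideal for running multi-threaded
(84.2s on average against an ideal of 257.2s / 4 = 64.3s), but still a decent
improvement in wall time, about 3x faster than the single-threaded run.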
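As a sanity check on those figures, here is a minimal Python sketch that recomputes the summary lines from the ten run times above and derives the speedup and the overhead over ideal. The helper names (`summarise`, `overhead_vs_ideal`) are hypothetical and not part of retry.py. Two assumptions are made: the "ideal" is the single-threaded average divided by 4 vCPUs (taking the later '-smp', '4' argument as the one in effect), and the variance is the sample variance, which is what reproduces the printed summaries.

```python
import statistics

# Wall-clock times (seconds) copied from the two retry.py runs above.
SINGLE = [261.794911, 257.290045, 256.536991, 254.036260, 256.539165]
MULTI = [86.597459, 82.843904, 84.095910, 83.844595, 83.594768]
VCPUS = 4  # assumption: the later '-smp', '4' argument is the one in effect


def summarise(times):
    """Return (mean, sample variance, sample standard deviation)."""
    return (statistics.mean(times),
            statistics.variance(times),
            statistics.stdev(times))


def overhead_vs_ideal(single_mean, multi_mean, vcpus):
    """Fractional overhead of the multi-threaded run over perfect scaling."""
    ideal = single_mean / vcpus  # assumption: ideal means linear scaling
    return multi_mean / ideal - 1.0


single_mean, single_var, single_dev = summarise(SINGLE)
multi_mean, multi_var, multi_dev = summarise(MULTI)
speedup = single_mean / multi_mean
overhead = overhead_vs_ideal(single_mean, multi_mean, VCPUS)

print(f"single: avg {single_mean:.3f} "
      f"({single_var:.2f} variance/{single_dev:.2f} deviation)")
print(f"multi:  avg {multi_mean:.3f} "
      f"({multi_var:.2f} variance/{multi_dev:.2f} deviation)")
print(f"speedup {speedup:.2f}x, overhead over ideal {overhead:.0%}")
```

The overhead here is measured against perfect linear scaling, so it lumps together everything that stops the build from scaling, including any serial portion of the guest workload itself.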
The test itself boots the system and runs benchmark-build.service:
# A benchmark target
#
# This shuts down once the boot has completed
[Unit]
Description=Default
Requires=basic.target
After=basic.target
AllowIsolate=yes
[Service]
Type=oneshot
ExecStart=/root/mysrc/testcases.git/build-dir.sh /root/src/stress-ng.git/
ExecStartPost=/sbin/poweroff
[Install]
WantedBy=multi-user.target
And the build-dir script is simply:
#!/bin/sh
#
NR_CPUS=$(grep -c ^processor /proc/cpuinfo)
set -e
cd "$1"
make clean
make -j${NR_CPUS}
cd -
Measuring this over increasing -smp values:
| -smp | time | time as bar | theoretical | % of -smp 1 |
|------+---------+--------------+-------------+-------------|
| 1 | 238.184 | WWWWWWWWWWWW | 238.184 | |
| 2 | 133.402 | WWWWWWh | 119.092 | |
| 3 | 99.531 | WWWWH | 79.394667 | |
| 4 | 82.760 | WWWW: | 59.546 | |
#+TBLFM: $3='(orgtbl-ascii-draw $2 0 238.184 12)::address@hidden/$1
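The scaling in the table can be recomputed with a short Python sketch. It reproduces the theoretical column as the -smp 1 time divided by the vCPU count, which matches the posted values. The "% of -smp 1" column is empty in the table as posted, so filling it with measured time over the -smp 1 time is an assumption about what it was meant to hold. The efficiency figure and the name `scaling_rows` are additions for illustration only.

```python
# (-smp, measured wall time in seconds) taken from the table above.
MEASURED = [(1, 238.184), (2, 133.402), (3, 99.531), (4, 82.760)]


def scaling_rows(measured):
    """Yield (smp, time, theoretical, percent_of_smp1, efficiency) per row."""
    base = measured[0][1]  # the -smp 1 time is the baseline
    for smp, time in measured:
        theoretical = base / smp               # perfect linear scaling
        percent_of_base = 100.0 * time / base  # assumed meaning of the empty column
        efficiency = theoretical / time        # 1.0 would be ideal scaling
        yield smp, time, theoretical, percent_of_base, efficiency


rows = list(scaling_rows(MEASURED))
for smp, time, theoretical, percent, efficiency in rows:
    print(f"| {smp} | {time:7.3f} | {theoretical:10.6f} "
          f"| {percent:5.1f}% | {efficiency:.2f} |")
```

On these numbers the efficiency falls steadily as vCPUs are added, which is consistent with the overhead seen in the -smp 4 runs above.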
--
Alex Bennée