From: Xiao Guangrong <address@hidden>
Changelog in v2:
These changes are based on Paolo's suggestion:
1) rename the lockless multithreads model to threaded workqueue
2) greatly improve the internal design: all the requests are placed in
   one large array, which is partitioned so that each thread is assigned
   its own requests, and bitmaps are used to synchronize the threads and
   the submitter; as a result, ptr_ring and the spinlock are dropped
3) introduce event wait for the submitter
These changes are based on Emilio's review:
4) add a more detailed description of threaded workqueue
5) add a benchmark for threaded workqueue
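
To illustrate the idea in item 2), here is a minimal sketch of how a pair
of bitmaps can synchronize a submitter and a worker without a lock. It is
not taken from the patches: the struct, the function names, the sizes and
the toy "double the payload" work are all hypothetical, and C11 atomics
stand in for whatever helper (such as change_bit_atomic) the series uses.
The assumed convention is that a request is pending when its bit differs
between the two bitmaps, and free when the bits are equal.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define REQUESTS_PER_THREAD 4   /* hypothetical partition size */
#define NR_THREADS          2   /* hypothetical thread count */

/* One partition of the big request array, owned by a single worker. */
struct thread_slot {
    atomic_ulong fill_bitmap;   /* bits toggled only by the submitter */
    atomic_ulong done_bitmap;   /* bits toggled only by the worker */
    int requests[REQUESTS_PER_THREAD];
};

static struct thread_slot slots[NR_THREADS];

/* Atomically flip one bit of a bitmap. */
static void toggle_bit(atomic_ulong *map, unsigned bit)
{
    assert(bit < REQUESTS_PER_THREAD);
    atomic_fetch_xor(map, 1UL << bit);
}

/* Bits that differ mark requests submitted but not yet completed. */
static unsigned long pending_mask(struct thread_slot *s)
{
    return atomic_load(&s->fill_bitmap) ^ atomic_load(&s->done_bitmap);
}

/* Submitter side: claim a free request; returns its index or -1 if full. */
static int submit(struct thread_slot *s, int payload)
{
    unsigned long busy = pending_mask(s);

    for (unsigned i = 0; i < REQUESTS_PER_THREAD; i++) {
        if (!(busy & (1UL << i))) {
            s->requests[i] = payload;        /* fill the request first */
            toggle_bit(&s->fill_bitmap, i);  /* then publish it */
            return (int)i;
        }
    }
    return -1;
}

/* Worker side: handle every pending request; returns how many were handled. */
static int drain(struct thread_slot *s)
{
    unsigned long busy = pending_mask(s);
    int handled = 0;

    for (unsigned i = 0; i < REQUESTS_PER_THREAD; i++) {
        if (busy & (1UL << i)) {
            s->requests[i] *= 2;             /* toy stand-in for real work */
            toggle_bit(&s->done_bitmap, i);  /* hand the slot back */
            handled++;
        }
    }
    return handled;
}
```

In this sketch each bitmap has exactly one writer, so neither side needs a
spinlock: the submitter calls submit(&slots[n], ...) for the thread it
picked, and that thread calls drain(&slots[n]) from its loop.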
The previous version can be found at
https://marc.info/?l=kvm&m=153968821910007&w=2
Below is a simple performance measurement comparing the two versions;
the environment is the same as the one listed for the previous version.
8 threads are used to compress the data in the source QEMU.
- with compress-wait-thread = off
          total time        busy-ratio
--------------------------------------------------
  v1        125066            0.38
  v2        120444            0.35
- with compress-wait-thread = on
          total time        busy-ratio
--------------------------------------------------
  v1        164426            0
  v2        142609            0
v2 wins slightly: total time drops by about 3.7% with
compress-wait-thread = off and by about 13.3% with it on.
Xiao Guangrong (5):
  bitops: introduce change_bit_atomic
  util: introduce threaded workqueue
  migration: use threaded workqueue for compression
  migration: use threaded workqueue for decompression
  tests: add threaded-workqueue-bench
 include/qemu/bitops.h             |  13 +
 include/qemu/threaded-workqueue.h |  94 +++++++
 migration/ram.c                   | 538 ++++++++++++++------------------------
 tests/Makefile.include            |   5 +-
 tests/threaded-workqueue-bench.c  | 256 ++++++++++++++++++
 util/Makefile.objs                |   1 +
 util/threaded-workqueue.c         | 466 +++++++++++++++++++++++++++++++++
 7 files changed, 1030 insertions(+), 343 deletions(-)
 create mode 100644 include/qemu/threaded-workqueue.h
 create mode 100644 tests/threaded-workqueue-bench.c
 create mode 100644 util/threaded-workqueue.c