No, I don't need realtime behavior. Realtime implies determinism, but determinism doesn't imply realtime. Of course, I realize that other sources of non-determinism exist, but those are separate stories. Here I'm just trying to eliminate one of them: asynchronous emulation of I/O inside QEMU. Realtime isn't the solution here.
Firstly, implementing realtime still leaves a dependency on the host machine (its performance, hardware configuration, etc.) and on the number of containers running. Yes, it will be deterministic, but the results are tied to a given host and container count.
Secondly, it's simply overkill for the problem being solved. The problem area is bounded by the guest and the QEMU implementation. Using realtime requires fighting complexity on the host as well (the host kernel must be realtime, the system configuration must be tuned, all possible latencies must be carefully traced, etc.). I understand perfectly how complex it is to design a realtime system in general, and implementing one on Linux makes things even more complex.
Thirdly, it works only for KVM (and possibly other virtualization hypervisors). That's not my case, since my guest runs under TCG with -icount,sleep=off.
It seems you got me wrong. I'll try to explain the problem another way.
The guest virtual clock must run independently of the realtime (host) clock. They may be synchronized only in order to wait for some QEMU/host operation to complete, i.e. guest time is frozen by host performance bottlenecks, but this is transparent to the guest. This is how "-icount,sleep=off" works (or at least should work) in the time domain of CPU emulation. But I/O operations don't seem to respect this policy. When QEMU processes an I/O request from the guest, it allows virtual time to run freely until the backend completes the operation and the result is passed back to the guest. And this is what makes the guest "feel" the speed/latency of I/O. That's the core of the problem.
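For reference, here is a rough sketch of the kind of invocation I mean (the machine type, image path and shift value are placeholders, not my exact setup):

# TCG guest with a deterministic instruction counter; sleep=off
# decouples virtual time from the host clock when the vCPU is idle
# (virtual time warps to the next timer deadline instead of waiting).
qemu-system-x86_64 \
    -accel tcg \
    -icount shift=auto,sleep=off \
    -drive file=guest.img,format=raw,if=virtio \
    -nographic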
To explain the problem even better, I've written a simple script (test_run_multiple_containers.sh) that emulates the execution of multiple containers:
#!/bin/bash
# Usage: ./test_run_multiple_containers.sh N
# Starts N parallel dd writers and prints each one's measured throughput.
N=$1
for i in $(seq 1 "$N"); do
    # dd prints its summary on stderr; keep only the final throughput
    # field of the summary line (e.g. "210 MB/s").
    dd if=/dev/zero of="/tmp/testfile_$i" bs=1K count=100000 2>&1 \
        | sed -n 's/^.*, \(.*\)$/\1/p' &
done
wait                    # let all writers finish
rm -f /tmp/testfile*    # clean up
Here N is the number of containers running in parallel, and /tmp/testfile_$i is a file located in the $i-th container's rootfs (a dedicated mount point, block device or something else).
Running
./test_run_multiple_containers.sh 1
on a real machine should output a value corresponding to the maximum write speed. Let's define it as "max_io_throughput".
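With GNU coreutils, the dd summary line looks something like this (the numbers are invented for illustration):

102400000 bytes (102 MB, 98 MiB) copied, 0.487 s, 210 MB/s

The sed expression in the script keeps only the final field of that line, i.e. "210 MB/s".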
Running this script on a real machine with different N values should give outputs with roughly identical values of about "max_io_throughput / N".
What I need is for this script, run in the guest, to always give identical and constant values, independent of N, the current host load or anything else external to the guest. (No magic: while the running emulation causes at most "max_io_throughput" load on the host (in terms of real time), QEMU throttles the guest virtual clock to run N times slower relative to the realtime clock.)
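To make the throttling idea concrete (numbers invented for illustration):

max_io_throughput = 200 MB/s, N = 4
host-side rate per container (real time):  200 / 4 = 50 MB/s
virtual clock slowdown:                    4x relative to the realtime clock
guest-observed rate (virtual time):        50 * 4 = 200 MB/s

So each container measures the same 200 MB/s regardless of N, while the host never does more than 200 MB/s of real I/O.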
Also, I forgot to mention that the containers' rootfs aren't required to be persistent or to stay on the host while the containers execute. They may be transferred to guest RAM before execution; they're just source images of the rootfs.