[Qemu-devel] [KQEMU] [RFC] "rdtsc" usage on SMP hosts

From: andrzej zaborowski
Subject: [Qemu-devel] [KQEMU] [RFC] "rdtsc" usage on SMP hosts
Date: Tue, 27 Mar 2007 00:46:28 +0200


kqemu doesn't trap the "rdtsc" instruction, for performance reasons.
This is mostly okay on a uniprocessor host, but on a dual-core CPU
there are effectively two TSCs and there's no guarantee that they are
in sync. On my Linux desktop there happens to be a difference of about
17 seconds between them, with 14 days of uptime and cpufreq not
compiled in. If the qemu guest is Linux using the TSC as its
clocksource (the default in some configurations), this turns out to be
fatal: the guest kernel believes it has only one processor, and the TSC
jumps 17 seconds forward and back whenever qemu is migrated between
the host's processors, resulting in a guest lock-up. In fact, Linux
locks up as soon as the TSC increment is negative a single time. The
lock-up occurs a random time after boot.
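
To illustrate why a single backwards step can be so harmful, here is a
minimal sketch. It assumes (this is not verified against the guest
kernel source) that the elapsed time is computed as an unsigned 64-bit
difference of two TSC reads. The names tsc_delta and tsc_delta_clamped
are hypothetical and exist only for this example.

```c
#include <stdint.h>

/* Elapsed cycles computed as a plain unsigned subtraction (assumed,
 * for illustration, to be what a TSC-based clocksource does). */
uint64_t tsc_delta(uint64_t last, uint64_t now)
{
    return now - last;   /* wraps modulo 2^64 if now < last */
}

/* Hypothetical guard: treat a backwards step as "no time elapsed". */
uint64_t tsc_delta_clamped(uint64_t last, uint64_t now)
{
    return now >= last ? now - last : 0;
}
```

With last = 1000 and now = 900, tsc_delta() returns 2^64 - 100 instead
of -100, i.e. an enormous apparent interval, while the clamped variant
returns 0.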

I'm not sure what the best resolution would be. Here are two ideas;
both have their downsides, but they work:

The first idea is to avoid using "rdtsc" on the host. This requires a
change in kqemu to trap "rdtsc" (this can probably be done in a smarter
way so that a full fall-back to qemu is not necessary):
--- a/common/interp.c
+++ b/common/interp.c
@@ -4641,17 +4641,7 @@ QO(                 case OT_LONG | 8:\
        LABEL(90) /* nop */
            goto insn_next;
        LABEL(131) /* rdtsc */
-            {
-                uint32_t low, high;
-                if ((s->cpu_state.cr4 & CR4_TSD_MASK) &&
-                    s->cpu_state.cpl != 0) {
-                    raise_exception_err(s, EXCP0D_GPF, 0);
-                }
-                asm volatile("rdtsc" : "=a" (low), "=d" (high));
-                s->regs1.eax = low;
-                s->regs1.edx = high;
-            }
-            goto insn_next;
+            raise_exception(s, KQEMU_RET_SOFTMMU);

        LABEL(105) /* syscall */

and a change in qemu:
--- a/hw/pc.c
+++ b/hw/pc.c
@@ -61,17 +61,7 @@ static void ioportF0_write(void *opaque, uint32_t addr, uint32_t data)
/* TSC handling */
uint64_t cpu_get_tsc(CPUX86State *env)
-    /* Note: when using kqemu, it is more logical to return the host TSC
-       because kqemu does not trap the RDTSC instruction for
-       performance reasons */
-    if (env->kqemu_enabled) {
-        return cpu_get_real_ticks();
-    } else
-    {
-        return cpu_get_ticks();
-    }
+    return cpu_get_ticks();

/* SMM support */

The downside here is the performance penalty. I haven't done any
benchmarks, but during my tests with a Linux guest almost all TSC reads
happened in qemu rather than in kqemu, so the overhead shouldn't be

The second idea is to prevent qemu from migrating between processors by
setting its CPU affinity (as is already done on MS Windows hosts):
--- a/vl.c
+++ b/vl.c
@@ -49,6 +49,7 @@
#ifndef __sun__
+#define _Linux
#include <linux/if.h>
#include <linux/if_tun.h>
#include <pty.h>
@@ -56,6 +57,7 @@
#include <linux/rtc.h>
#include <linux/ppdev.h>
#include <linux/parport.h>
+#include <sched.h>
#include <sys/stat.h>
#include <sys/ethernet.h>
@@ -6884,11 +6886,25 @@ int main(int argc, char **argv)
    LIST_INIT (&vm_change_state_head);
#ifndef _WIN32
+#if defined(_Linux) && defined(USE_KQEMU)
+        cpu_set_t mask;
+        int i;
        struct sigaction act;
        act.sa_flags = 0;
        act.sa_handler = SIG_IGN;
        sigaction(SIGPIPE, &act, NULL);
+#if defined(_Linux) && defined(USE_KQEMU)
+        /* Force QEMU to run on a single CPU so that we can expect
+         * consistent values from "rdtsc" */
+        if (sched_getaffinity(0, sizeof(cpu_set_t), &mask) == 0) {
+            for (i = 0; !CPU_ISSET(i, &mask); i ++);
+            CPU_ZERO(&mask);
+            CPU_SET(i, &mask);
+            sched_setaffinity(0, sizeof(cpu_set_t), &mask);
+        }
    SetConsoleCtrlHandler(qemu_ctrl_handler, TRUE);

This part can be moved to somewhere after kqemu is enabled so that it
can be made conditional.
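
As a rough sketch of what such a movable, conditional piece could look
like, the pinning logic can be isolated in a helper that is called only
once kqemu is known to be enabled. The function name
pin_to_first_allowed_cpu is hypothetical and not part of qemu; the
sketch is Linux/glibc-specific and relies on _GNU_SOURCE for the
CPU_* macros.

```c
#define _GNU_SOURCE
#include <sched.h>

/* Pin the calling thread to the lowest-numbered CPU it is currently
 * allowed to run on. Returns that CPU index, or -1 on failure.
 * Hypothetical helper, for illustration only. */
int pin_to_first_allowed_cpu(void)
{
    cpu_set_t mask;
    int i;

    if (sched_getaffinity(0, sizeof(mask), &mask) != 0)
        return -1;

    /* Find the first CPU in the current mask; the bound avoids
     * running past the end of the set. */
    for (i = 0; i < CPU_SETSIZE && !CPU_ISSET(i, &mask); i++)
        ;
    if (i == CPU_SETSIZE)
        return -1;

    CPU_ZERO(&mask);
    CPU_SET(i, &mask);
    if (sched_setaffinity(0, sizeof(mask), &mask) != 0)
        return -1;
    return i;
}
```

Unlike the patch above, this variant bounds the search loop and reports
failure to the caller, so the caller can decide whether to continue
without pinning.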

This approach is Linux-specific, and it pins every qemu instance to
the same single processor, so it can have an even bigger performance
hit (imagine 8 qemu sessions on an 8-CPU host). It also doesn't avoid
the use of "rdtsc", so the virtual TSC runs even when the emulator is
stopped, and there's no way to implement writing to the TSC with

I don't know which of these workarounds is more appropriate.

Anthony Liguori had an idea: use the AMD "rdtscp" instruction, which
additionally returns the CPU number, and maintain a list of TSC offsets
for each host CPU to compensate for the differences between TSCs.
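
The bookkeeping behind that idea could look roughly like the sketch
below. It only covers the offset table; how the (TSC, CPU) pair is
obtained from "rdtscp" and how the offsets are measured are left out.
All names (tsc_sample, tsc_offsets, tsc_calibrate, tsc_compensate) and
the table size MAX_HOST_CPUS are assumptions made for this example.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_HOST_CPUS 8   /* hypothetical table size */

/* One reading: the raw TSC plus the CPU it was taken on, assumed to be
 * delivered together (as "rdtscp" is described to do above). */
struct tsc_sample {
    uint64_t tsc;
    uint32_t cpu;
};

/* offset[cpu] is added to that CPU's raw TSC to map it onto a common
 * timeline. */
struct tsc_offsets {
    int64_t offset[MAX_HOST_CPUS];
};

/* Record the offset for one CPU, given its TSC value and the value a
 * reference CPU showed at (ideally) the same instant. */
void tsc_calibrate(struct tsc_offsets *t, uint32_t cpu,
                   uint64_t cpu_tsc, uint64_t reference_tsc)
{
    assert(cpu < MAX_HOST_CPUS);
    t->offset[cpu] = (int64_t)(reference_tsc - cpu_tsc);
}

/* Translate a raw sample into the common timeline. */
uint64_t tsc_compensate(const struct tsc_offsets *t, struct tsc_sample s)
{
    assert(s.cpu < MAX_HOST_CPUS);
    return s.tsc + (uint64_t)t->offset[s.cpu];
}
```

For example, if CPU 1 runs 17000 ticks ahead of CPU 0, calibrating it
against CPU 0 stores an offset of -17000, and a raw reading of 117000
on CPU 1 is reported as 100000.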

I always need to use one of the two workarounds when booting the
Xenoppix (vmknoppix) live CD in qemu with kqemu enabled, or I get a
lock-up a random period after bootup. Thanks to the #qemu channel for
helping to debug this.

