Hi,
I've seen some strange time behavior in some of our VMs usually triggered by live migration. In some VMs we have seen some significant time drift which NTP was not able to correct after doing a live migration.
I've not been able so far to reproduce the same case, however, I did notice that live migration does introduce some increase in clock jitter values, and I am not sure if that might have anything to do with any significant time drift.
Here is an example of a CentOS 6 guest running under qemu 1.2 before doing a live migration:
address@hidden ~]# ntpq -pcrv
remote refid st t when poll reach delay offset jitter
==========================================================================
+helium.constant 18.26.4.105 2 u 65 64 377 60.539 -0.011 0.554
-209.118.204.201 128.9.176.30 2 u 47 64 377 15.750 -1.835 0.388
*time3.chpc.utah 198.60.22.240 2 u 46 64 377 30.585 3.934 0.253
+dns2.untangle.c 216.218.254.202 2 u 21 64 377 22.196 2.345 0.740
associd=0 status=0615 leap_none, sync_ntp, 1 event, clock_sync,
version="ntpd address@hidden Sat Dec 20 02:53:39 UTC 2014 (1)",
processor="x86_64", system="Linux/2.6.32-504.3.3.el6.x86_64", leap=00,
stratum=3, precision=-21, rootdelay=32.355, rootdisp=53.173,
refid=155.101.3.115,
reftime=d86264f3.444c75e7 Thu, Jan 15 2015 16:10:27.266,
clock=d86265ed.10a34c1c Thu, Jan 15 2015 16:14:37.064, peer=3418, tc=6,
mintc=3, offset=0.000, frequency=2.863, sys_jitter=2.024,
clk_jitter=2.283, clk_wander=0.000
address@hidden ~]# ntpdc -c kerninfo
pll offset: 0 s
pll frequency: 2.863 ppm
maximum error: 0.19838 s
estimated error: 0.002282 s
status: 2001 pll nano
pll time constant: 6
precision: 1e-09 s
frequency tolerance: 500 ppm
Immediately after live migration, you can see that there is an increase in jitter values:
address@hidden ~]# ntpq -pcrv
remote refid st t when poll reach delay offset jitter
==========================================================================
-helium.constant 18.26.4.105 2 u 59 64 377 60.556 -0.916 31.921
+209.118.204.201 128.9.176.30 2 u 38 64 377 15.717 28.879 12.220
+time3.chpc.utah 132.163.4.103 2 u 45 64 353 30.639 3.240 26.975
*dns2.untangle.c 216.218.254.202 2 u 17 64 377 22.248 33.039 11.791
associd=0 status=0615 leap_none, sync_ntp, 1 event, clock_sync,
version="ntpd address@hidden Sat Dec 20 02:53:39 UTC 2014 (1)",
processor="x86_64", system="Linux/2.6.32-504.3.3.el6.x86_64", leap=00,
stratum=3, precision=-21, rootdelay=25.086, rootdisp=83.736,
refid=74.123.29.4,
reftime=d8626838.47529689 Thu, Jan 15 2015 16:24:24.278,
clock=d8626849.4920018a Thu, Jan 15 2015 16:24:41.285, peer=3419, tc=6,
mintc=3, offset=24.118, frequency=11.560, sys_jitter=15.145,
clk_jitter=8.056, clk_wander=2.757
address@hidden ~]# ntpdc -c kerninfo
pll offset: 0.0211957 s
pll frequency: 11.560 ppm
maximum error: 0.112523 s
estimated error: 0.008055 s
status: 2001 pll nano
pll time constant: 6
precision: 1e-09 s
frequency tolerance: 500 ppm
The increase in the jitter and offset values is well within the 500 ppm frequency tolerance limit, and therefore are easily corrected by subsequent NTP clock sync events, but some live migrations do cause much higher jitter and offset jumps, which can not be corrected by NTP and cause the time to go way off. Any idea why this is the case?
Regards,
Mohammed