From: Gerhard Wiesinger
Subject: Re: [Qemu-devel] Fedora FC21 - Bug: 100% CPU and hangs in gettimeofday(&tp, NULL); forever
Date: Tue, 03 Mar 2015 13:28:41 +0100
User-agent: Mozilla/5.0 (Windows NT 6.3; WOW64; rv:31.0) Gecko/20100101 Thunderbird/31.5.0

On 03.03.2015 10:12, Gerhard Wiesinger wrote:
On 02.03.2015 18:15, Gerhard Wiesinger wrote:
On 02.03.2015 16:52, Gerhard Wiesinger wrote:
On 02.03.2015 10:26, Paolo Bonzini wrote:

On 01/03/2015 11:36, Gerhard Wiesinger wrote:
So far it has happened only on the PostgreSQL database VM. The kernel is alive
(ping works well), but ssh is not working.
Console window: after entering one character at the login prompt, it crashed:
[1438.384864] Out of memory: Kill process 10115 (pg_dump) score 112 or sacrifice child
[1438.384990] Killed process 10115 (pg_dump) total-vm: 340548kB, anon-rss: 162712kB, file-rss: 220kB
Can you get a vmcore or at least sysrq-t output?

Yes, next time it happens I can analyze it.

I think there are 2 problems:
1.) OOM (Out of Memory) problem with the low memory settings and kernel settings (see below)
2.) Instability problem which might depend on 1.)

What I've done so far (thanks to Andrey Korolyov for ideas and help):
a.) Updated machine type from pc-0.15 to pc-i440fx-2.2
virsh dumpxml database | grep "<type"
    <type arch='x86_64' machine='pc-0.15'>hvm</type>

virsh edit database
virsh dumpxml database | grep "<type"
    <type arch='x86_64' machine='pc-i440fx-2.2'>hvm</type>

SMBIOS is therefore updated from 2.4 to 2.8:
dmesg|grep -i SMBIOS
[    0.000000] SMBIOS 2.8 present.
b.) Switched to tsc clock, kernel parameters: clocksource=tsc nohz=off highres=off
c.) Changed overcommit to 1
echo "vm.overcommit_memory = 1" > /etc/sysctl.d/overcommit.conf
d.) Tried 1 VCPU instead of 2
e.) Installed 512MB vRAM instead of 384MB
f.) Prepared for sysrq and vmcore
echo "kernel.sysrq = 1" > /etc/sysctl.d/sysrq.conf
sysctl -w kernel.sysrq=1
virsh send-key database KEY_LEFTALT KEY_SYSRQ KEY_T
virsh dump domain-name /tmp/dumpfile
g.) Further ideas, not yet done: disable memory ballooning by blacklisting the balloon driver or removing the device from the virsh XML config
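
To confirm inside the guest that the settings from b.), c.) and f.) are actually active, something like the following sketch could be used. The helper name read_setting is made up for illustration; the /proc and /sys paths are the usual Linux locations, and the clocksource path assumes clocksource0 is the device in use.

```shell
#!/bin/sh
# Print the contents of a kernel tunable file, or "unavailable" if it
# cannot be read (e.g. the path does not exist on this system).
# read_setting is a hypothetical helper name, not an existing tool.
read_setting() {
    if [ -r "$1" ]; then
        cat "$1"
    else
        echo "unavailable"
    fi
}

# b.) active clocksource (expected: tsc or kvm-clock)
echo "clocksource:       $(read_setting /sys/devices/system/clocksource/clocksource0/current_clocksource)"
# c.) overcommit policy (expected: 1)
echo "overcommit_memory: $(read_setting /proc/sys/vm/overcommit_memory)"
# f.) sysrq enabled (expected: 1)
echo "sysrq:             $(read_setting /proc/sys/kernel/sysrq)"
```

Note that a file written to /etc/sysctl.d is only applied at boot or after running sysctl --system, so the overcommit value may still show the old setting until then.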

Summary:
1.) 512MB, tsc timer, 1VCPU, vm.overcommit_memory = 1: no OOM problem, no crash

2.) 512MB, kvm_clock, 2VCPU, vm.overcommit_memory = 1: no OOM problem, no crash

3.) 384MB, kvm_clock, 2VCPU, vm.overcommit_memory = 1: no OOM problem, no crash

3b.) Same configuration as in 3.) (384MB, kvm_clock, 2VCPU, vm.overcommit_memory = 1, pc-i440fx-2.2), but it happened again at the nightly backup: no OOM problem, ping ok, no reaction, BUT CRASHED again


3c.) configuration 384MB, kvm_clock, 2VCPU, vm.overcommit_memory = 1, pc-i440fx-2.2: OOM problem, no crash

postgres invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0
Free swap  = 905924kB
Total swap = 1081340kB
Out of memory: Kill process 19312 (pg_dump) score 142 or sacrifice child
Killed process 19312 (pg_dump) total-vm:384516kB, anon-rss:119260kB, file-rss:0kB

An OOM should not occur:
https://www.kernel.org/doc/gorman/html/understand/understand016.html
Is there enough swap space left (nr_swap_pages > 0) ? If yes, not OOM
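
As a rough check of that condition against the log above, the swap figures can be summarized with a small script. This is only a sketch: the function name swap_summary is made up here, and it assumes the "Free swap" and "Total swap" lines have the same layout as in the OOM output quoted above.

```shell
#!/bin/sh
# Summarize swap usage from OOM-killer output read on stdin.
# swap_summary is a hypothetical helper name.
swap_summary() {
    awk '
        # "+ 0" converts e.g. "905924kB" to the number 905924
        /Free swap/  { free  = $NF + 0 }
        /Total swap/ { total = $NF + 0 }
        END {
            if (total > 0)
                printf "used=%dkB free=%dkB (%d%% free)\n",
                       total - free, free, free * 100 / total
        }'
}

# Figures taken from the log in 3c.)
printf 'Free swap  = 905924kB\nTotal swap = 1081340kB\n' | swap_summary
# prints: used=175416kB free=905924kB (83% free)
```

On a live system the same function could be fed with dmesg output (dmesg | swap_summary). With roughly 83% of swap still free, the log does not look like swap exhaustion.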

Why does an OOM condition occur? Does this look like a bug in the kernel?
Any ideas?

Ciao,
Gerhard



