
From: Gerhard Wiesinger
Subject: Re: [Qemu-devel] Fedora FC21 - Bug: 100% CPU and hangs in gettimeofday(&tp, NULL); forever
Date: Tue, 03 Mar 2015 14:18:52 +0100
User-agent: Mozilla/5.0 (Windows NT 6.3; WOW64; rv:31.0) Gecko/20100101 Thunderbird/31.5.0

On 03.03.2015 13:28, Gerhard Wiesinger wrote:
On 03.03.2015 10:12, Gerhard Wiesinger wrote:
On 02.03.2015 18:15, Gerhard Wiesinger wrote:
On 02.03.2015 16:52, Gerhard Wiesinger wrote:
On 02.03.2015 10:26, Paolo Bonzini wrote:

On 01/03/2015 11:36, Gerhard Wiesinger wrote:
So far it has happened only on the PostgreSQL database VM. The kernel is alive
(ping works well), but ssh is not working.
Console window: it crashed after I entered one character at the login prompt:
[1438.384864] Out of memory: Kill process 10115 (pg_dump) score 112 or sacrifice child
[1438.384990] Killed process 10115 (pg_dump) total-vm: 340548kB, anon-rss: 162712kB, file-rss: 220kB
Can you get a vmcore or at least sysrq-t output?

Yes, next time it happens I can analyze it.

I think there are 2 problems:
1.) OOM (Out of Memory) problem with the low memory settings and kernel settings (see below)
2.) Instability problem which might depend on 1.)

What I've done so far (thanks to Andrey Korolyov for ideas and help):
a.) Updated machine type from pc-0.15 to pc-i440fx-2.2
virsh dumpxml database | grep "<type"
    <type arch='x86_64' machine='pc-0.15'>hvm</type>

virsh edit database
virsh dumpxml database | grep "<type"
    <type arch='x86_64' machine='pc-i440fx-2.2'>hvm</type>

SMBIOS is therefore updated from 2.4 to 2.8:
dmesg|grep -i SMBIOS
[    0.000000] SMBIOS 2.8 present.
b.) Switched to tsc clock, kernel parameters: clocksource=tsc nohz=off highres=off
c.) Changed overcommit to 1
echo "vm.overcommit_memory = 1" > /etc/sysctl.d/overcommit.conf
d.) Tried 1 VCPU instead of 2
e.) Installed 512MB vRAM instead of 384MB
f.) Prepared for sysrq and vmcore
echo "kernel.sysrq = 1" > /etc/sysctl.d/sysrq.conf
sysctl -w kernel.sysrq=1
virsh send-key database KEY_LEFTALT KEY_SYSRQ KEY_T
virsh dump domain-name /tmp/dumpfile
g.) Further ideas, not yet done: disable memory ballooning by blacklisting the balloon driver or removing it from the virsh xml config
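
Note on step c.): the echo only writes the setting to /etc/sysctl.d, so it takes effect when the sysctl settings are loaded (for example at the next boot). A minimal sketch for checking which overcommit mode the running guest kernel actually uses, written in C like the test program at the end of this mail. The helper names parse_overcommit_mode and read_overcommit_mode are made up for illustration; /proc/sys/vm/overcommit_memory is the standard procfs file.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: parse the text of /proc/sys/vm/overcommit_memory.
   Returns 0 (heuristic), 1 (always overcommit), 2 (strict accounting),
   or -1 if the text is not one of these values. */
int parse_overcommit_mode(const char *text)
{
  char *eptr;
  long value = strtol(text, &eptr, 10);

  if (eptr == text) return -1;                    /* no digits at all */
  if (*eptr != '\0' && *eptr != '\n') return -1;  /* trailing garbage */
  if (value < 0 || value > 2) return -1;          /* unknown mode */
  return (int)value;
}

/* Hypothetical helper: read the mode of the running kernel, -1 on error. */
int read_overcommit_mode(void)
{
  char buf[16];
  FILE *f = fopen("/proc/sys/vm/overcommit_memory", "r");

  if (f == NULL) return -1;
  if (fgets(buf, sizeof(buf), f) == NULL)
  {
    fclose(f);
    return -1;
  }
  fclose(f);
  return parse_overcommit_mode(buf);
}
```

Calling read_overcommit_mode() from a small main() and printing the result should give 1 once the setting from step c.) is active in the guest.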

Summary:
1.) 512MB, tsc timer, 1VCPU, vm.overcommit_memory = 1: no OOM problem, no crash
2.) 512MB, kvm_clock, 2VCPU, vm.overcommit_memory = 1: no OOM problem, no crash
3.) 384MB, kvm_clock, 2VCPU, vm.overcommit_memory = 1: no OOM problem, no crash

3b.) It happened again at the nightly backup with the same configuration as in 3.) (384MB, kvm_clock, 2VCPU, vm.overcommit_memory = 1, pc-i440fx-2.2): no OOM problem, ping ok, no reaction, BUT CRASHED again


3c.) configuration 384MB, kvm_clock, 2VCPU, vm.overcommit_memory = 1, pc-i440fx-2.2: OOM problem, no crash

postgres invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0
Free swap  = 905924kB
Total swap = 1081340kB
Out of memory: Kill process 19312 (pg_dump) score 142 or sacrifice child
Killed process 19312 (pg_dump) total-vm:384516kB, anon-rss:119260kB, file-rss:0kB

An OOM should not occur:
https://www.kernel.org/doc/gorman/html/understand/understand016.html
Is there enough swap space left (nr_swap_pages > 0) ? If yes, not OOM

Why does an OOM condition occur? Looks like a bug in the kernel?
Any ideas?
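
The check quoted above (swap space left, so no OOM) can be approximated from userspace. A minimal sketch, assuming that the SwapFree line of /proc/meminfo is a reasonable stand-in for the kernel's nr_swap_pages counter (it is not the same variable). The helper names meminfo_kb and swap_free_kb are made up for illustration.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: find "key:" at the start of a line in meminfo-style
   text and return its value in kB, or -1 if the key is missing. */
long meminfo_kb(const char *text, const char *key)
{
  size_t keylen = strlen(key);
  const char *line = text;

  while (line != NULL && *line != '\0')
  {
    if (strncmp(line, key, keylen) == 0 && line[keylen] == ':')
    {
      long value;
      if (sscanf(line + keylen + 1, "%ld", &value) == 1) return value;
      return -1;
    }
    line = strchr(line, '\n');
    if (line != NULL) line++;   /* step to the start of the next line */
  }
  return -1;
}

/* Hypothetical helper: read /proc/meminfo and return SwapFree in kB,
   or -1 if the file cannot be read or the line is missing. */
long swap_free_kb(void)
{
  static char buf[8192];
  size_t n;
  FILE *f = fopen("/proc/meminfo", "r");

  if (f == NULL) return -1;
  n = fread(buf, 1, sizeof(buf) - 1, f);
  fclose(f);
  buf[n] = '\0';
  return meminfo_kb(buf, "SwapFree");
}
```

If swap_free_kb() returns a value greater than 0 at the time the OOM killer fires, that matches the "Free swap = 905924kB" line in the log above: swap was still available when pg_dump was killed.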

# Allocating 800MB, killed by OOM killer
./mallocsleep 805306368
Killed

Out of memory: Kill process 27160 (mallocsleep) score 525 or sacrifice child
Killed process 27160 (mallocsleep) total-vm:790588kB, anon-rss:214948kB, file-rss:0kB

free -m
              total        used        free      shared  buff/cache   available
Mem:            363          23         252          23          87         295
Swap:          1055         134         921

ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 1392
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 1392
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited
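
The ulimit output above is there to rule out a per-process limit as the reason for the kill. The same check can be done from inside a program; a minimal sketch using getrlimit(), where the helper names limit_is_unlimited and print_memory_limits are made up for illustration.

```c
#include <stdio.h>
#include <sys/resource.h>

/* Hypothetical helper: returns 1 if the soft limit for the given resource
   is unlimited, 0 if it is finite, -1 if getrlimit() fails. */
int limit_is_unlimited(int resource)
{
  struct rlimit rl;

  if (getrlimit(resource, &rl) != 0) return -1;
  return rl.rlim_cur == RLIM_INFINITY ? 1 : 0;
}

/* Print the two limits that could make a large malloc() fail on its own. */
void print_memory_limits(void)
{
  printf("RLIMIT_AS   (ulimit -v) unlimited: %d\n", limit_is_unlimited(RLIMIT_AS));
  printf("RLIMIT_DATA (ulimit -d) unlimited: %d\n", limit_is_unlimited(RLIMIT_DATA));
}
```

With the limits shown above, both lines should print 1, so the allocation is not cut off by a resource limit.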


# Machine becomes unresponsive and stalls for seconds, but swap usage never reaches the 1055MB swap size (+ 384MB RAM)
vmstat 1
procs ---------memory--------- ---swap---- ------io------ ---system--- ------cpu------
 r  b   swpd   free buff cache   si     so      bi     bo    in     cs us sy  id wa st
 0  0 136472 241196 1400 98544    4     57    1724     67   211    261  2  3  91  2  2
 0  0 136472 241228 1400 98540    0      0       0      0    30     48  0  0 100  0  0
 0  0 136472 241228 1408 98532    0      0       0     52    53     51  0  0  89 11  0
 0  0 136472 241224 1408 98540    0      0       0    112    44     92  0  0 100  0  0
 0  0 136472 241224 1408 98540    0      0       0      0    24     32  0  0 100  0  0
 0  0 136472 241352 1408 98540    0      0       0      0    31     44  0  1 100  0  0
 0  0 136472 241328 1408 98540    0      0       0     36    97    142  0  1  99  0  0
 0  0 136472 241364 1408 98540    0      0       0      0    22     30  0  0 100  0  0
 0  0 136472 241376 1416 98532    0      0       0     80    52     45  0  0  92  8  1
 1  0 136472   9236 1416 98548    0      0       8      0   762     55 11 23  66  0  0
 2  7 270496   3804  140 61172 1144 412268   15028 412340 92805 301836  1 49   1 27 22
 1 12 620320   4788  140 35240 1240 114864   96860 114976 46242  96395  1 26   0 61 12
 3 18 661436   4788  144 35568  508      0  167884      0  5605   8097  5 76   0 16  4
 3  4 661220   4288  144 34256  252      0  273684      0  7454   9777  3 71   0 19  7
 5 20 661024   4532  144 34772  320      0  238288      0  9452  12395  3 78   0 13  6
 6 19 660596   4592  144 35884  320      0  233160      8 12401  16798  5 67   0 12 15
 3 20 677268   4296  140 36816 2180  18200  444328  18332 19382  36234  8 67   0 11 14
 3 25 677208   4792  136 36044   68      0  524340     12 20637  26558  3 74   0 15  8
 2 21 687880   4964  136 38200  260  10784  311152  10884 17707  28941  4 78   0 12  5
 3 21 693808   4380  176 36860  136   6024  388932   6096 14576  22372  3 84   0  6  7
 3 27 693740   4432  152 38288   56  20736  419592  20744 23212  31219  4 87   0  7  2
 3 23 713696   4384  152 38172  796      0  481420     96 16498  27177  8 87   0  4  1
 3 27 713360   4116  152 38372 1844      0 1308552    296 25074  33901  5 85   0  9  1
 3 29 714628   4416  180 41992  256   2556  501832   2704 56498  76293  3 91   0  5  1
 3 29 714572   3860  172 41076  156      0  920736    152 12131  17339  5 94   0  0  0
 4 28 714396   5108  152 40124  212  10924  567648  11148 41901  56712  4 90   0  4  2
 3 30 725216   4060  136 40604  124      0  286384    156 21992  35505  5 91   0  2  3
 8 12 148836 230388  320 70888 5356      0   24304     52  9977  15084 17 75   0  5  3
 0  0 146692 271900  416 76680 2200      0    6592      0  1561   3198 10 10  78  2  1
 0  0 146584 271900  416 76892  152      0     184      0    75    139  0  0 100  0  1
 0  0 146488 271396  552 76980  128      0     264     36   124    230  0  1  98  1  0
 0  0 146372 271076  680 77196  124      0     252      8    79    167  0  0 100  0  0
 0  0 146312 270948  688 77332   64      0      64     80    61    102  0  0  97  3  1

What's wrong here?
Kernel Bug?

Ciao,
Gerhard

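#include <limits.h>
#include <stdlib.h>
#include <stdio.h>
#include <unistd.h>

typedef unsigned int BOOL;
typedef char* PCHAR;
typedef unsigned int DWORD;

#define FALSE 0
#define TRUE 1

/* Parse s as a non-negative number (decimal, octal or hex) that fits a DWORD */
BOOL getlong(PCHAR s,DWORD* retvalue)
{
  char *eptr;
  long value;

  value=strtol(s,&eptr,0);
  if ((eptr==s)||(*eptr!='\0')) return FALSE;
  if (value<0) return FALSE;
  if ((unsigned long)value>UINT_MAX) return FALSE; /* would be truncated */
  *retvalue=(DWORD)value;
  return TRUE;
}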

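/* Allocate the given number of bytes (default 16MB), touch all of it, then sleep */
int main(int argc,char* argv[])
{
  unsigned int* p;
  unsigned int size=16*1024*1024;
  unsigned int size_of=sizeof(unsigned int);
  unsigned int i; /* unsigned, to match size/size_of */

  if (argc>1)
  {
    if (!getlong(argv[1],&size))
    {
      printf("Wrong memsize!\n");
      exit(1);
    }
  }

  p=malloc(size);
  if (p==NULL)
  {
    printf("malloc failed!\n");
    exit(1);
  }

  /* Write to every word so that the pages are really allocated, not just reserved */
  for(i=0;i<(size/size_of);i++) p[i]=0;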

  sleep(3600);

  free(p);

  return 0;
}




