qemu-devel

Re: [Qemu-devel] [RFC PATCH RDMA support v4: 03/10] more verbose documentation of the RDMA transport


From: Michael R. Hines
Subject: Re: [Qemu-devel] [RFC PATCH RDMA support v4: 03/10] more verbose documentation of the RDMA transport
Date: Fri, 05 Apr 2013 16:45:34 -0400
User-agent: Mozilla/5.0 (X11; Linux i686; rv:17.0) Gecko/20130106 Thunderbird/17.0.2

On 03/21/2013 02:11 AM, Michael S. Tsirkin wrote:
On Tue, Mar 19, 2013 at 01:49:34PM -0400, Michael R. Hines wrote:
I also did a test using RDMA + cgroup, and the kernel killed my QEMU :)

So, infiniband is not smart enough to know how to avoid pinning a
zero page, I guess.

- Michael

On 03/19/2013 01:14 PM, Paolo Bonzini wrote:
On 19/03/2013 18:09, Michael R. Hines wrote:
Is allowing QEMU to swap due to a cgroup limit during migration really a
viable overcommit option?

I'm trying to keep an open mind, but that would kill the migration
time.....

Would it swap?  Doesn't the kernel back all zero pages with a single
copy-on-write page?  If that still accounts towards cgroup limits, it
would be a bug.

Old kernels do not have a shared zero hugepage, and that includes some
distro kernels.  Perhaps that's the problem.

Paolo

It really shouldn't break COW if you don't request LOCAL_WRITE.
I think it's a kernel bug, and apparently it has been there since the
first version: the get_user_pages parameters are swapped.

I'll send a patch. If it's applied, you should also
change your code from

+                                IBV_ACCESS_LOCAL_WRITE |
+                                IBV_ACCESS_REMOTE_WRITE |
+                                IBV_ACCESS_REMOTE_READ);

to

+                                IBV_ACCESS_REMOTE_READ);

on send side.
Then, each time we detect that a page has changed, we must make sure to
unregister and re-register it. Or, if you want to be very
smart, check that the PFN didn't change and re-register only
if it did.

This will make overcommit work.

Unfortunately RDMA + cgroups still kills QEMU:

I removed the *_WRITE flags and did a test like this:

1. Start QEMU with 2GB ram configured

$ cd /sys/fs/cgroup/memory/libvirt/qemu
$ echo "-1" > memory.memsw.limit_in_bytes
$ echo "-1" > memory.limit_in_bytes
$ echo $(pidof qemu-system-x86_64) > tasks
$ echo 512M > memory.limit_in_bytes        # maximum RSS
$ echo 3G > memory.memsw.limit_in_bytes    # maximum RSS + swap, extra 1G to be safe

2. Start RDMA migration

3. RSS of 512M is reached
4. swap starts filling up
5. the kernel kills QEMU
6. dmesg:

[ 2981.657135] Task in /libvirt/qemu killed as a result of limit of /libvirt/qemu
[ 2981.657140] memory: usage 524288kB, limit 524288kB, failcnt 18031
[ 2981.657143] memory+swap: usage 525460kB, limit 3145728kB, failcnt 0
[ 2981.657146] Mem-Info:
[ 2981.657148] Node 0 DMA per-cpu:
[ 2981.657152] CPU    0: hi:    0, btch:   1 usd:   0
[ 2981.657155] CPU    1: hi:    0, btch:   1 usd:   0
[ 2981.657157] CPU    2: hi:    0, btch:   1 usd:   0
[ 2981.657160] CPU    3: hi:    0, btch:   1 usd:   0
[ 2981.657163] CPU    4: hi:    0, btch:   1 usd:   0
[ 2981.657165] CPU    5: hi:    0, btch:   1 usd:   0
[ 2981.657167] CPU    6: hi:    0, btch:   1 usd:   0
[ 2981.657170] CPU    7: hi:    0, btch:   1 usd:   0
[ 2981.657172] Node 0 DMA32 per-cpu:
[ 2981.657176] CPU    0: hi:  186, btch:  31 usd: 160
[ 2981.657178] CPU    1: hi:  186, btch:  31 usd:  22
[ 2981.657181] CPU    2: hi:  186, btch:  31 usd: 179
[ 2981.657184] CPU    3: hi:  186, btch:  31 usd:   6
[ 2981.657186] CPU    4: hi:  186, btch:  31 usd:  21
[ 2981.657189] CPU    5: hi:  186, btch:  31 usd:  15
[ 2981.657191] CPU    6: hi:  186, btch:  31 usd:  19
[ 2981.657194] CPU    7: hi:  186, btch:  31 usd:  22
[ 2981.657196] Node 0 Normal per-cpu:
[ 2981.657200] CPU    0: hi:  186, btch:  31 usd:  44
[ 2981.657202] CPU    1: hi:  186, btch:  31 usd:  58
[ 2981.657205] CPU    2: hi:  186, btch:  31 usd: 156
[ 2981.657207] CPU    3: hi:  186, btch:  31 usd: 107
[ 2981.657210] CPU    4: hi:  186, btch:  31 usd:  44
[ 2981.657213] CPU    5: hi:  186, btch:  31 usd:  70
[ 2981.657215] CPU    6: hi:  186, btch:  31 usd:  76
[ 2981.657218] CPU    7: hi:  186, btch:  31 usd: 173
[ 2981.657223] active_anon:181703 inactive_anon:68856 isolated_anon:0
[ 2981.657224]  active_file:66881 inactive_file:141056 isolated_file:0
[ 2981.657225]  unevictable:2174 dirty:6 writeback:0 unstable:0
[ 2981.657226]  free:4058168 slab_reclaimable:5152 slab_unreclaimable:10785
[ 2981.657227]  mapped:7709 shmem:192 pagetables:1913 bounce:0
[ 2981.657230] Node 0 DMA free:15896kB min:56kB low:68kB high:84kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15672kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
[ 2981.657242] lowmem_reserve[]: 0 1966 18126 18126
[ 2981.657249] Node 0 DMA32 free:1990652kB min:7324kB low:9152kB high:10984kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:2013280kB mlocked:0kB dirty:0kB writeback:0kB mapped:4kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
[ 2981.657260] lowmem_reserve[]: 0 0 16160 16160
[ 2981.657268] Node 0 Normal free:14226124kB min:60200kB low:75248kB high:90300kB active_anon:726812kB inactive_anon:275424kB active_file:267524kB inactive_file:564224kB unevictable:8696kB isolated(anon):0kB isolated(file):0kB present:16547840kB mlocked:6652kB dirty:24kB writeback:0kB mapped:30832kB shmem:768kB slab_reclaimable:20608kB slab_unreclaimable:43140kB kernel_stack:1784kB pagetables:7652kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
[ 2981.657281] lowmem_reserve[]: 0 0 0 0
[ 2981.657289] Node 0 DMA: 0*4kB 1*8kB 1*16kB 0*32kB 2*64kB 1*128kB 1*256kB 0*512kB 1*1024kB 1*2048kB 3*4096kB = 15896kB
[ 2981.657307] Node 0 DMA32: 17*4kB 9*8kB 7*16kB 4*32kB 8*64kB 5*128kB 6*256kB 4*512kB 3*1024kB 6*2048kB 481*4096kB = 1990652kB
[ 2981.657325] Node 0 Normal: 2*4kB 1*8kB 991*16kB 893*32kB 271*64kB 50*128kB 50*256kB 12*512kB 5*1024kB 1*2048kB 3450*4096kB = 14225504kB
[ 2981.657343] 277718 total pagecache pages
[ 2981.657345] 68816 pages in swap cache
[ 2981.657348] Swap cache stats: add 656848, delete 588032, find 19850/22338
[ 2981.657350] Free swap  = 15288376kB
[ 2981.657353] Total swap = 15564796kB
[ 2981.706982] 4718576 pages RAM
