Re: [Qemu-devel] [RFC PATCH RDMA support v4: 03/10] more verbose documentation of the RDMA transport

From: Michael R. Hines
Subject: Re: [Qemu-devel] [RFC PATCH RDMA support v4: 03/10] more verbose documentation of the RDMA transport
Date: Fri, 05 Apr 2013 16:45:34 -0400
User-agent: Mozilla/5.0 (X11; Linux i686; rv:17.0) Gecko/20130106 Thunderbird/17.0.2
On 03/21/2013 02:11 AM, Michael S. Tsirkin wrote:
On Tue, Mar 19, 2013 at 01:49:34PM -0400, Michael R. Hines wrote:
I also did a test using RDMA + cgroup, and the kernel killed my QEMU :)
So, infiniband is not smart enough to know how to avoid pinning a
zero page, I guess.
- Michael
On 03/19/2013 01:14 PM, Paolo Bonzini wrote:
Il 19/03/2013 18:09, Michael R. Hines ha scritto:
Is allowing QEMU to swap due to a cgroup limit during migration a viable
overcommit option?
I'm trying to keep an open mind, but that would kill the migration
time...
Would it swap? Doesn't the kernel back all zero pages with a single
copy-on-write page? If that still counts toward the cgroup limits, it
would be a bug.
Old kernels do not have a shared zero hugepage, and that includes some
distro kernels. Perhaps that's the problem.
Paolo
It really shouldn't break COW if you don't request LOCAL_WRITE.
I think it's a kernel bug, and apparently has been there in the code since the
first version: get_user_pages parameters swapped.
I'll send a patch. If it's applied, you should also
change your code from
+ IBV_ACCESS_LOCAL_WRITE |
+ IBV_ACCESS_REMOTE_WRITE |
+ IBV_ACCESS_REMOTE_READ);
to
+ IBV_ACCESS_REMOTE_READ);
on send side.
Then, each time we detect a page has changed, we must make sure to
unregister and re-register it. Or, if you want to be very
smart, check that the PFN didn't change and re-register
only if it did.
This will make overcommit work.
Unfortunately RDMA + cgroups still kills QEMU:
I removed the *_WRITE flags and did a test like this:
1. Start QEMU with 2GB ram configured
$ cd /sys/fs/cgroup/memory/libvirt/qemu
$ echo "-1" > memory.memsw.limit_in_bytes
$ echo "-1" > memory.limit_in_bytes
$ echo $(pidof qemu-system-x86_64) > tasks
$ echo 512M > memory.limit_in_bytes # maximum RSS
$ echo 3G > memory.memsw.limit_in_bytes # maximum RSS + swap, extra 1G to be safe
2. Start RDMA migration
3. RSS of 512M is reached
4. swap starts filling up
5. the kernel kills QEMU
6. dmesg:
[ 2981.657135] Task in /libvirt/qemu killed as a result of limit of
/libvirt/qemu
[ 2981.657140] memory: usage 524288kB, limit 524288kB, failcnt 18031
[ 2981.657143] memory+swap: usage 525460kB, limit 3145728kB, failcnt 0
[ 2981.657146] Mem-Info:
[ 2981.657148] Node 0 DMA per-cpu:
[ 2981.657152] CPU 0: hi: 0, btch: 1 usd: 0
[ 2981.657155] CPU 1: hi: 0, btch: 1 usd: 0
[ 2981.657157] CPU 2: hi: 0, btch: 1 usd: 0
[ 2981.657160] CPU 3: hi: 0, btch: 1 usd: 0
[ 2981.657163] CPU 4: hi: 0, btch: 1 usd: 0
[ 2981.657165] CPU 5: hi: 0, btch: 1 usd: 0
[ 2981.657167] CPU 6: hi: 0, btch: 1 usd: 0
[ 2981.657170] CPU 7: hi: 0, btch: 1 usd: 0
[ 2981.657172] Node 0 DMA32 per-cpu:
[ 2981.657176] CPU 0: hi: 186, btch: 31 usd: 160
[ 2981.657178] CPU 1: hi: 186, btch: 31 usd: 22
[ 2981.657181] CPU 2: hi: 186, btch: 31 usd: 179
[ 2981.657184] CPU 3: hi: 186, btch: 31 usd: 6
[ 2981.657186] CPU 4: hi: 186, btch: 31 usd: 21
[ 2981.657189] CPU 5: hi: 186, btch: 31 usd: 15
[ 2981.657191] CPU 6: hi: 186, btch: 31 usd: 19
[ 2981.657194] CPU 7: hi: 186, btch: 31 usd: 22
[ 2981.657196] Node 0 Normal per-cpu:
[ 2981.657200] CPU 0: hi: 186, btch: 31 usd: 44
[ 2981.657202] CPU 1: hi: 186, btch: 31 usd: 58
[ 2981.657205] CPU 2: hi: 186, btch: 31 usd: 156
[ 2981.657207] CPU 3: hi: 186, btch: 31 usd: 107
[ 2981.657210] CPU 4: hi: 186, btch: 31 usd: 44
[ 2981.657213] CPU 5: hi: 186, btch: 31 usd: 70
[ 2981.657215] CPU 6: hi: 186, btch: 31 usd: 76
[ 2981.657218] CPU 7: hi: 186, btch: 31 usd: 173
[ 2981.657223] active_anon:181703 inactive_anon:68856 isolated_anon:0
[ 2981.657224] active_file:66881 inactive_file:141056 isolated_file:0
[ 2981.657225] unevictable:2174 dirty:6 writeback:0 unstable:0
[ 2981.657226] free:4058168 slab_reclaimable:5152 slab_unreclaimable:10785
[ 2981.657227] mapped:7709 shmem:192 pagetables:1913 bounce:0
[ 2981.657230] Node 0 DMA free:15896kB min:56kB low:68kB high:84kB
active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB
unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15672kB
mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB
slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB
pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0
all_unreclaimable? no
[ 2981.657242] lowmem_reserve[]: 0 1966 18126 18126
[ 2981.657249] Node 0 DMA32 free:1990652kB min:7324kB low:9152kB
high:10984kB active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
present:2013280kB mlocked:0kB dirty:0kB writeback:0kB mapped:4kB
shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB
pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0
all_unreclaimable? no
[ 2981.657260] lowmem_reserve[]: 0 0 16160 16160
[ 2981.657268] Node 0 Normal free:14226124kB min:60200kB low:75248kB
high:90300kB active_anon:726812kB inactive_anon:275424kB
active_file:267524kB inactive_file:564224kB unevictable:8696kB
isolated(anon):0kB isolated(file):0kB present:16547840kB mlocked:6652kB
dirty:24kB writeback:0kB mapped:30832kB shmem:768kB
slab_reclaimable:20608kB slab_unreclaimable:43140kB kernel_stack:1784kB
pagetables:7652kB unstable:0kB bounce:0kB writeback_tmp:0kB
pages_scanned:0 all_unreclaimable? no
[ 2981.657281] lowmem_reserve[]: 0 0 0 0
[ 2981.657289] Node 0 DMA: 0*4kB 1*8kB 1*16kB 0*32kB 2*64kB 1*128kB
1*256kB 0*512kB 1*1024kB 1*2048kB 3*4096kB = 15896kB
[ 2981.657307] Node 0 DMA32: 17*4kB 9*8kB 7*16kB 4*32kB 8*64kB 5*128kB
6*256kB 4*512kB 3*1024kB 6*2048kB 481*4096kB = 1990652kB
[ 2981.657325] Node 0 Normal: 2*4kB 1*8kB 991*16kB 893*32kB 271*64kB
50*128kB 50*256kB 12*512kB 5*1024kB 1*2048kB 3450*4096kB = 14225504kB
[ 2981.657343] 277718 total pagecache pages
[ 2981.657345] 68816 pages in swap cache
[ 2981.657348] Swap cache stats: add 656848, delete 588032, find 19850/22338
[ 2981.657350] Free swap = 15288376kB
[ 2981.657353] Total swap = 15564796kB
[ 2981.706982] 4718576 pages RAM