Re: [Qemu-devel] [RFC PATCH v4 00/28] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service


From: zhanghailiang
Subject: Re: [Qemu-devel] [RFC PATCH v4 00/28] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service
Date: Tue, 28 Apr 2015 18:51:28 +0800
User-agent: Mozilla/5.0 (Windows NT 6.1; rv:31.0) Gecko/20100101 Thunderbird/31.1.1

On 2015/4/24 16:35, Dr. David Alan Gilbert wrote:
* Wen Congyang (address@hidden) wrote:
On 04/22/2015 07:18 PM, Dr. David Alan Gilbert wrote:
* zhanghailiang (address@hidden) wrote:
Hi,

ping ...

I will get to look at this again; but not until after next week.

The main blocking bugs for COLO have been solved,

I've got the v3 set running, but the biggest problems I hit are with the
packet comparison module; I've seen a panic which I think is in
colo_send_checkpoint_req, and which I think is due to the use of GFP_KERNEL
to allocate the netlink message; I think it can schedule there.  I tried
making that GFP_ATOMIC, but I'm hitting other problems with:

Thanks for your test.
I guess the backtrace should look like:
1. colo_send_checkpoint_req()
2. colo_setup_checkpoint_by_id()

Because we hold the RCU read lock, we cannot use GFP_KERNEL to allocate memory.
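
(Just to illustrate the constraint, not the actual xt_PMYCOLO code:
example_lookup() and struct example_node below are placeholders for the
RCU-protected lookup.  Any allocation made between rcu_read_lock() and
rcu_read_unlock() has to use a non-sleeping flag such as GFP_ATOMIC, because
GFP_KERNEL may enter reclaim and schedule:)

#include <linux/errno.h>
#include <linux/rcupdate.h>
#include <linux/skbuff.h>
#include <linux/types.h>
#include <net/netlink.h>

struct example_node;                            /* placeholder RCU-protected object */
struct example_node *example_lookup(u32 id);    /* placeholder RCU lookup helper    */

static int example_notify(u32 id)
{
        struct sk_buff *skb;

        rcu_read_lock();
        if (!example_lookup(id)) {
                rcu_read_unlock();
                return -ENOENT;
        }

        /*
         * GFP_KERNEL may sleep here, which is what produces
         * "BUG: scheduling while atomic"; inside an RCU read-side
         * critical section only GFP_ATOMIC is safe.
         */
        skb = nlmsg_new(64, GFP_ATOMIC);
        rcu_read_unlock();

        if (!skb)
                return -ENOMEM;

        /*
         * ... build and send the message here; netlink_unicast() can
         * also sleep, so it is kept outside the RCU section as well ...
         */
        nlmsg_free(skb);        /* placeholder: nothing is actually sent */
        return 0;
}

(GFP_ATOMIC allocations fail more easily than GFP_KERNEL ones, so the caller
has to cope with a NULL skb.)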

See the backtrace below.

kcolo_thread, no conn, schedule out

Hmm, how do you reproduce it? In my test I only focus on block replication,
and I don't use the network.


that I've not had time to look into yet.

So I only get about a 50% success rate when starting COLO.
I see there is stuff in the TODO of the colo-proxy that seems to say the
netlink code should change; maybe you're already fixing that?

Do you mean you get about a 50% success rate if you use the network?

I always run with the network configured, but the 'kcolo_thread, no conn' bug
hits very early, so I don't see any output on the primary or secondary
after the migrate -d is issued on the primary.  In dmesg on the primary
I see:
[  736.607043] ip_tables: (C) 2000-2006 Netfilter Core Team
[  736.615268] kcolo_thread, no conn, schedule out, chk 0
[  736.619442] ip6_tables: (C) 2000-2006 Netfilter Core Team
[  736.718273] arp_tables: (C) 2002 David S. Miller

I've not had a chance to look further at that yet.

Here is the backtrace from the 1st bug.

Dave (I'm on holiday next week; I probably won't respond to many mails)

[ 9087.833228] BUG: scheduling while atomic: swapper/1/0/0x10000100
[ 9087.833271] Modules linked in: ip6table_mangle ip6_tables xt_physdev iptable_mangle xt_PMYCOLO(OF) nf_conntrack_ipv4 nf_defrag_ipv4 xt_mark nf_conntrack_colo(OF) nf_conntrack_ipv6 nf_defrag_ipv6 nf_conntrack iptable_filter ip_tables arptable_filter arp_tables act_mirred cls_u32 sch_prio tun bridge stp llc sg kvm_intel kvm snd_hda_codec_generic cirrus snd_hda_intel crct10dif_pclmul snd_hda_codec crct10dif_common snd_hwdep syscopyarea snd_seq crc32_pclmul crc32c_intel sysfillrect ghash_clmulni_intel snd_seq_device aesni_intel lrw sysimgblt gf128mul ttm drm_kms_helper snd_pcm snd_page_alloc snd_timer snd soundcore glue_helper i2c_piix4 ablk_helper drm cryptd virtio_console i2c_core virtio_balloon serio_raw mperf pcspkr nfsd auth_rpcgss nfs_acl lockd uinput sunrpc xfs libcrc32c sr_mod cdrom ata_generic
[ 9087.833572]  pata_acpi virtio_net virtio_blk ata_piix e1000 virtio_pci libata virtio_ring floppy virtio dm_mirror dm_region_hash dm_log dm_mod [last unloaded: ip_tables]
[ 9087.833616] CPU: 1 PID: 0 Comm: swapper/1 Tainted: GF          O--------------   3.10.0-123.20.1.el7.dgilbertcolo.x86_64 #1
[ 9087.833623] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[ 9087.833630]  ffff880813de8000 7b4d45d276068aee ffff88083fc23980 ffffffff815e2b0c
[ 9087.833640]  ffff88083fc23990 ffffffff815dca9f ffff88083fc239f0 ffffffff815e827b
[ 9087.833648]  ffff880813de9fd8 00000000000135c0 ffff880813de9fd8 00000000000135c0
[ 9087.833657] Call Trace:
[ 9087.833664]  <IRQ>  [<ffffffff815e2b0c>] dump_stack+0x19/0x1b
[ 9087.833680]  [<ffffffff815dca9f>] __schedule_bug+0x4d/0x5b
[ 9087.833688]  [<ffffffff815e827b>] __schedule+0x78b/0x790
[ 9087.833699]  [<ffffffff81094fb6>] __cond_resched+0x26/0x30
[ 9087.833707]  [<ffffffff815e86aa>] _cond_resched+0x3a/0x50
[ 9087.833716]  [<ffffffff81193908>] kmem_cache_alloc_node+0x38/0x200
[ 9087.833752]  [<ffffffffa046b770>] ? nf_conntrack_find_get+0x30/0x40 [nf_conntrack]
[ 9087.833761]  [<ffffffff814c115d>] ? __alloc_skb+0x5d/0x2d0
[ 9087.833768]  [<ffffffff814c115d>] __alloc_skb+0x5d/0x2d0
[ 9087.833777]  [<ffffffff814fb972>] ? netlink_lookup+0x32/0xf0
[ 9087.833786]  [<ffffffff8153b7d0>] ? arp_req_set+0x270/0x270
[ 9087.833794]  [<ffffffff814fbc3b>] netlink_alloc_skb+0x6b/0x1e0
[ 9087.833801]  [<ffffffff8153b7d0>] ? arp_req_set+0x270/0x270
[ 9087.833816]  [<ffffffffa04a462b>] colo_send_checkpoint_req+0x2b/0x80 [xt_PMYCOLO]
[ 9087.833823]  [<ffffffff8153b7d0>] ? arp_req_set+0x270/0x270
[ 9087.833832]  [<ffffffffa04a4dd9>] colo_slaver_arp_hook+0x79/0xa0 [xt_PMYCOLO]
[ 9087.833850]  [<ffffffffa05fc02f>] ? arptable_filter_hook+0x2f/0x40 [arptable_filter]
[ 9087.833858]  [<ffffffff81500c5a>] nf_iterate+0xaa/0xc0
[ 9087.833866]  [<ffffffff8153b7d0>] ? arp_req_set+0x270/0x270
[ 9087.833874]  [<ffffffff81500cf4>] nf_hook_slow+0x84/0x140
[ 9087.833882]  [<ffffffff8153b7d0>] ? arp_req_set+0x270/0x270
[ 9087.833890]  [<ffffffff8153bf60>] arp_rcv+0x120/0x160
[ 9087.833906]  [<ffffffff814d0596>] __netif_receive_skb_core+0x676/0x870
[ 9087.833914]  [<ffffffff814d07a8>] __netif_receive_skb+0x18/0x60
[ 9087.833922]  [<ffffffff814d0830>] netif_receive_skb+0x40/0xd0
[ 9087.833930]  [<ffffffff814d1290>] napi_gro_receive+0x80/0xb0
[ 9087.833959]  [<ffffffffa00e34a0>] e1000_clean_rx_irq+0x2b0/0x580 [e1000]
[ 9087.833970]  [<ffffffffa00e5985>] e1000_clean+0x265/0x8e0 [e1000]
[ 9087.833979]  [<ffffffff8109506d>] ? ttwu_do_activate.constprop.85+0x5d/0x70
[ 9087.833988]  [<ffffffff814d0bfa>] net_rx_action+0x15a/0x250
[ 9087.833997]  [<ffffffff81067047>] __do_softirq+0xf7/0x290
[ 9087.834006]  [<ffffffff815f4b5c>] call_softirq+0x1c/0x30
[ 9087.834011]  [<ffffffff81014cf5>] do_softirq+0x55/0x90
[ 9087.834011]  [<ffffffff810673e5>] irq_exit+0x115/0x120
[ 9087.834011]  [<ffffffff815f5458>] do_IRQ+0x58/0xf0
[ 9087.834011]  [<ffffffff815ea5ad>] common_interrupt+0x6d/0x6d
[ 9087.834011]  <EOI>  [<ffffffff81046346>] ? native_safe_halt+0x6/0x10
[ 9087.834011]  [<ffffffff8101b39f>] default_idle+0x1f/0xc0
[ 9087.834011]  [<ffffffff8101bc96>] arch_cpu_idle+0x26/0x30
[ 9087.834011]  [<ffffffff810b47e5>] cpu_startup_entry+0xf5/0x290
[ 9087.834011]  [<ffffffff815d0a6e>] start_secondary+0x1c4/0x1da
[ 9087.837189] ------------[ cut here ]------------
[ 9087.837189] kernel BUG at net/core/dev.c:4130!


Hi Dave,

This seems to be a deadlock bug: we have called some functions that can
schedule between rcu_read_lock() and rcu_read_unlock().  There are two such
places: one is netlink_alloc_skb() with the GFP_KERNEL flag, and the other
is netlink_unicast() (it can also schedule in some special cases).

Please test with the following modification. ;)

diff --git a/xt_PMYCOLO.c b/xt_PMYCOLO.c
index a8cf1a1..d8a6eab 100644
--- a/xt_PMYCOLO.c
+++ b/xt_PMYCOLO.c
@@ -1360,6 +1360,8 @@ static void colo_setup_checkpoint_by_id(u32 id) {
        if (node) {
                pr_dbg("mark %d, find colo_primary %p, setup checkpoint\n",
                        id, node);
+               rcu_read_unlock();
                colo_send_checkpoint_req(&node->u.p);
+               return;
        }
        rcu_read_unlock();
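
(Again only a sketch with placeholder names, since just the hunk above is
visible: the other common pattern is to copy whatever is needed out of the
RCU-protected node while still under rcu_read_lock(), and to do any call that
may sleep, such as the netlink allocation and netlink_unicast(), only after
unlocking.)

#include <linux/rcupdate.h>
#include <linux/types.h>

/* Placeholder RCU-protected node; only the field used below is shown. */
struct example_node {
        u32 checkpoint_mark;
};

struct example_node *example_lookup(u32 id);      /* placeholder RCU lookup */
void example_send_checkpoint_req(u32 mark);       /* placeholder, may sleep */

static void example_setup_checkpoint(u32 id)
{
        struct example_node *node;
        u32 mark = 0;
        bool found = false;

        rcu_read_lock();
        node = example_lookup(id);
        if (node) {
                /* Copy what we need while the node is guaranteed alive. */
                mark = node->checkpoint_mark;
                found = true;
        }
        rcu_read_unlock();

        /*
         * Any call that may sleep runs outside the read-side section,
         * so the critical section stays balanced and never schedules.
         */
        if (found)
                example_send_checkpoint_req(mark);
}

(Either way, the key point is that nothing between rcu_read_lock() and
rcu_read_unlock() may sleep, and the lock/unlock calls stay balanced on
every path.)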


Thanks,
zhanghailiang



Thanks
Wen Congyang


We have also finished some new features and optimizations for COLO. (If you
are interested in them, we can send them to you in private ;))

For ease of review, it is better to keep things simple for now, so we will
not add too much new code to this framework patch set before it has been
fully reviewed.

I'd like to see those; but I don't want to take code privately.
It's OK to post extra stuff as a separate set.

COLO is a totally new feature which is still at an early stage. We hope to
speed up its development, so your comments and feedback are warmly
welcomed. :)

Yes, it's getting there though; I don't think anyone else has
got this close to getting a full FT set working with disk and networking.

Dave



--
Dr. David Alan Gilbert / address@hidden / Manchester, UK
