From: Dr. David Alan Gilbert
Subject: Re: [Qemu-devel] [RFC PATCH v4 00/28] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service
Date: Wed, 6 May 2015 18:11:53 +0100
User-agent: Mutt/1.5.23 (2014-03-12)
* zhanghailiang (address@hidden) wrote:
> On 2015/4/24 16:35, Dr. David Alan Gilbert wrote:
> >* Wen Congyang (address@hidden) wrote:
> >>On 04/22/2015 07:18 PM, Dr. David Alan Gilbert wrote:
> >>>* zhanghailiang (address@hidden) wrote:
> >>>>Hi,
> >>>>
> >>>>ping ...
> >>>
> >>>I will get to look at this again; but not until after next week.
> >>>
> >>>>The main blocking bugs for COLO have been solved,
> >>>
> >>>I've got the v3 set running, but the biggest problems I hit are with
> >>>the packet comparison module; I've seen a panic, which I think is
> >>>in colo_send_checkpoint_req and which I think is due to the use of
> >>>GFP_KERNEL to allocate the netlink message, since it can schedule
> >>>there. I tried making that GFP_ATOMIC, but I'm hitting other
> >>>problems with:
> >>
> >>Thanks for your test.
> >>I guess the backtrace looks like:
> >>1. colo_send_checkpoint_req()
> >>2. colo_setup_checkpoint_by_id()
> >>
> >>Because we hold the RCU read lock, we cannot use GFP_KERNEL to allocate memory.
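
(As an illustrative aside, a minimal sketch of the constraint just described; none of the names below come from xt_PMYCOLO. On a non-preemptible kernel, rcu_read_lock() enters an atomic section, so an allocation inside the read-side critical section must use a non-sleeping flag such as GFP_ATOMIC; GFP_KERNEL may call cond_resched() and trigger "BUG: scheduling while atomic".)

#include <linux/rcupdate.h>
#include <linux/slab.h>
#include <linux/gfp.h>

static void rcu_alloc_example(void)
{
	void *buf;

	rcu_read_lock();

	/* GFP_ATOMIC never sleeps, so it is safe inside the RCU read-side
	 * critical section, at the cost of being allowed to fail. */
	buf = kmalloc(128, GFP_ATOMIC);
	if (buf)
		kfree(buf);

	rcu_read_unlock();
}
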
> >
> >See the backtrace below.
> >
> >>>kcolo_thread, no conn, schedule out
> >>
> >>Hmm, how do you reproduce it? In my test, I only focus on block replication, and
> >>I don't use the network.
> >>
> >>>
> >>>that I've not had time to look into yet.
> >>>
> >>>So I only get about a 50% success rate when starting COLO.
> >>>I see there is stuff in the TODO of the colo-proxy that
> >>>seems to say the netlink code should change; maybe you're already fixing
> >>>that?
> >>
> >>Do you mean you get about a 50% success rate if you use the network?
> >
> >I always run with the network configured, but the 'kcolo_thread, no conn' bug
> >hits very early, so I don't see any output on the primary or secondary
> >after the migrate -d is issued on the primary. On the primary, in dmesg,
> >I see:
> >[ 736.607043] ip_tables: (C) 2000-2006 Netfilter Core Team
> >[ 736.615268] kcolo_thread, no conn, schedule out, chk 0
> >[ 736.619442] ip6_tables: (C) 2000-2006 Netfilter Core Team
> >[ 736.718273] arp_tables: (C) 2002 David S. Miller
> >
> >I've not had a chance to look further at that yet.
> >
> >Here is the backtrace from the 1st bug.
> >
> >Dave (I'm on holiday next week; I probably won't respond to many mails)
> >
> >[ 9087.833228] BUG: scheduling while atomic: swapper/1/0/0x10000100
> >[ 9087.833271] Modules linked in: ip6table_mangle ip6_tables xt_physdev
> >iptable_mangle xt_PMYCOLO(OF) nf_conntrack_ipv4 nf_defrag_ipv4 xt_mark
> >nf_conntrack_colo(OF) nf_conntrack_ipv6 nf_defrag_ipv6 nf_conntrack
> >iptable_filter ip_tables arptable_filter arp_tables act_mirred cls_u32
> >sch_prio tun bridge stp llc sg kvm_intel kvm snd_hda_codec_generic cirrus
> >snd_hda_intel crct10dif_pclmul snd_hda_codec crct10dif_common snd_hwdep
> >syscopyarea snd_seq crc32_pclmul crc32c_intel sysfillrect
> >ghash_clmulni_intel snd_seq_device aesni_intel lrw sysimgblt gf128mul ttm
> >drm_kms_helper snd_pcm snd_page_alloc snd_timer snd soundcore glue_helper
> >i2c_piix4 ablk_helper drm cryptd virtio_console i2c_core virtio_balloon
> >serio_raw mperf pcspkr nfsd auth_rpcgss nfs_acl lockd uinput sunrpc xfs
> >libcrc32c sr_mod cdrom ata_generic
> >[ 9087.833572] pata_acpi virtio_net virtio_blk ata_piix e1000 virtio_pci
> >libata virtio_ring floppy virtio dm_mirror dm_region_hash dm_log dm_mod
> >[last unloaded: ip_tables]
> >[ 9087.833616] CPU: 1 PID: 0 Comm: swapper/1 Tainted: GF O--------------
> >3.10.0-123.20.1.el7.dgilbertcolo.x86_64 #1
> >[ 9087.833623] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
> >[ 9087.833630] ffff880813de8000 7b4d45d276068aee ffff88083fc23980
> >ffffffff815e2b0c
> >[ 9087.833640] ffff88083fc23990 ffffffff815dca9f ffff88083fc239f0
> >ffffffff815e827b
> >[ 9087.833648] ffff880813de9fd8 00000000000135c0 ffff880813de9fd8
> >00000000000135c0
> >[ 9087.833657] Call Trace:
> >[ 9087.833664] <IRQ> [<ffffffff815e2b0c>] dump_stack+0x19/0x1b
> >[ 9087.833680] [<ffffffff815dca9f>] __schedule_bug+0x4d/0x5b
> >[ 9087.833688] [<ffffffff815e827b>] __schedule+0x78b/0x790
> >[ 9087.833699] [<ffffffff81094fb6>] __cond_resched+0x26/0x30
> >[ 9087.833707] [<ffffffff815e86aa>] _cond_resched+0x3a/0x50
> >[ 9087.833716] [<ffffffff81193908>] kmem_cache_alloc_node+0x38/0x200
> >[ 9087.833752] [<ffffffffa046b770>] ? nf_conntrack_find_get+0x30/0x40
> >[nf_conntrack]
> >[ 9087.833761] [<ffffffff814c115d>] ? __alloc_skb+0x5d/0x2d0
> >[ 9087.833768] [<ffffffff814c115d>] __alloc_skb+0x5d/0x2d0
> >[ 9087.833777] [<ffffffff814fb972>] ? netlink_lookup+0x32/0xf0
> >[ 9087.833786] [<ffffffff8153b7d0>] ? arp_req_set+0x270/0x270
> >[ 9087.833794] [<ffffffff814fbc3b>] netlink_alloc_skb+0x6b/0x1e0
> >[ 9087.833801] [<ffffffff8153b7d0>] ? arp_req_set+0x270/0x270
> >[ 9087.833816] [<ffffffffa04a462b>] colo_send_checkpoint_req+0x2b/0x80
> >[xt_PMYCOLO]
> >[ 9087.833823] [<ffffffff8153b7d0>] ? arp_req_set+0x270/0x270
> >[ 9087.833832] [<ffffffffa04a4dd9>] colo_slaver_arp_hook+0x79/0xa0
> >[xt_PMYCOLO]
> >[ 9087.833850] [<ffffffffa05fc02f>] ? arptable_filter_hook+0x2f/0x40
> >[arptable_filter]
> >[ 9087.833858] [<ffffffff81500c5a>] nf_iterate+0xaa/0xc0
> >[ 9087.833866] [<ffffffff8153b7d0>] ? arp_req_set+0x270/0x270
> >[ 9087.833874] [<ffffffff81500cf4>] nf_hook_slow+0x84/0x140
> >[ 9087.833882] [<ffffffff8153b7d0>] ? arp_req_set+0x270/0x270
> >[ 9087.833890] [<ffffffff8153bf60>] arp_rcv+0x120/0x160
> >[ 9087.833906] [<ffffffff814d0596>] __netif_receive_skb_core+0x676/0x870
> >[ 9087.833914] [<ffffffff814d07a8>] __netif_receive_skb+0x18/0x60
> >[ 9087.833922] [<ffffffff814d0830>] netif_receive_skb+0x40/0xd0
> >[ 9087.833930] [<ffffffff814d1290>] napi_gro_receive+0x80/0xb0
> >[ 9087.833959] [<ffffffffa00e34a0>] e1000_clean_rx_irq+0x2b0/0x580 [e1000]
> >[ 9087.833970] [<ffffffffa00e5985>] e1000_clean+0x265/0x8e0 [e1000]
> >[ 9087.833979] [<ffffffff8109506d>] ? ttwu_do_activate.constprop.85+0x5d/0x70
> >[ 9087.833988] [<ffffffff814d0bfa>] net_rx_action+0x15a/0x250
> >[ 9087.833997] [<ffffffff81067047>] __do_softirq+0xf7/0x290
> >[ 9087.834006] [<ffffffff815f4b5c>] call_softirq+0x1c/0x30
> >[ 9087.834011] [<ffffffff81014cf5>] do_softirq+0x55/0x90
> >[ 9087.834011] [<ffffffff810673e5>] irq_exit+0x115/0x120
> >[ 9087.834011] [<ffffffff815f5458>] do_IRQ+0x58/0xf0
> >[ 9087.834011] [<ffffffff815ea5ad>] common_interrupt+0x6d/0x6d
> >[ 9087.834011] <EOI> [<ffffffff81046346>] ? native_safe_halt+0x6/0x10
> >[ 9087.834011] [<ffffffff8101b39f>] default_idle+0x1f/0xc0
> >[ 9087.834011] [<ffffffff8101bc96>] arch_cpu_idle+0x26/0x30
> >[ 9087.834011] [<ffffffff810b47e5>] cpu_startup_entry+0xf5/0x290
> >[ 9087.834011] [<ffffffff815d0a6e>] start_secondary+0x1c4/0x1da
> >[ 9087.837189] ------------[ cut here ]------------
> >[ 9087.837189] kernel BUG at net/core/dev.c:4130!
> >
>
> Hi Dave,
Hi,
Sorry for the delayed response; I was on vacation last week.
> This seems to be a deadlock bug. We have called some functions that can
> schedule between the RCU read lock and unlock. There are two such places:
> one is netlink_alloc_skb() with the GFP_KERNEL flag,
> and the other is netlink_unicast() (which can also schedule in some
> special cases).
>
> Please test with the following modification. ;)
Thanks.
> diff --git a/xt_PMYCOLO.c b/xt_PMYCOLO.c
> index a8cf1a1..d8a6eab 100644
> --- a/xt_PMYCOLO.c
> +++ b/xt_PMYCOLO.c
> @@ -1360,6 +1360,7 @@ static void colo_setup_checkpoint_by_id(u32 id) {
> if (node) {
> pr_dbg("mark %d, find colo_primary %p, setup checkpoint\n",
> id, node);
> + rcu_read_unlock();
> colo_send_checkpoint_req(&node->u.p);
> }
> rcu_read_unlock();
It still seemed to generate the same backtrace; I'm not sure that
the rcu_read_lock is the problem here. I'd assumed it was because this is
being called from the softirq path, but I'm fuzzy about how that's
supposed to work.
Dave
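
(As an illustrative aside on the softirq point above: the hook is reached from the receive path (NAPI, arp_rcv, then the netfilter hook), so both the allocation and the netlink send would need non-sleeping variants regardless of the RCU read lock. Below is a minimal sketch using only standard kernel netlink APIs; the function name, the socket and portid parameters, and the NLMSG_DONE message type are placeholders, not the real xt_PMYCOLO code.)

#include <linux/netlink.h>
#include <linux/socket.h>
#include <linux/gfp.h>
#include <net/sock.h>
#include <net/netlink.h>

static int checkpoint_req_sketch(struct sock *nl_sk, u32 portid)
{
	struct sk_buff *skb;
	struct nlmsghdr *nlh;

	/* GFP_ATOMIC: this path can be reached from softirq context, so a
	 * sleeping allocation is not allowed, RCU read lock or not. */
	skb = nlmsg_new(0, GFP_ATOMIC);
	if (!skb)
		return -ENOMEM;

	/* NLMSG_DONE is only a placeholder message type for this sketch. */
	nlh = nlmsg_put(skb, 0, 0, NLMSG_DONE, 0, 0);
	if (!nlh) {
		nlmsg_free(skb);
		return -EMSGSIZE;
	}
	nlmsg_end(skb, nlh);

	/* Non-blocking unicast so netlink_unicast() cannot sleep waiting
	 * for space in the receiver's socket buffer. */
	return netlink_unicast(nl_sk, skb, portid, MSG_DONTWAIT);
}
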
>
>
> Thanks,
> zhanghailiang
>
> >>
> >>
> >>Thanks
> >>Wen Congyang
> >>
> >>>
> >>>>we have also finished some new features and optimizations on COLO. (If you
> >>>>are interested in these,
> >>>>we can send them to you in private ;))
> >>>
> >>>>For ease of review, it is better to keep it simple for now, so we will not
> >>>>add too much new code into this framework
> >>>>patch set before it has been fully reviewed.
> >>>
> >>>I'd like to see those; but I don't want to take code privately.
> >>>It's OK to post extra stuff as a separate set.
> >>>
> >>>>COLO is a totally new feature which is still at an early stage, and we hope to
> >>>>speed up its development,
> >>>>so your comments and feedback are warmly welcomed. :)
> >>>
> >>>Yes, it's getting there though; I don't think anyone else has
> >>>got this close to getting a full FT set working with disk and networking.
> >>>
> >>>Dave
> >>>
> >>>>
> >>
> >--
> >Dr. David Alan Gilbert / address@hidden / Manchester, UK
> >
> >.
> >
>
>
--
Dr. David Alan Gilbert / address@hidden / Manchester, UK