qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[Qemu-devel] Re: [STAGING]: Block migration segfaults


From: Luiz Capitulino
Subject: [Qemu-devel] Re: [STAGING]: Block migration segfaults
Date: Thu, 3 Dec 2009 16:59:07 -0200

On Thu, 03 Dec 2009 19:36:15 +0100
Jan Kiszka <address@hidden> wrote:

> Luiz Capitulino wrote:
> >  Hi there,
> > 
> >  Got this while testing block migration in staging:
> > 
> > """
> > Program terminated with signal 11, Segmentation fault.
> > #0  0x0000000000410cf9 in monitor_vprintf (mon=0x0, fmt=0x5ae5e7 "Start 
> > full migration for %s\n",
> > ap=0x7fff1f830a40) at /home/lcapitulino/src/aliguori-queue/monitor.c:192
> > 192         if (mon->mc && !mon->mc->print_enabled) {
> > """
> > 
> >  The problem here is that init_blk_migration() calls monitor_printf() with
> > a NULL 'mon' and the backtrace shows that this is true for the entire call
> > chain.
> 
> What is the backtrace? And how did you start the migration?

 Started the source VM with:

# qemu -hda disks/fedora-11-kratos-i386.img -enable-kvm -snapshot \
       -balloon virtio -m 1G -S

 and the destination one with:

# qemu -hda disks/fedora-11-kratos-i386.img -enable-kvm -snapshot \
       -balloon virtio -m 1G -S -incoming tcp:0:4444

 Migration command issued:

(QEMU) migrate -d -b tcp:0:4444


 I have no idea if this correct and wondered if specifying the same image
for the destination VM would be catastrophic. :)

 The backtrace follows and it's the source VM which segfaults:

"""
#0  0x0000000000410c11 in monitor_vprintf (mon=0x0, fmt=0x5ada87 "Start full 
migration for %s\n", 
#1  0x0000000000410d59 in monitor_printf (mon=0x0, fmt=0x5ada87 "Start full 
migration for %s\n")
#2  0x00000000004e584b in init_blk_migration (mon=0x0, f=0x2864130) at 
block-migration.c:254
#3  0x00000000004e5d8a in block_save_live (mon=0x0, f=0x2864130, stage=1, 
opaque=0xbea960)
#4  0x00000000004db61a in qemu_savevm_state_begin (mon=0x0, f=0x2864130, 
blk_enable=1, shared=0)
#5  0x00000000004d2448 in migrate_fd_connect (s=0x20bf470) at migration.c:279
#6  0x00000000004d2849 in tcp_wait_for_connect (opaque=0x20bf470) at 
migration-tcp.c:72
#7  0x000000000040c3f8 in main_loop_wait (timeout=5000) at 
/home/lcapitulino/src/aliguori-queue/vl.c:3875
#8  0x000000000040ca17 in main_loop () at 
/home/lcapitulino/src/aliguori-queue/vl.c:4095
#9  0x00000000004104ce in main (argc=10, argv=0x7fff6f838bf8, 
envp=0x7fff6f838c50)
"""

[...]

> >  A patch from me (4a29a in staging) changes a higher level monitor
> > function to touch 'mon' before passing it down and here's the segfault.
> > 
> >  Now, the point is: I could give the old behavior back but I think we're
> > hiding a bug there. Why would you call monitor_printf() with a NULL 'mon'?
> 
> If there is no monitor associated with the current context, it can be
> NULL. This is mostly the case during early startup.

 Why would one call monitor_printf() before monitor initialization?

> One may also use
> this to suppress output (though I don't recall any real case ATM).

 I would prefer functions like monitor_disable()/monitor_enable()
for this...

> >  Anyways, the following patch adds the old behavior back just in case
> > you want to see it working...
> 
> Yes, better restore the check. Still, your call stack would be
> interesting. Maybe there is actual another bug behind it.

 Ok, I'll send the fix because it's not that important right now,
but I'm not convinced this is the right thing to do.




reply via email to

[Prev in Thread] Current Thread [Next in Thread]