qemu-devel

Re: [Qemu-devel] [PATCH 1/3] colo-compare: serialize compare thread's in


From: Jason Wang
Subject: Re: [Qemu-devel] [PATCH 1/3] colo-compare: serialize compare thread's initialization with main thread
Date: Fri, 5 May 2017 11:03:23 +0800
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.8.0



On 2017-05-04 10:51, Hailiang Zhang wrote:
Hi Jason,

On 2017/4/25 19:33, Jason Wang wrote:

On 2017-04-25 17:59, Hailiang Zhang wrote:
On 2017/4/25 16:41, Jason Wang wrote:
On 2017-04-24 14:03, Hailiang Zhang wrote:
On 2017/4/24 12:10, Jason Wang wrote:
On 2017-04-20 15:46, zhanghailiang wrote:
We call qemu_chr_fe_set_handlers() in the colo-compare thread to detach
the watched fd from the default main context, so the colo-compare thread
may handle the same watched fd concurrently with the main thread, which
triggers an error report:
"qemu-char.c:918: io_watch_poll_finalize: Assertion `iwp->src ==
((void *)0)' failed."
Is there any way to prevent the fd from being handled by the main thread
before the colo thread is created? Using a semaphore does not seem elegant.
So how about calling qemu_mutex_lock_iothread() before
qemu_chr_fe_set_handlers()?
Looks better, but I need more information, e.g. how can the main thread
touch it?
Hmm, this happens only occasionally, and we did not catch the backtrace
of the first place where the fd was removed from being watched. But from
the code logic, we found that the following cases should be possible:
tcp_chr_write (Or tcp_chr_read/tcp_chr_sync_read/chr_disconnect)
  ->tcp_chr_disconnect (Or char_socket_finalize)
     ->tcp_chr_free_connection
       -> remove_fd_in_watch(chr);

Anyway, it needs protection from being freed twice.
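
To illustrate what "protection from being freed twice" could look like, here is a minimal sketch in plain C with pthreads. It is not QEMU code: DemoChardev, demo_remove_watch() and the other demo_* names are hypothetical stand-ins for a chardev and its fd watch. The idea is only that the removal path checks and clears the pointer under a lock, so a second caller (for example the main thread and the compare thread racing) finds NULL and does nothing.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for a character device that owns one fd watch. */
typedef struct {
    pthread_mutex_t lock;   /* serializes every access to 'watch' */
    int *watch;             /* NULL when no watch is installed */
    int frees;              /* counts real frees, for demonstration only */
} DemoChardev;

static void demo_chardev_init(DemoChardev *chr)
{
    pthread_mutex_init(&chr->lock, NULL);
    chr->watch = malloc(sizeof(*chr->watch));
    if (chr->watch == NULL) {
        abort();
    }
    chr->frees = 0;
}

/* Safe to call from any thread, any number of times. */
static void demo_remove_watch(DemoChardev *chr)
{
    pthread_mutex_lock(&chr->lock);
    if (chr->watch != NULL) {
        free(chr->watch);
        chr->watch = NULL;  /* later callers see NULL and do nothing */
        chr->frees++;
    }
    pthread_mutex_unlock(&chr->lock);
}

static void *demo_remover(void *opaque)
{
    demo_remove_watch(opaque);
    return NULL;
}

/* Races two threads on the same device; returns the number of real frees. */
static int demo_race(void)
{
    DemoChardev chr;
    pthread_t a, b;

    demo_chardev_init(&chr);
    pthread_create(&a, NULL, demo_remover, &chr);
    pthread_create(&b, NULL, demo_remover, &chr);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    assert(chr.watch == NULL);
    pthread_mutex_destroy(&chr.lock);
    return chr.frees;
}
```

Calling demo_race() from a main() frees the watch exactly once, whichever thread gets there first. This only makes the removal idempotent; it does not explain why two threads reach the removal path in the first place.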

Thanks,
Hailiang
I'm still a little confused. The question is how the main thread could
still call tcp_chr_write() or one of the other functions in the case above.

Finally, we reproduced this bug (we use QEMU 2.6) and got the following backtrace of the problem:

(gdb) thread apply all bt

Thread 7 (Thread 0x7f407a1ff700 (LWP 23144)):
#0  0x00007f41037e0db5 in _int_malloc () from /usr/lib64/libc.so.6
#1  0x00007f41037e3b96 in calloc () from /usr/lib64/libc.so.6
#2  0x00007f41041ad4d7 in g_malloc0 () from /usr/lib64/libglib-2.0.so.0
#3  0x00007f41041a5437 in g_source_new () from /usr/lib64/libglib-2.0.so.0
#4  0x00007f410a2cec9c in qio_channel_create_fd_watch (address@hidden, fd=20, address@hidden (G_IO_IN | G_IO_ERR | G_IO_HUP | G_IO_NVAL)) at io/channel-watch.c:259
#5  0x00007f410a2ced01 in qio_channel_create_socket_watch (address@hidden, socket=<optimized out>, address@hidden(G_IO_IN | G_IO_ERR | G_IO_HUP | G_IO_NVAL)) at io/channel-watch.c:311
#6  0x00007f410a2cbea7 in qio_channel_socket_create_watch (ioc=0x7f410d6238c0, condition=(G_IO_IN | G_IO_ERR | G_IO_HUP | G_IO_NVAL))
    at io/channel-socket.c:732
#7  0x00007f410a2c94d2 in qio_channel_create_watch (ioc=0x7f410d6238c0, address@hidden
    (G_IO_IN | G_IO_ERR | G_IO_HUP | G_IO_NVAL)) at io/channel.c:132
#8  0x00007f410a003cd6 in io_watch_poll_prepare (source=0x7f4070000d00, timeout_=<optimized out>) at qemu-char.c:883
#9  0x00007f41041a72ed in g_main_context_prepare () from /usr/lib64/libglib-2.0.so.0
#10 0x00007f41041a7b7b in g_main_context_iterate.isra.24 () from /usr/lib64/libglib-2.0.so.0
#11 0x00007f41041a7fba in g_main_loop_run () from /usr/lib64/libglib-2.0.so.0
#12 0x00007f410a1e528f in colo_compare_thread (opaque=0x7f410d7d6800) at net/colo-compare.c:651
#13 0x00007f4103b2bdc5 in start_thread () from /usr/lib64/libpthread.so.0
#14 0x00007f410385971d in clone () from /usr/lib64/libc.so.6

It looks like the main context is being used here, which is wrong. Maybe you can trace io_add_watch_poll() and its callers to find the reason.

Thanks


