qemu-block
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-block] [PATCH v2 12/17] mirror: Fix potential use-after-free i


From: Max Reitz
Subject: Re: [Qemu-block] [PATCH v2 12/17] mirror: Fix potential use-after-free in active commit
Date: Mon, 17 Sep 2018 00:05:57 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.0

On 14.09.18 18:25, Kevin Wolf wrote:
> Am 13.09.2018 um 22:55 hat Max Reitz geschrieben:
>> On 13.09.18 14:52, Kevin Wolf wrote:
>>> When starting an active commit job, other callbacks can run before
>>> mirror_start_job() calls bdrv_ref() where needed and cause the nodes to
>>> go away. Add another pair of bdrv_ref/unref() around it to protect
>>> against this case.
>>>
>>> Signed-off-by: Kevin Wolf <address@hidden>
>>> ---
>>>  block/mirror.c | 11 +++++++++++
>>>  1 file changed, 11 insertions(+)
>>
>> Reviewed-by: Max Reitz <address@hidden>
>>
>> But...  How?
> 
> Tried to reproduce the other bug Paolo was concerned about (good there
> is something like 'git reflog'!) and dug up this one instead.
> 
> So the scenario seems to be test_stream_commit_1 in qemu-iotests 030.
> 
> The backtrace when the BDS is deleted is the following:
> 
> (rr) bt
> #0  0x00007faeaf6145ec in __memset_sse2_unaligned_erms () at /lib64/libc.so.6
> #1  0x00007faeaf60414e in _int_free () at /lib64/libc.so.6
> #2  0x00007faeaf6093be in free () at /lib64/libc.so.6
> #3  0x00007faecaea6b4e in g_free () at /lib64/libglib-2.0.so.0
> #4  0x000055c0c94f9d6c in bdrv_delete (bs=0x55c0cbe69240) at block.c:3590
> #5  0x000055c0c94fbe82 in bdrv_unref (bs=0x55c0cbe69240) at block.c:4638
> #6  0x000055c0c94f6761 in bdrv_root_unref_child (child=0x55c0cbe57af0) at 
> block.c:2188
> #7  0x000055c0c94fe239 in block_job_remove_all_bdrv (job=0x55c0ccf9e220) at 
> blockjob.c:200
> #8  0x000055c0c94fdf1f in block_job_free (job=0x55c0ccf9e220) at blockjob.c:94
> #9  0x000055c0c94ffe0d in job_unref (job=0x55c0ccf9e220) at job.c:368
> #10 0x000055c0c95007cd in job_do_dismiss (job=0x55c0ccf9e220) at job.c:641
> #11 0x000055c0c95008d9 in job_conclude (job=0x55c0ccf9e220) at job.c:667
> #12 0x000055c0c9500b66 in job_finalize_single (job=0x55c0ccf9e220) at 
> job.c:735
> #13 0x000055c0c94ff4fa in job_txn_apply (txn=0x55c0cbe19eb0, 
> fn=0x55c0c9500a62 <job_finalize_single>, lock=true) at job.c:151
> #14 0x000055c0c9500e92 in job_do_finalize (job=0x55c0ccf9e220) at job.c:822
> #15 0x000055c0c9501040 in job_completed_txn_success (job=0x55c0ccf9e220) at 
> job.c:872
> #16 0x000055c0c950115c in job_completed (job=0x55c0ccf9e220, ret=0, 
> error=0x0) at job.c:892
> #17 0x000055c0c92b572c in stream_complete (job=0x55c0ccf9e220, 
> opaque=0x55c0cc050bc0) at block/stream.c:96

But isn't this just deletion of node1, i.e. the intermediate node of the
stream job?

The commit job happens above that (node3 and upwards), so all its BDSs
shouldn't be affected.  So even with the bdrv_ref()s introduced in this
patch, I'd still expect node1 to be deleted exactly here.

Max

> #18 0x000055c0c950142b in job_defer_to_main_loop_bh (opaque=0x55c0cbe38a50) 
> at job.c:981
> #19 0x000055c0c96171fc in aio_bh_call (bh=0x55c0cbe19f20) at util/async.c:90
> #20 0x000055c0c9617294 in aio_bh_poll (ctx=0x55c0cbd7b8d0) at util/async.c:118
> #21 0x000055c0c961c150 in aio_poll (ctx=0x55c0cbd7b8d0, blocking=true) at 
> util/aio-posix.c:690
> #22 0x000055c0c957246c in bdrv_flush (bs=0x55c0cbe4b250) at block/io.c:2693
> #23 0x000055c0c94f8e46 in bdrv_reopen_prepare (reopen_state=0x55c0cbef4d68, 
> queue=0x55c0cbeab9f0, errp=0x7ffd65617050) at block.c:3206
> #24 0x000055c0c94f883a in bdrv_reopen_multiple (ctx=0x55c0cbd7b8d0, 
> bs_queue=0x55c0cbeab9f0, errp=0x7ffd656170c8) at block.c:3032
> #25 0x000055c0c94f8a00 in bdrv_reopen (bs=0x55c0cbe2c250, bdrv_flags=8194, 
> errp=0x7ffd65617220) at block.c:3075
> #26 0x000055c0c956a23f in commit_active_start (job_id=0x0, bs=0x55c0cbd905b0, 
> base=0x55c0cbe2c250, creation_flags=0, speed=0, 
> on_error=BLOCKDEV_ON_ERROR_REPORT, filter_node_name=0x0, cb=0x0, opaque=0x0, 
> auto_complete=false, errp=0x7ffd65617220) at block/mirror.c:1687
> #27 0x000055c0c927437c in qmp_block_commit (has_job_id=false, job_id=0x0, 
> device=0x55c0cbef4d20 "drive0", has_base_node=false, base_node=0x0, 
> has_base=true, base=0x55c0cbe38a00 
> "/home/kwolf/source/qemu/tests/qemu-iotests/scratch/img-3.img", 
> has_top_node=false, top_node=0x0, has_top=false, top=0x0, 
> has_backing_file=false, backing_file=0x0, has_speed=false, speed=0, 
> has_filter_node_name=false, filter_node_name=0x0, errp=0x7ffd656172d8) at 
> blockdev.c:3325
> #28 0x000055c0c928bb08 in qmp_marshal_block_commit (args=<optimized out>, 
> ret=<optimized out>, errp=0x7ffd656173b8) at 
> qapi/qapi-commands-block-core.c:409
> #29 0x000055c0c960beef in do_qmp_dispatch (errp=0x7ffd656173b0, 
> allow_oob=false, request=<optimized out>, cmds=0x55c0c9f56f80 <qmp_commands>) 
> at qapi/qmp-dispatch.c:129
> #30 0x000055c0c960beef in qmp_dispatch (cmds=0x55c0c9f56f80 <qmp_commands>, 
> request=<optimized out>, allow_oob=<optimized out>) at qapi/qmp-dispatch.c:171
> #31 0x000055c0c9111623 in monitor_qmp_dispatch (mon=0x55c0cbdbb930, 
> req=0x55c0ccf90e00, id=0x0) at /home/kwolf/source/qemu/monitor.c:4168
> #32 0x000055c0c911191f in monitor_qmp_bh_dispatcher (data=0x0) at 
> /home/kwolf/source/qemu/monitor.c:4237
> #33 0x000055c0c96171fc in aio_bh_call (bh=0x55c0cbccc840) at util/async.c:90
> #34 0x000055c0c9617294 in aio_bh_poll (ctx=0x55c0cbccc550) at util/async.c:118
> #35 0x000055c0c961b723 in aio_dispatch (ctx=0x55c0cbccc550) at 
> util/aio-posix.c:436
> #36 0x000055c0c961762f in aio_ctx_dispatch (source=0x55c0cbccc550, 
> callback=0x0, user_data=0x0) at util/async.c:261
> #37 0x00007faecaea1257 in g_main_context_dispatch () at 
> /lib64/libglib-2.0.so.0
> #38 0x000055c0c961a2aa in glib_pollfds_poll () at util/main-loop.c:215
> #39 0x000055c0c961a2aa in os_host_main_loop_wait (timeout=0) at 
> util/main-loop.c:238
> #40 0x000055c0c961a2aa in main_loop_wait (nonblocking=<optimized out>) at 
> util/main-loop.c:497
> #41 0x000055c0c92805e1 in main_loop () at vl.c:1866
> #42 0x000055c0c9287eca in main (argc=18, argv=0x7ffd65617958, 
> envp=0x7ffd656179f0) at vl.c:4647
> (rr) p bs.node_name 
> $17 = "node1", '\000' <repeats 26 times>
> 
> Obviously, this exact backtrace shouldn't even be possible like this
> because job_defer_to_main_loop_bh() shouldn't be called inside
> bdrv_flush(), but when draining the subtree in bdrv_reopen(). This is
> only fixed in "blockjob: Lie better in child_job_drained_poll()".
> 
> It probably doesn't make a difference, though, where exactly during
> bdrv_reopen() the node is deleted. We'll crash anyway if we don't hold
> the extra reference.
> 
> Kevin
> 


Attachment: signature.asc
Description: OpenPGP digital signature


reply via email to

[Prev in Thread] Current Thread [Next in Thread]