[Top][All Lists]

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH v2 12/17] mirror: Fix potential use-after-free i

From: Max Reitz
Subject: Re: [Qemu-devel] [PATCH v2 12/17] mirror: Fix potential use-after-free in active commit
Date: Tue, 18 Sep 2018 16:11:05 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.0

On 17.09.18 13:37, Kevin Wolf wrote:
> Am 17.09.2018 um 00:05 hat Max Reitz geschrieben:
>> On 14.09.18 18:25, Kevin Wolf wrote:
>>> Am 13.09.2018 um 22:55 hat Max Reitz geschrieben:
>>>> On 13.09.18 14:52, Kevin Wolf wrote:
>>>>> When starting an active commit job, other callbacks can run before
>>>>> mirror_start_job() calls bdrv_ref() where needed and cause the nodes to
>>>>> go away. Add another pair of bdrv_ref/unref() around it to protect
>>>>> against this case.
>>>>> Signed-off-by: Kevin Wolf <address@hidden>
>>>>> ---
>>>>>  block/mirror.c | 11 +++++++++++
>>>>>  1 file changed, 11 insertions(+)
>>>> Reviewed-by: Max Reitz <address@hidden>
>>>> But...  How?
>>> Tried to reproduce the other bug Paolo was concerned about (good there
>>> is something like 'git reflog'!) and dug up this one instead.
>>> So the scenario seems to be test_stream_commit_1 in qemu-iotests 030.
>>> The backtrace when the BDS is deleted is the following:
>>> (rr) bt
>>> #0  0x00007faeaf6145ec in __memset_sse2_unaligned_erms () at 
>>> /lib64/libc.so.6
>>> #1  0x00007faeaf60414e in _int_free () at /lib64/libc.so.6
>>> #2  0x00007faeaf6093be in free () at /lib64/libc.so.6
>>> #3  0x00007faecaea6b4e in g_free () at /lib64/libglib-2.0.so.0
>>> #4  0x000055c0c94f9d6c in bdrv_delete (bs=0x55c0cbe69240) at block.c:3590
>>> #5  0x000055c0c94fbe82 in bdrv_unref (bs=0x55c0cbe69240) at block.c:4638
>>> #6  0x000055c0c94f6761 in bdrv_root_unref_child (child=0x55c0cbe57af0) at 
>>> block.c:2188
>>> #7  0x000055c0c94fe239 in block_job_remove_all_bdrv (job=0x55c0ccf9e220) at 
>>> blockjob.c:200
>>> #8  0x000055c0c94fdf1f in block_job_free (job=0x55c0ccf9e220) at 
>>> blockjob.c:94
>>> #9  0x000055c0c94ffe0d in job_unref (job=0x55c0ccf9e220) at job.c:368
>>> #10 0x000055c0c95007cd in job_do_dismiss (job=0x55c0ccf9e220) at job.c:641
>>> #11 0x000055c0c95008d9 in job_conclude (job=0x55c0ccf9e220) at job.c:667
>>> #12 0x000055c0c9500b66 in job_finalize_single (job=0x55c0ccf9e220) at 
>>> job.c:735
>>> #13 0x000055c0c94ff4fa in job_txn_apply (txn=0x55c0cbe19eb0, 
>>> fn=0x55c0c9500a62 <job_finalize_single>, lock=true) at job.c:151
>>> #14 0x000055c0c9500e92 in job_do_finalize (job=0x55c0ccf9e220) at job.c:822
>>> #15 0x000055c0c9501040 in job_completed_txn_success (job=0x55c0ccf9e220) at 
>>> job.c:872
>>> #16 0x000055c0c950115c in job_completed (job=0x55c0ccf9e220, ret=0, 
>>> error=0x0) at job.c:892
>>> #17 0x000055c0c92b572c in stream_complete (job=0x55c0ccf9e220, 
>>> opaque=0x55c0cc050bc0) at block/stream.c:96
>> But isn't this just deletion of node1, i.e. the intermediate node of the
>> stream job?
>> The commit job happens above that (node3 and upwards), so all its BDSs
>> shouldn't be affected.  So even with the bdrv_ref()s introduced in this
>> patch, I'd still expect node1 to be deleted exactly here.
> Hm, I suppose that's true.
> So it crashed because the drain in bdrv_reopen() didn't actually drain
> the streaming job (which includes removing node1), so node1 ended up in
> the ReopenQueue and when it disappeared in the middle of reopen, we got
> a use after free.
> Then it would be bug between job draining and bdrv_reopen() and the
> active commit job would only be a victim of the bug without actually
> doing anything wrong, and we can drop this patch.
> Does this make sense?

The explanation makes sense, yes.

About dropping the patch...  We would have to be certain that you can
only run a block job if none of the nodes involved can be deleted at any
point.  After having started the job, this is the job's responsibility
by having referenced everything it needs (through block_job_add_bdrv()).
 So the critical point is when the job is starting, before it invoked
that function.

The user cannot do any graph manipulation then because the monitor is
blocked by starting the job.  So the only issue would be other
operations that can make nodes go away without user intervention.
Naturally the way to prevent conflicts would be with blockers, and
currently our old job blockers should be sufficient to prevent them.
But in theory we don't like them, so is this what we need the graph
manipulation blocker for? :-)


Attachment: signature.asc
Description: OpenPGP digital signature

reply via email to

[Prev in Thread] Current Thread [Next in Thread]