qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] race between tb_gen_code() and qemu_ram_free()


From: Paolo Bonzini
Subject: Re: [Qemu-devel] race between tb_gen_code() and qemu_ram_free()
Date: Fri, 20 Apr 2018 20:20:05 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.7.0

On 20/04/2018 19:55, Peter Maydell wrote:
> There seems to be a race between tb_gen_code() and qemu_ram_free(),
> which results in an abort() in Edgar's test case that exercises the
> xilinx-spips mmio-exec functionality.
> 
> Here's what happens:
>  (1) memory_region_invalidate_mmio_ptr() is called, and it deletes
                                                          ^^

This "it" is more precisely memory_region_do_invalidate_mmio_ptr.

>  the temporary ram MemoryRegion. This doesn't immediately result in
>  deletion of the RAMBlock (which is done from the RCU thread later)
>  (2) we try to execute from this physaddr, so tb_gen_code() is called.
>  This calls get_page_addr_code(). Since the RAMBlock hasn't yet
>  really been deleted, get_page_addr_code() finds it, and returns a
>  ram_addr in the about-to-go-away memory...
>  (3) while the CPU thread is busy doing codegen, the RCU thread runs.
>  It does a flatview_destroy() which ends up doing an object_unref on the
>  memory region, whose destructor calls qemu_ram_free(), removing the
>  RAMBlock from the ram list
>  (4) when the CPU thread has done its codegen, it calls tb_link_page,
>  which calls tb_alloc_page, which calls tlb_protect_code(), with the
>  ram_addr that it got in step 2. We end up in tlb_reset_dirty_range_all(),
>  which calls qemu_get_ram_block() on that ram_addr.
>  (5) qemu_get_ram_block() aborts, because it can't find the RAMBlock
>  corresponding to the ram_addr (because we removed it in step 3).
> 
> How should we resolve this race ?

Note that qemu_ram_free() is _also_ RCU-freeing the RAMBlock.  If it is
not found, it means that codegen is not running within
rcu_read_lock()/rcu_read_unlock().  In fact it's not.

So one possibility would be to add it there.  Theoretically the simplest
possibility, but rcu_read_lock() is taken _outside_ tb_lock which may be
a problem.

The alternative is to add a ref/unref pair of the MemoryRegion whenever
get_page_addr_code() uses it.  That would delay the MemoryRegion
destructor after the end of code generation.  A rough sketch would be
that get_page_addr_code() would grow a MemoryRegion** argument and would
use memory_region_from_host() instead of qemu_ram_addr_from_host().
get_page_addr_code() would either call rcu_read_lock(), or the callers
would (see the doc comment for memory_region_from_host(); it always
feels good when you can refer to doc comments!)

Thanks,

Paolo

> (PS: rr is really useful for identifying what's actually happening
> in multi-thread races like this...)
> 
> thanks
> -- PMM
> 




reply via email to

[Prev in Thread] Current Thread [Next in Thread]