qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH v2 00/38] Delay destruction of memory regions to


From: Michael S. Tsirkin
Subject: Re: [Qemu-devel] [PATCH v2 00/38] Delay destruction of memory regions to instance_finalize
Date: Wed, 18 Sep 2013 08:48:48 +0300

On Wed, Sep 18, 2013 at 12:02:51AM +0200, Paolo Bonzini wrote:
> Il 17/09/2013 21:51, Michael S. Tsirkin ha scritto:
> > A much more interesting case is e.g. disabling memory.
> > E.g.
> > 
> > config write (disable memory)
> > read (flush out outstanding writes)
> > write <- must now have no effect
> 
> This works already.  memory_region_del_subregion is synchronous, and
> will remain synchronous wrt the current CPU even after memory dispatch
> is moved out of the BQL.
> 
> This is racy of course:
> 
>     VCPU 1                               VCPU 2
>     ------------------------------------------------------------
>     config write (disable memory)        write
>     read
> 
> This is also racy:
> 
>     VCPU 1                               DMA to MMIO region


What about normal DMA to memory?
I don't believe DMA to MMIO is relevant ATM,
we can just make it fail (e.g. as Express spec 1.0 allowed).

>     ------------------------------------------------------------
>     config write (disable memory)        write
>     read
> 
> This is the case where a write from another device can do weird things
> such as causing a packet to be transmitted on a NIC.  As you wrote (and
> I agree) device removal is a superset of disabling memory and bus
> master; ergo, if we handle it for disabling memory we need to handle it
> for device removal too.  This is why I believe qemu_del_nic belongs in
> instance_finalize.
>
> > Or disabling bus master:
> > 
> > config write (disable bus master)
> > read (flush in outstanding writes)
> >  <- device must now not change memory
> 
> This doesn't work.  The problem is that when you disable bus master any
> previous call to address_space_map remains mapped.  Whoever called
> address_space_map can and will write blindly to that area.
> 
> This cannot be fixed by synchronize_rcu() no matter where you place it.
>  The logic is like this
> 
>    address_space_map
>     rcu_read_lock()
>     mr = find memory region for address
>     memory_region_ref(mr)
>     rcu_read_unlock()
>    do actual read or write
>    address_space_unmap
>     memory_region_unref(mr)
> 
> synchronize_rcu() ensures that future writes will not write to memory.
> But it does not ensure anything about writes started before.  And
> unfortunately a write "starts" at the moment you start an AIO operation,
> and lasts until roughly when the AIO callback executes.
> 
> If you want to fix that, the right place to do it is the PCI core.  You
> need a new callback of some sort in the PCI device, that will
> synchronously cancel all pending I/O when the bus master bit is set to zero.


It has to work even without config changes really.

So I think the fix is actually obeying ordering rules,
that is know that write is in progress
and flush on read.

> I don't think address_space_map/unmap has any equivalent PCI transaction
> (or exists in any other sane bus, for that matter).  However, we do it
> pervasively for obvious performance reasons.  I'm afraid that this is an
> emulation tradeoff that we cannot avoid.
> 
> > And these rules absolutely must be obeyed,
> > if you don't you'll break guests.
> 
> Yes (and I think it was already discussed, I have some deja vu feeling).
>  I'm not sure it is _that_ important (QEMU lived with a no-op bus master
> bit for years and nothing exploded),

Yes but you can't pass e.g. WHQL without it.
And even bus master is less important than obeying ordering rules.
We got and get away with ignoring them because of the BQL.
The moment emulated PCI devices write into memory without a BQL
we'll need to emulate ordering.

> but it is certainly good to know
> what works and what doesn't---and why, and whether it is fixable.
> 
> > So I think we really should find a way to make the above work correctly.
> > Removal will follow almost automatically, since it disables memory and
> > mastering by itself.
> 
> These patches are concerned with the "almost" part.  The difference
> between config writes and removal is that, as the code is currently
> written, config writes cannot cause dangling pointers.
> 
> Paolo

I'm not really objecting to the patches.

I think moving memory region destroy out to finalize makes sense
irrespectively, as long as destroy is made idempotent so we can simply
destroy everything without worrying whether we initialized it.

The rest of the changes will be harder, we'll have to do
them carefully on a case by case basis.

-- 
MST



reply via email to

[Prev in Thread] Current Thread [Next in Thread]