qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH v2 00/38] Delay destruction of memory regions to


From: Paolo Bonzini
Subject: Re: [Qemu-devel] [PATCH v2 00/38] Delay destruction of memory regions to instance_finalize
Date: Wed, 18 Sep 2013 00:02:51 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/20130805 Thunderbird/17.0.8

Il 17/09/2013 21:51, Michael S. Tsirkin ha scritto:
> A much more interesting case is e.g. disabling memory.
> E.g.
> 
> config write (disable memory)
> read (flush out outstanding writes)
> write <- must now have no effect

This works already.  memory_region_del_subregion is synchronous, and
will remain synchronous wrt the current CPU even after memory dispatch
is moved out of the BQL.

This is racy of course:

    VCPU 1                               VCPU 2
    ------------------------------------------------------------
    config write (disable memory)        write
    read

This is also racy:

    VCPU 1                               DMA to MMIO region
    ------------------------------------------------------------
    config write (disable memory)        write
    read

This is the case where a write from another device can do weird things
such as causing a packet to be transmitted on a NIC.  As you wrote (and
I agree) device removal is a superset of disabling memory and bus
master; ergo, if we handle it for disabling memory we need to handle it
for device removal too.  This is why I believe qemu_del_nic belongs in
instance_finalize.

> Or disabling bus master:
> 
> config write (disable bus master)
> read (flush in outstanding writes)
>  <- device must now not change memory

This doesn't work.  The problem is that when you disable bus master any
previous call to address_space_map remains mapped.  Whoever called
address_space_map can and will write blindly to that area.

This cannot be fixed by synchronize_rcu() no matter where you place it.
 The logic is like this

   address_space_map
    rcu_read_lock()
    mr = find memory region for address
    memory_region_ref(mr)
    rcu_read_unlock()
   do actual read or write
   address_space_unmap
    memory_region_unref(mr)

synchronize_rcu() ensures that future writes will not write to memory.
But it does not ensure anything about writes started before.  And
unfortunately a write "starts" at the moment you start an AIO operation,
and lasts until roughly when the AIO callback executes.

If you want to fix that, the right place to do it is the PCI core.  You
need a new callback of some sort in the PCI device, that will
synchronously cancel all pending I/O when the bus master bit is set to zero.

I don't think address_space_map/unmap has any equivalent PCI transaction
(or exists in any other sane bus, for that matter).  However, we do it
pervasively for obvious performance reasons.  I'm afraid that this is an
emulation tradeoff that we cannot avoid.

> And these rules absolutely must be obeyed,
> if you don't you'll break guests.

Yes (and I think it was already discussed, I have some deja vu feeling).
 I'm not sure it is _that_ important (QEMU lived with a no-op bus master
bit for years and nothing exploded), but it is certainly good to know
what works and what doesn't---and why, and whether it is fixable.

> So I think we really should find a way to make the above work correctly.
> Removal will follow almost automatically, since it disables memory and
> mastering by itself.

These patches are concerned with the "almost" part.  The difference
between config writes and removal is that, as the code is currently
written, config writes cannot cause dangling pointers.

Paolo



reply via email to

[Prev in Thread] Current Thread [Next in Thread]