qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH 0/3] Early release of -drive QemuOpts


From: Markus Armbruster
Subject: Re: [Qemu-devel] [PATCH 0/3] Early release of -drive QemuOpts
Date: Thu, 17 Mar 2016 18:00:42 +0100
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/24.5 (gnu/linux)

Paolo Bonzini <address@hidden> writes:

> In short, this patch gets rid of blockdev_mark_auto_del and
> blockdev_auto_del.
>
> With these patches, it is possible to create a new -drive with the same
> id as soon as the DEVICE_DELETED event is delivered (which equals to
> unrealize).
>
> I'm sorry I'm not able to explain the history (and probably do not
> understand the full ramifications) of this.

I'm going to try, because I need it myself to understand the
ramifications.

>                                              That's why this is just
> an RFC.

It's technically not an RFC anymore.

> The idea here is that reference counting the BlockBackend is enough to
> defer the deletion of the block device as much as necessary; anticipating
> the destruction of the DriveInfo is not a problem, and has the desired
> effect of freeing the QemuOpts.

Let me explain how we got ourselves into this mess, how it works, and
where it fails.

In the beginning, block frontends and backends got created during
startup and lived as long as the process.  Life was simple, if a bit
dull.

Enter hot plug / unplug, commit 6f338c3 "qemu: PCI device, disk and host
network hot-add / hot-remove (Marcelo Tosatti)", Feb 2009, v0.10.  In
this early form, the interface mushed frontends and backends together,
similar to the command line at that time.  The interface we use today
appeared in commit 3418bd2 "qdev hotplug: infrastructure and monitor
commands.", Sep 2009, v0.12.  Here's how it works:

* Dynamically create a block backend:

    drive_add "" if=none,id=BE-ID,...

* Hot plug a block frontend:

    device_add driver=DRV,id=FE-ID,drive=BE-ID,...

* Hot unplug a block frontend

    device_del FE-ID

  For virtio-blk and SCSI devices, this also destroys the block backend.
  Back then, these were the only devices that could be unplugged, and
  this was the only way to destroy a block backend.  This automatic
  backend destruction was a mistake.  No other kind of backend gets
  destroyed that way.  The (still experimental) blockdev-add gives us
  the opportunity to correct the mistake: backends created with it are
  not destroyed by frontend unplug.

  For some kinds of devices, such as USB, hot unplug is synchronous.
  For others, such as PCI, it's not: device_del merely initiates the
  unplug, which then completes in its own sweet time.  Or doesn't: PCI
  unplug via ACPI actually requires guest cooperation.

  While there's some complexity in orchestrating the PCI-ACPI-dance, the
  block code proper is still simple: frontend destruction triggers
  backend destruction.

Not only is automatic backend destruction a mistake, it's also not quite
trivial.  The backend gets destroyed by the frontend's qdev exit()
method (which has since become its QOM unrealize()).  This leaves the
frontend's pointer to the backend dangling until the frontend finishes
dying.  Works as long as it doesn't get dereferenced during that time,
but relying on that is a bit brittle.  When it got in the way, it led to
commit 14bafc5 "blockdev: Clean up automatic drive deletion", Jun 2010,
v0.13.  This is where blockdev_mark_auto_del() and blockdev_auto_del()
come from.  Instead of destroying the backend, we merely mark it, and
destroy it when it's safe.

The next piece of the puzzle is drive_del.  While an image is in use by
QEMU, it's generally unsafe to use by anything else.  If you need it for
something else, you need to first make QEMU relinquish it.  Easy enough:
hot unplug the frontend, thus destroy the backend, done.

Except when the unplug needs guest cooperation, and the guest doesn't
cooperate.  This motivated commit 9063f81 "Implement drive_del to
decouple block removal from device removal", Nov 2010, v0.14.  The guest
gets no say, and afterwards sees a terminally broken block device.

In its initial form, drive_del simply closed and destroyed the block
backend right away.  It tried to zap the pointers first, but failed.  We
fixed the resulting use-after-free in commit d22b2f4 "Do not delete
BlockDriverState when deleting the drive", Mar 2011, v0.15: we hide
instead of destroy the closed backend when it's being used by a
frontend.

Why hide it?  Backward compatibility.  v0.14's drive_del made the
backend's ID available for reuse immediately.  What we can't destroy
immediately, we need to hide.

The final piece of the puzzle is event DEVICE_DELETED, from commit
0402a5d "qdev: DEVICE_DELETED event", Mar 2013, v1.5.0.  It gets emitted
when unplug completes.

Meanwhile and mostly independently, backend reference counting evolved.
When backends still lived as long as the process, there was none,
obviously.

We probably should've introduced it for drive_del in v0.14, but that
didn't happen, because we (incorrectly) thought drive_del could simply
delete and be done.

Instead, it got introduced for block migration, in commit 84fb392
"blockdev: add refcount to DriveInfo", Jan 2011, v0.15.

Note that DriveInfo is *not* the block backend.  It captures a -drive /
drive_add.  Reference counting block backend there pressed DriveInfo to
use as owner of the block backend.  It took a while to clean that up,
until commit fa510eb "block: use BDS ref for block jobs", Aug 2013,
v1.7.  DriveInfo's reference count even lingered until commit ae60e8e,
v2.1.

Note that reference counting began only after the unplug work was done
except for the bug fixes.

When I worked on BlockBackend, I wanted to do what your series does: put
the reference counting to use to clean up the unplug mess.  But then the
BlockBackend job became too complex and I had to cut that part.

So, what do we learn from my lengthy history lesson?  I think we can
learn what to watch out for.  Here's my list:

* Block backend auto-destroy wart: we need to destroy block backends on
  frontend unplug exactly as before.

* The "drive_del hides the backend" wart: what we can't destroy
  immediately, we need to hide.

* drive_del is where dragons be.  I've fixed enough bugs there, and
  spent enough hours wracking my brain how to change things without
  messing it up to be on my toes there.

> Patches 1 and 3 are mostly similar to the version I had earlier sent as
> RFC, but they now pass all unit tests.  Patch 2 is new, but I don't know
> of a test that fails it.

I'll review the actual patches next, hopefully tomorrow.



reply via email to

[Prev in Thread] Current Thread [Next in Thread]