[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [PATCH v3 0/7] Fix scsi devices plug/unplug races w.r.t virtio-scsi
Re: [PATCH v3 0/7] Fix scsi devices plug/unplug races w.r.t virtio-scsi iothread
Thu, 13 Aug 2020 15:26:21 +0300
Evolution 3.36.3 (3.36.3-1.fc32)
On Wed, 2020-07-15 at 18:01 +0300, Maxim Levitsky wrote:
> This is a patch series that is a result of my discussion with Paulo on
> how to correctly fix the root cause of the BZ #1812399.
> The root cause of this bug is the fact that IO thread is running mostly
> unlocked versus main thread on which device hotplug is done.
> qdev_device_add first creates the device object, then places it on the bus,
> and only then realizes it.
> However some drivers and currently only virtio-scsi enumerate its child bus
> devices on each request that is received from the guest and that can happen
> on the IO
> Thus we have a window when new device is on the bus but not realized and can
> be accessed
> by the virtio-scsi driver in that state.
> Fix that by doing two things:
> 1. Add partial RCU protection to the list of a bus's child devices.
> This allows the scsi IO thread to safely enumerate the child devices
> while it races with the hotplug placing the device on the bus.
> 2. Make the virtio-scsi driver check .realized property of the scsi device
> and avoid touching the device if it isn't
> Note that in the particular bug report the issue wasn't a race but rather due
> to combination of things, the .realize code in the middle managed to trigger
> IO on the virtqueue
> which caused the virtio-scsi driver to access the half realized device.
> since this can happen as well with real IO thread, this patch series was done,
> which fixes this as well.
> Changes from V1:
> * Patch 2 is new, as suggested by Stefan, added drain_call_rcu() to fix the
> failing unit test,
> make check pass now
> * Patches 6,7 are new as well: I added scsi_device_get as suggested by
> Stefan as well, although
> this is more a refactoring that anything else as it doesn't solve
> an existing race.
> * Addressed most of the review feedback from V1
> - still need to decide if we need QTAILQ_FOREACH_WITH_RCU_READ_LOCK
> Changes from V2:
> * No longer RFC
> * Addressed most of the feedback from Stefan
> * Fixed reference count leak in patch 7 when device is about to be
> * Better testing
> This series was tested by adding a virtio-scsi drive with iothread,
> then running fio stress job in the guest in a loop, and then adding/removing
> the scsi drive on the host in the loop.
> This test was failing usually on 1st iteration withouth this patch series,
> and now it seems to work smoothly.
> Best regards,
> Maxim Levitsky
> Maxim Levitsky (7):
> scsi/scsi_bus: switch search direction in scsi_device_find
> Implement drain_call_rcu and use it in hmp_device_del
> device-core: use RCU for list of childs of a bus
> device-core: use atomic_set on .realized property
> virtio-scsi: don't touch scsi devices that are not yet realized or
> about to be un-realized
> scsi: Add scsi_device_get
> virtio-scsi: use scsi_device_get
> hw/core/bus.c | 28 +++++++++++++--------
> hw/core/qdev.c | 56 +++++++++++++++++++++++++++++++-----------
> hw/scsi/scsi-bus.c | 48 +++++++++++++++++++++++++++++++-----
> hw/scsi/virtio-scsi.c | 47 ++++++++++++++++++++++++++++-------
> include/hw/qdev-core.h | 11 +++++++++
> include/hw/scsi/scsi.h | 2 ++
> include/qemu/rcu.h | 1 +
> qdev-monitor.c | 22 +++++++++++++++++
> util/rcu.c | 55 +++++++++++++++++++++++++++++++++++++++++
> 9 files changed, 230 insertions(+), 40 deletions(-)
Seems that this patch series doesn't have any disagreeements, so
anybody wants to put it on an maintainer's tree, now that the freeze is over?