qemu-devel

From: Halil Pasic
Subject: Re: [PATCH v2] virtio-pci: disable vring processing when bus-mastering is disabled
Date: Thu, 28 Nov 2019 17:48:00 +0100

On Tue, 19 Nov 2019 18:50:03 -0600
Michael Roth <address@hidden> wrote:

[..]
> I.e. the calling code is only scheduling a one-shot BH for
> virtio_blk_data_plane_stop_bh, but somehow we end up trying to process
> an additional virtqueue entry before we get there. This is likely due
> to the following check in virtio_queue_host_notifier_aio_poll:
> 
>   static bool virtio_queue_host_notifier_aio_poll(void *opaque)
>   {
>       EventNotifier *n = opaque;
>       VirtQueue *vq = container_of(n, VirtQueue, host_notifier);
>       bool progress;
> 
>       if (!vq->vring.desc || virtio_queue_empty(vq)) {
>           return false;
>       }
> 
>       progress = virtio_queue_notify_aio_vq(vq);
> 
> namely the call to virtio_queue_empty(). In this case, since no new
> requests have actually been issued, shadow_avail_idx == last_avail_idx,
> so we actually try to access the vring via vring_avail_idx() to get
> the latest non-shadowed idx:
> 
>   int virtio_queue_empty(VirtQueue *vq)
>   {
>       bool empty;
>       ...
> 
>       if (vq->shadow_avail_idx != vq->last_avail_idx) {
>           return 0;
>       }
> 
>       rcu_read_lock();
>       empty = vring_avail_idx(vq) == vq->last_avail_idx;
>       rcu_read_unlock();
>       return empty;
> 
> but since the IOMMU region has been disabled we get a bogus value (0
> usually), which causes virtio_queue_empty() to falsely report that
> there are entries to be processed, which causes errors such as:
> 
>   "virtio: zero sized buffers are not allowed"
> 
> or
> 
>   "virtio-blk missing headers"
> 
> and puts the device in an error state.
> 

I've seen something similar on s390x with virtio-ccw-blk under
protected virtualization, which made me wonder about how virtio-blk in
particular, but also virtio in general, handles shutdown and reset.

This makes me wonder whether disabled bus-mastering is the only scenario
in which something like vdev->disabled should be used.

In particular, I have the following mechanism in mind:

qemu_system_reset() --> ... --> qemu_devices_reset() --> ... --> 
--> virtio_[transport]_reset() --> ... --> virtio_bus_stop_ioeventfd()
--> virtio_blk_data_plane_stop()

which in turn triggers the following cascade:

virtio_blk_data_plane_stop_bh() --> virtio_queue_aio_set_host_notifier_handler()
--> virtio_queue_host_notifier_aio_read()

which in turn calls virtio_queue_notify_aio_vq() if the notifier tests
positive. Since vq->handle_aio_output is still set at that point, we may
end up calling virtqueue_pop() during the reset procedure.
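
Roughly, that last read of the notifier looks like this (a simplified
sketch from memory, not the exact code in hw/virtio/virtio.c):

  /* Simplified sketch, from memory -- not the exact upstream code. */
  static bool virtio_queue_notify_aio_vq(VirtQueue *vq);

  static void virtio_queue_host_notifier_aio_read(EventNotifier *n)
  {
      VirtQueue *vq = container_of(n, VirtQueue, host_notifier);

      /* A kick that arrived shortly before the reset still tests
       * positive here, even though we are already tearing things down. */
      if (event_notifier_test_and_clear(n)) {
          virtio_queue_notify_aio_vq(vq);
      }
  }

  static bool virtio_queue_notify_aio_vq(VirtQueue *vq)
  {
      /* vq->handle_aio_output has not been cleared yet at this point,
       * so for virtio-blk dataplane we end up in the output handler and
       * eventually in virtqueue_pop(), i.e. the device reads the ring
       * while the reset is in progress. */
      if (vq->vring.desc && vq->handle_aio_output) {
          return vq->handle_aio_output(vq->vdev, vq);
      }
      return false;
  }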

This was a problem for us because (due to a bug) the shared pages that
constitute the virtio ring weren't shared any more, and thus we got the
infamous
virtio_error(vdev, "virtio: zero sized buffers are not allowed").

Now the bug is no more, and we can tolerate that somewhat late access
to the virtio ring.

But it keeps nagging me: is it really OK for the device to access the
virtio ring during reset? My intuition tells me that the device should
not look for new requests after it has been told to reset.

Opinions? (Michael, Connie)

Regards,
Halil

> This patch works around the issue by introducing virtio_set_disabled(),
> which sets a 'disabled' flag to bypass checks like virtio_queue_empty()
> when bus-mastering is disabled. Since we'd check this flag at all the
> same sites as vdev->broken, we replace those checks with an inline
> function which checks for either vdev->broken or vdev->disabled.
> 
> The 'disabled' flag is only migrated when set, which should be fairly
> rare, but to maintain migration compatibility we disable its use for
> older machine types. Users requiring the use of the flag in conjunction
> with older machine types can set it explicitly as a virtio-device
> option.
> 
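
For reference, my reading of the check described above is roughly the
following (a paraphrase of the description only, not the actual diff;
the helper name is just a placeholder):

  static inline bool virtio_device_disabled(VirtIODevice *vdev)
  {
      /* A device with bus-mastering disabled is treated like a broken
       * one: don't touch its rings. */
      return unlikely(vdev->disabled || vdev->broken);
  }

  /* ... and then in e.g. virtio_queue_empty(): */
  if (virtio_device_disabled(vq->vdev)) {
      return 1;    /* never report pending work for a disabled device */
  }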



