[Top][All Lists]

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH 1/1] virtio: fail device if set_event_notifier f

From: Cornelia Huck
Subject: Re: [Qemu-devel] [PATCH 1/1] virtio: fail device if set_event_notifier fails
Date: Mon, 6 Mar 2017 17:04:41 +0100

On Mon, 6 Mar 2017 16:21:13 +0100
Halil Pasic <address@hidden> wrote:

> On 03/06/2017 03:56 PM, Cornelia Huck wrote:
> > On Fri, 3 Mar 2017 14:08:37 +0100
> > Halil Pasic <address@hidden> wrote:
> > 
> >> On 03/03/2017 01:50 PM, Cornelia Huck wrote:
> >>> On Fri, 3 Mar 2017 13:43:32 +0100
> >>> Halil Pasic <address@hidden> wrote:
> >>>
> >>>> On 03/03/2017 01:21 PM, Cornelia Huck wrote:
> >>>>> On Thu,  2 Mar 2017 19:59:42 +0100
> >>>>> Halil Pasic <address@hidden> wrote:
> [...]
> >> I admit, I did not investigate this thoroughly, also because the patch
> >> is flawed regarding multi-thread anyway. After a quick investigation
> >> it seems the linux guest won't auto-reset the device so the guest should
> >> end up with a not working device. I think it's pretty likely that the
> >> admin will check the logs if the device was important.
> > 
> > Thinking a bit more about this, it seems setting the device broken is
> > not the right solution for exactly that reason. Setting the virtio
> > device broken is a way to signal the guest to 'you did something
> > broken; please reset the device and start anew' (and that's how current
> > callers use it). In our case, this is not the guest's fault.
> Do we have something to just say stuff broken without blaming the guest?
> And device reset might not be that stupid at all in the given situation,
> if we want to save what can be saved from the perspective of the guest.
> (After reset stuff should work again until we hit the race again -- and
> since turning ioeventfd on/off should not happen that often during normal
> operation it could help limit damage suffered -- e.g. controlled shutdown).

Checking again, the spec says

DEVICE_NEEDS_RESET (64) Indicates that the device has experienced an
error from which it can’t recover.

Nothing about 'guest error'.

The only problem is that legacy devices don't have that state, which
means they'll have a broken device through no fault of their own.

> > 
> > Maybe go back to the assert 'solution'? But I'm not sure that's enough
> > if production builds disable asserts...
> > 
> I will wait a bit, maybe other virtio folks are going to have an 
> opinion too.
> My concern about the assert solution is that for production it is
> either too rigorous (kill off, hopefully with a dump) or not
> enough (as you have mentioned, if NDEBUG assert does nothing).
> I think there are setups where a loss of device does not have to be
> fatal, and I would not like to be the one who makes it fatal (for the
> guest).

Basically, it's a host bug (and not a bug specific to a certain
device). Moving the device which was impacted to a broken state may be
a useful mitigation.

But yes, let's hear some other opinions.

reply via email to

[Prev in Thread] Current Thread [Next in Thread]