[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: css_clear_io_interrupt() error handling
|
From: |
Cornelia Huck |
|
Subject: |
Re: css_clear_io_interrupt() error handling |
|
Date: |
Mon, 08 May 2023 11:01:55 +0200 |
|
User-agent: |
Notmuch/0.37 (https://notmuchmail.org) |
On Mon, May 08 2023, Markus Armbruster <armbru@redhat.com> wrote:
> css_clear_io_interrupt() aborts on unexpected ioctl() errors, and I
> wonder whether that's appropriate. Let's have a closer look:
>
> static void css_clear_io_interrupt(uint16_t subchannel_id,
> uint16_t subchannel_nr)
> {
> Error *err = NULL;
> static bool no_clear_irq;
> S390FLICState *fs = s390_get_flic();
> S390FLICStateClass *fsc = s390_get_flic_class(fs);
> int r;
>
> if (unlikely(no_clear_irq)) {
> return;
> }
> r = fsc->clear_io_irq(fs, subchannel_id, subchannel_nr);
> switch (r) {
> case 0:
> break;
> case -ENOSYS:
> no_clear_irq = true;
> /*
> * Ignore unavailability, as the user can't do anything
> * about it anyway.
> */
> break;
> default:
> error_setg_errno(&err, -r, "unexpected error condition");
> error_propagate(&error_abort, err);
> }
> }
>
> The default case is abort() with a liberal amount of lipstick applied.
> Let's ignore the lipstick and focus on the abort().
>
> fsc->clear_io_irq ist either qemu_s390_clear_io_flic() order
> kvm_s390_clear_io_flic().
>
> Only kvm_s390_clear_io_flic() can return non-zero: -errno when ioctl()
> fails.
>
> The ioctl() is KVM_SET_DEVICE_ATTR for KVM_DEV_FLIC_CLEAR_IO_IRQ with
> subchannel_id and subchannel_nr. I.e. we assume that this can only fail
> with ENOSYS, und crash hard when the assumption turns out to be wrong.
>
> Is this error condition a programming error? I figure it can be one
> only if the ioctl()'s contract promises us it cannot fail in any other
> way unless we violate preconditions.
>
> Is the error condition fatal, i.e. continuing would be unsafe?
>
> If it's a fatal programming error, then abort() is appropriate.
>
> If it's fatal, but not a programming error, we should exit(1) instead.
>
> If it's a survivable programming error, use of abort() is a matter of
> taste.
>From what I remember, this was introduced to clean up a potentially
queued interrupt that is not supposed to be delivered, so the worst
thing that could happen on failure is a spurious interrupt (same as what
could happen if the kernel flic doesn't provide this function in the
first place.) My main worry would be changes/breakages on the kernel
side (while the QEMU side remains unchanged).
So, I think we should continue to log the error in any case; but I don't
have a strong opinion as to whether we should use exit(1) (as I wouldn't
consider it a programming error) or just continue. Halil, your choice :)
- css_clear_io_interrupt() error handling, Markus Armbruster, 2023/05/08
- Re: css_clear_io_interrupt() error handling,
Cornelia Huck <=
- Re: css_clear_io_interrupt() error handling, Halil Pasic, 2023/05/09
- Re: css_clear_io_interrupt() error handling, Markus Armbruster, 2023/05/10
- Re: css_clear_io_interrupt() error handling, Halil Pasic, 2023/05/10
- Re: css_clear_io_interrupt() error handling, Markus Armbruster, 2023/05/11
- Re: css_clear_io_interrupt() error handling, Halil Pasic, 2023/05/11
- Re: css_clear_io_interrupt() error handling, Markus Armbruster, 2023/05/15