qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH 2/9] s390x: fix invalid use of cc 1 for SSCH


From: Cornelia Huck
Subject: Re: [Qemu-devel] [PATCH 2/9] s390x: fix invalid use of cc 1 for SSCH
Date: Tue, 5 Sep 2017 17:46:06 +0200

On Tue, 5 Sep 2017 17:24:19 +0200
Halil Pasic <address@hidden> wrote:

> My problem with a program check (indicated by SCSW word 2 bit 10) is
> that, in my reading of the architecture, the semantic behind it is: The
> channel subsystem (not the cu or device) has detected, that the 
> the channel program (previously submitted as an ORB) is erroneous. Which
> programs are erroneous is specified by the architecture. What we have
> here does not qualify.
> 
> My idea was to rather blame the virtual hardware (device) and put no blame
> on the program nor he channel subsystem. This could be done using device
> status (unit check with command reject, maybe unit exception) or interface
> check. My train of thought was, the problem is not consistent across a
> device type, so it has to be device specific.

Unit exception might be a better way to express what is happening here.
At least, it moves us away from cc 1 and not towards cc 3 :)

> 
> Of course blaming the device could mislead the person encountering the
> problem, and make him believe it's an non-virtual hardware problem.
> 
> About the misleading, I think the best we can do is log out a message
> indicating what really happened.

Just document it in the code? If it doesn't happen with Linux as a
guest, it is highly unlikely to be seen in the wild.

> 
> In the end I don't care that deeply about vfio-ccw, and this problem
> already took me more time than I intended to spend on this. We have
> people driving this whole vfio-ccw stuff and I'm not one of them (I'm
> rather in the supporting role).
> 
> I'm also fine with me being credited with a reported-by once the
> more involved people figure out what to do, and keeping the vfio-ccw
> stuff as is. Should we go with that option? 

If converting the reporting to a device status is straightforward
enough, let's do that. I'm fine with postponing this and waiting for a
real fix as well (I don't really have spare cycles to actually write
vfio-ccw code currently...)

> >>>> @@ -1115,7 +1112,7 @@ static int do_subchannel_work(SubchDev *sch)
> >>>>      if (sch->do_subchannel_work) {
> >>>>          return sch->do_subchannel_work(sch);
> >>>>      } else {
> >>>> -        return -EINVAL;
> >>>> +        return -ENODEV;    
> >>>
> >>> This rather seems like a job for an assert? If we don't have a function
> >>> for the 'asynchronous' handling of the various functions assigned for a
> >>> subchannel, that looks like an internal error.
> >>>     
> >>
> >> IMHO it depends. Aborting qemu is heavy handed, and as an user I would not
> >> be happy about it. But certainly it is an assert situation.  We can look 
> >> for
> >> an even better solution, but I think this is an improvement. The logic 
> >> behind
> >> is that the device is broken and can't be talked to properly.  
> > 
> > We currently don't have a vast array of subchannel types (and are
> > unlikely to get more types that need a different handler function). We
> > know the current ones are fine, and an assert would catch programming
> > errors early.
> >   
> 
> Despite of that we already had a problem of this type: see 1728cff2ab
> ("s390x/3270: fix instruction interception handler", 2017-06-09) by 
> Dong Jia. If we had some automated testing covering all the asserts
> I would not think twice about using an assert here. But I don't think
> we do and I'm reluctant (not positive that assert is superior to what
> we have now). Maybe we could agree on reported by again.

Yes, we (as in generally 'we') are really lacking automated testing...
(it is somewhere on my todo list).

Either leave it as-is, or do an assert. -ENODEV just feels wrong.



reply via email to

[Prev in Thread] Current Thread [Next in Thread]