qemu-devel

Re: [Qemu-devel] [PATCH 2/9] s390x: fix invalid use of cc 1 for SSCH


From: Halil Pasic
Subject: Re: [Qemu-devel] [PATCH 2/9] s390x: fix invalid use of cc 1 for SSCH
Date: Tue, 5 Sep 2017 19:20:43 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.2.0


On 09/05/2017 05:46 PM, Cornelia Huck wrote:
> On Tue, 5 Sep 2017 17:24:19 +0200
> Halil Pasic <address@hidden> wrote:
> 
>> My problem with a program check (indicated by SCSW word 2 bit 10) is
>> that, in my reading of the architecture, the semantic behind it is: The
>> channel subsystem (not the cu or device) has detected that the
>> channel program (previously submitted as an ORB) is erroneous. Which
>> programs are erroneous is specified by the architecture. What we have
>> here does not qualify.
>>
>> My idea was to rather blame the virtual hardware (device) and put no blame
>> on the program nor the channel subsystem. This could be done using device
>> status (unit check with command reject, maybe unit exception) or interface
>> check. My train of thought was, the problem is not consistent across a
>> device type, so it has to be device specific.
> 
> Unit exception might be a better way to express what is happening here.
> At least, it moves us away from cc 1 and not towards cc 3 :)
> 

I will do a follow-up patch pursuing device exception.

>>
>> Of course blaming the device could mislead the person encountering the
>> problem, and make him believe it's a non-virtual hardware problem.
>>
>> About the misleading, I think the best we can do is log out a message
>> indicating what really happened.
> 
> Just document it in the code? If it doesn't happen with Linux as a
> guest, it is highly unlikely to be seen in the wild.
> 


Well, we have two problems here:
1) Unit exception may already be defined by the device type for the
command (reference:
http://publibfp.dhe.ibm.com/cgi-bin/bookmgr/BOOKS/dz9ar110/2.6.10?DT=19920904110920).
I think this is the one you mean, and I agree that it is best handled
with a comment in the code.
2) The poor user/programmer is trying to figure out why things
don't work (why are we getting the unit exception?). I think that is
best remedied by producing something for the log (maybe a warning
via warn_report stating that the vfio-ccw implementation requires
the given flags).
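
To make point 2 concrete, here is a rough sketch of the "warn once, then
reject" idea. It is not QEMU code: sketch_warn_report() is a local stand-in
for warn_report(), and the flag bits and the helper name are hypothetical
placeholders, since the actual flags are not spelled out in this mail.

```c
#include <stdarg.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical flag bits; the real ORB layout is not reproduced here. */
#define SKETCH_ORB_FLAG_A 0x01u
#define SKETCH_ORB_FLAG_B 0x02u
#define SKETCH_REQUIRED_FLAGS (SKETCH_ORB_FLAG_A | SKETCH_ORB_FLAG_B)

/* Counts emitted warnings so the behaviour can be checked. */
static unsigned sketch_warnings_emitted;

/* Local stand-in for QEMU's warn_report(); prints to stderr. */
static void sketch_warn_report(const char *fmt, ...)
{
    va_list ap;

    va_start(ap, fmt);
    fputs("warning: ", stderr);
    vfprintf(stderr, fmt, ap);
    fputc('\n', stderr);
    va_end(ap);
    sketch_warnings_emitted++;
}

/*
 * Returns true if all required flags are set. Otherwise it logs a
 * warning (only on the first failure) and returns false; the caller
 * would then present the chosen status to the guest.
 */
static bool sketch_check_orb_flags(uint32_t orb_flags)
{
    static bool warned;
    uint32_t missing = SKETCH_REQUIRED_FLAGS & ~orb_flags;

    if (missing == 0) {
        return true;
    }
    if (!warned) {
        sketch_warn_report("vfio-ccw (sketch): required ORB flags "
                           "missing (mask 0x%x)", (unsigned)missing);
        warned = true;
    }
    return false;
}
```

Warning only once keeps a misbehaving guest from flooding the log, while
still leaving a hint for whoever is debugging the unit exception.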

[..] 
>>>>>> @@ -1115,7 +1112,7 @@ static int do_subchannel_work(SubchDev *sch)
>>>>>>      if (sch->do_subchannel_work) {
>>>>>>          return sch->do_subchannel_work(sch);
>>>>>>      } else {
>>>>>> -        return -EINVAL;
>>>>>> +        return -ENODEV;    
>>>>>
>>>>> This rather seems like a job for an assert? If we don't have a function
>>>>> for the 'asynchronous' handling of the various functions assigned for a
>>>>> subchannel, that looks like an internal error.
>>>>>     
>>>>
>>>> IMHO it depends. Aborting qemu is heavy handed, and as a user I would not
>>>> be happy about it. But certainly it is an assert situation. We can look for
>>>> an even better solution, but I think this is an improvement. The logic behind
>>>> it is that the device is broken and can't be talked to properly.
>>>
>>> We currently don't have a vast array of subchannel types (and are
>>> unlikely to get more types that need a different handler function). We
>>> know the current ones are fine, and an assert would catch programming
>>> errors early.
>>>   
>>
>> Despite that, we already had a problem of this type: see 1728cff2ab
>> ("s390x/3270: fix instruction interception handler", 2017-06-09) by 
>> Dong Jia. If we had some automated testing covering all the asserts
>> I would not think twice about using an assert here. But I don't think
>> we do and I'm reluctant (not positive that assert is superior to what
>> we have now). Maybe we could agree on reported by again.
> 
> Yes, we (as in generally 'we') are really lacking automated testing...
> (it is somewhere on my todo list).
> 
> Either leave it as-is, or do an assert. -ENODEV just feels wrong.
> 

I think I will leave this one as is and maybe try to discuss reliable
test coverage with the folks here. I just spoke with Marc H., and
according to him we have a long way to go.
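
For reference, the two options discussed for do_subchannel_work() can be
put side by side in a stand-alone sketch. The Sketch* type and the
sketch_* functions are simplified, hypothetical stand-ins for SubchDev and
the real function, not the actual QEMU code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Simplified stand-in for SubchDev: only the handler pointer. */
typedef struct SketchSubchDev SketchSubchDev;
struct SketchSubchDev {
    int (*do_subchannel_work)(SketchSubchDev *sch);
};

/* Option "leave as-is": a missing handler yields -EINVAL. */
static int sketch_work_return_error(SketchSubchDev *sch)
{
    if (sch->do_subchannel_work) {
        return sch->do_subchannel_work(sch);
    }
    return -EINVAL;
}

/* Option "assert": a missing handler is an internal error and aborts. */
static int sketch_work_assert(SketchSubchDev *sch)
{
    assert(sch->do_subchannel_work != NULL);
    return sch->do_subchannel_work(sch);
}

/* Trivial handler used to exercise both variants. */
static int sketch_handler_ok(SketchSubchDev *sch)
{
    (void)sch;
    return 0;
}
```

With a handler installed both variants behave the same. They differ only
when the pointer is NULL: the first returns an error code the caller has to
map to something guest-visible, the second stops the process at the point
of the programming error.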



