qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH 4/9] s390x: refactor error handling for SSCH and


From: Halil Pasic
Subject: Re: [Qemu-devel] [PATCH 4/9] s390x: refactor error handling for SSCH and RSCH
Date: Thu, 7 Sep 2017 12:15:05 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.2.0


On 09/07/2017 10:58 AM, Dong Jia Shi wrote:
> * Halil Pasic <address@hidden> [2017-09-06 16:43:42 +0200]:
> 
>>
>>
>> On 09/06/2017 04:20 PM, Cornelia Huck wrote:
>>> On Wed, 6 Sep 2017 14:25:13 +0200
>>> Halil Pasic <address@hidden> wrote:
>>>
>>>> We have basically two possibilities/options which ask for different
>>>> handling:
>>>> 1) EFAULT is due to a bug in the vfio-ccw implementation
>>>> (can be QEMU or kernel).
>>>> 2) EFAULT is due to buggy channel program.
>>>>
>>>> Option 2) is basically to be handled with a channel-program check and
>>>> setting primary secondary and alert status. For reference see PoP page
>>>> 15-59 ("Designation of Storage Area").  An exception may be an invalid
>>>> channel program address in the ORB. There the channel-program check ain't
>>>> explicitly stated (although) I would expect one. It may be implied by the
>>>> things on page 15-59 though.
>>>>
>>>> Option 1) is however very similar to other we have figured out that the
>>>> implementation is broken situations and should be handled consequently.
>>>> The current state of the discussion is with a unit exception.
>>>>
>>>> Does that make sense?
>>>
>>> I think the situation is slightly different here, though. For the orb
>>> flags, we reject something out of hand because we have not implemented
>>> it, and for that, unit exception sounds like a good fit. Processing
>>> errors, however, are more similar to errors in the hardware, and as
>>> such can probably be reported via something like equipment check.
>>>
>>
>> Noted. Let's see what Dong Jia has to say, before we continuing a
>> discussion on something (option 1) what may be irrelevant anyway.
>>
>>>>
>>>> Now, Dong Jia, I need your help to figure out do we have option 1 or
>>>> option 2 here? After quick look at the kernel code, it appears to me that
>>>> I've seen both option 1 and option 2 (I'm afraid) -- but my assessment
>>>> was really very superficial.
> There are three cases (all in the kernel) that generate a -EFAULT ret
> code:
> a. vfio_ccw_mdev_write: copy_from_user() fails.
>   This is option 1.
> 
> b. ccwchain_fetch_tic
>   It's mostly likely that the vfio-ccw driver processed the ccw chains
>   wrongly. (Actually I can not think of any other reason.)
>   This is option 1.
> 
> c. ccwchain_fetch_idal
>   When we find that an IDAW contents an invalid address
>   This is option 2.
> 

So my fear was justified. If we can't tell if its option 1 or 2, and
currently we can't, I would say we shall trust our infrastructure and
blame the channel program: that boils down to channel-program check.
That's my assessment with the kernel driver being as-is.

If we are willing to change the vfio-ccw kernel driver, then generating
a response to option 2 conditions in the kernel (basically an SCSW) is
IMHO better. For instance we can do SCSW.cpa and SCWS.count properly.
AFAIR we are not allowed to present conditions that logically did not
happen (yet) -- for instance if we have per-fetched a bad CCW address
but the given ccw did not become active yet.

If we handle option 2 via a different channel (SCWS instead of return
code) then we could happily do the proper handling for option 1 here.

So Dong Jia the call is again yours to make!

Regards,
Halil




reply via email to

[Prev in Thread] Current Thread [Next in Thread]