[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Qemu-devel] [PATCH 10/15] s390-bios: Support for running format-0/1
From: |
Cornelia Huck |
Subject: |
Re: [Qemu-devel] [PATCH 10/15] s390-bios: Support for running format-0/1 channel programs |
Date: |
Tue, 8 Jan 2019 12:07:19 +0100 |
On Mon, 7 Jan 2019 14:02:45 -0500
"Jason J. Herne" <address@hidden> wrote:
> On 12/13/18 12:21 PM, Cornelia Huck wrote:
> > On Wed, 12 Dec 2018 09:11:13 -0500
> > "Jason J. Herne" <address@hidden> wrote:
> >
> >> Add struct for format-0 ccws. Support executing format-0 channel
> >> programs and waiting for their completion before continuing execution.
> >> This will be used for real dasd ipl.
> >>
> >> Add cu_type() to channel io library. This will be used to query control
> >> unit type which is used to determine if we are booting a virtio device or a
> >> real dasd device.
> >>
> >> Signed-off-by: Jason J. Herne <address@hidden>
> >> ---
> >> pc-bios/s390-ccw/cio.c | 108 ++++++++++++++++++++++++++++++++++++++
> >> pc-bios/s390-ccw/cio.h | 124
> >> ++++++++++++++++++++++++++++++++++++++++++--
> >> pc-bios/s390-ccw/s390-ccw.h | 1 +
> >> pc-bios/s390-ccw/start.S | 33 +++++++++++-
> >> 4 files changed, 261 insertions(+), 5 deletions(-)
> >>
> >
> > (...)
> >
> >> +static bool irb_error(Irb *irb)
> >> +{
> >> + /* We have to ignore Incorrect Length (cstat == 0x40) indicators
> >> because
> >> + * real devices expect a 24 byte SenseID buffer, and virtio devices
> >> expect
> >> + * a much larger buffer. Neither device type can tolerate a buffer
> >> size
> >> + * different from what they expect so they set this indicator.
> >
> > Hm, can't you specify SLI for SenseID?
> >
>
> Yes, but this requires modifying run_ccw() in virtio.c to always specify the
> SLI flag. I'm
> not sure that is the best choice? I suppose I could add an sli argument to
> run_ccw if
> you'd prefer that.
Ignoring an error feels wrong :) Telling the function "hey, I don't
know what buffer size you expect, just give me what you have" feels
better. If I read SA22-7204-01 correctly, there's always just a minimum
of sense id data, and how much we get is device dependent. (FWIW, the
Linux kernel does sense id with SLI as well.)
So yes, +1 to adding a sli parameter to run_ccw().
>
> >> + */
> >> + if (irb->scsw.cstat != 0x00 && irb->scsw.cstat != 0x40) {
> >> + return true;
> >> + }
> >> + return irb->scsw.dstat != 0xc;
> >
> > Also, shouldn't you actually use the #defines you introduce further
> > down?
> >
>
> Yep, I added the defines after I wrote this code. I'll fix that.
>
> >> +}
> >> +
> >> +/* Executes a channel program at a given subchannel. The request to run
> >> the
> >> + * channel program is sent to the subchannel, we then wait for the
> >> interrupt
> >> + * singaling completion of the I/O operation(s) perfomed by the channel
s/perfomed/performed/
> >> + * program. Lastly we verify that the i/o operation completed without
> >> error and
> >> + * that the interrupt we received was for the subchannel used to run the
> >> + * channel program.
> >> + *
> >> + * Note: This function assumes it is running in an environment where no
> >> other
> >> + * cpus are generating or receiving I/O interrupts. So either run it in a
> >> + * single-cpu environment or make sure all other cpus are not doing I/O
> >> and
> >> + * have I/O interrupts masked off.
> >
> > Anything about iscs here (cr6)?
> >
>
> Those details are handled in the assembler code. Do you think I should
> mention something
> about cr6 here?
We can probably do without.
>
> >> + */
> >> +int do_cio(SubChannelId schid, uint32_t ccw_addr, int fmt)
> >> +{
> >> + CmdOrb orb = {};
> >> + Irb irb = {};
> >> + SenseData sd;
> >> + int rc, retries = 0;
> >> +
> >> + IPL_assert(fmt == 0 || fmt == 1, "Invalid ccw format");
> >> +
> >> + /* ccw_addr must be <= 24 bits and point to at least one whole ccw. */
> >> + if (fmt == 0) {
> >> + IPL_assert(ccw_addr <= 0xFFFFFF - 8, "Invalid ccw address");
> >> + }
> >> +
> >> + orb.fmt = fmt ;
> >> + orb.pfch = 1; /* QEMU's cio implementation requires prefetch */
> >> + orb.c64 = 1; /* QEMU's cio implementation requires 64-bit idaws */
> >> + orb.lpm = 0xFF; /* All paths allowed */
> >> + orb.cpa = ccw_addr;
> >> +
> >> + while (true) {
> >> + rc = ssch(schid, &orb);
> >
> > I think we can get here:
> > - cc 0 -> all ok
> > - cc 1 -> status pending; could that be an unsolicited interrupt from
> > the device? or would we always get a deferred cc 1 in that case?
> > - cc 2 -> another function pending; Should Not Happen
> > - cc 3 -> it's dead, Jim
> >
> > So I'm wondering whether we should consume the status and retry for cc
> > 1. The handling of the others is fine.
> >
>
> I took a look at css_do_ssch() in hw/s390x/css.c and it appears as though CC1
> is a
> possibility here. I'm not against taking action, but I suspect we would have
> to clear the
> status with a basic sense (or something) before simply retrying... right?
It depends on the status. If you get an unit check, you'll probably
need the basic sense; in other cases, you'll probably want to simply
retry.
>
> Is it safe for us to just assume we can clear it and move on? It seems like
> an edge case
> that we'd be better off failing on. Perhaps let the user try again which will
> redrive the
> process?
A very low amount of retries (2?) sounds reasonable: this would keep us
going if there's a state from the device that will be ignored anyway,
and won't get us stuck in a pointless retry loop if something more
involved is going on.
>
>
> >> + if (rc) {
> >> + print_int("ssch failed with rc=", rc);
> >> + break;
> >> + }
> >> +
> >> + consume_io_int();
> >> +
> >> + /* Clear read */
> >
> > I find that comment confusing. /* collect status */ maybe?
> >
> >> + rc = tsch(schid, &irb);
> >
> > Here we can get:
> > - cc 0 -> status pending, all ok
> > - cc 1 -> no status pending, Should Not Happen
> > - cc 3 -> it's dead, Jim
> >
> > So this looks fine.
> >
> >> + if (rc) {
> >> + print_int("tsch failed with rc=", rc);
> >> + break;
> >> + }
> >> +
> >> + if (!irb_error(&irb)) {
> >> + break;
> >> + }
> >> +
> >> + /* Unexpected unit check. Use sense to clear unit check then
> >> retry. */
> >
> > The dasds still don't support concurrent sense, do they? Might also be
> > worth investigating whether some unit checks are more "recoverable"
> > than others.
> >
>
> I wasn't sure on concurrent sense. I'd bet there are situations or
> environments where it
> won't be supported so it seems safest to assume we don't have it.
Ok.
>
> We already recover from the one unit check scenario I've discovered in
> practice (initial
> reset). And the algorithm I chose is to simply retry a few times whenever
> we're presented
> with unexpected unit check status. This is what the kernel does. It seems
> fairly robust.
Nod.
> > I expect we simply want to ignore IFCCs? IIRC, the strategy for those
> > is "retry, in case it is transient"; but that may take some time. Or
> > was there some path handling to be considered? (i.e., retrying may
> > select another path, which may be fine.)
> >
>
> Currently we'll give up on IFCC. I think this is the right thing to do. A
> user can always
> retry if they want. But in reality an IFCC very likely means faulty hardware
> IIUC.
It could also be a transient link issue. Maybe retry twice, just to
avoid the very tiny blips?
> I've not thought about path management much. I suspect paths changing isn't
> something we
> should realistically see in the bios. Even still, a retry is really all we
> can do, so
> assuming path changes result in a unit check then we should be okay there.
If you use a full path mask, the channel subsystem might try a
different path (that is working correctly) the next time. I don't think
you want to implement path grouping stuff in the bios, which would mean
a lot of pain for very little gain :)
Thinking about path groups: One scenario we might have is that another
LPAR did a reserve on a dasd and then died. The dasd is then
unaccessible by our LPAR until we do a steal lock. If the device is
bound to the vfio-ccw subchannel driver, we don't have an interface for
that, though (we would need to re-bind to the I/O subchannel driver and
the dasd driver so we can invoke tunedasd). We could add an option to
break the lock from the bios, although that's probably overkill for a
real edge case. Just wanted to mention it :)
>
>
> >> + if (unit_check(&irb) && retries <= 2) {
> >> + basic_sense(schid, &sd);
> >> + retries++;
> >> + continue;
> >> + }
> >> +
> >> + break;
> >> + }
> >> +
> >> + return rc;
> >> +}
> >
> > (...)
> >
> >> @@ -190,6 +247,9 @@ struct ciw {
> >> __u16 count;
> >> };
> >>
> >> +#define CU_TYPE_VIRTIO 0x3832
> >> +#define CU_TYPE_DASD 0x3990
> >
> > No other dasd types we want to support? :) (Not sure if others are out
> > in the wild. Maybe FBA?)
> >
>
> I have no idea. I assumed 3390 was the only thing we supported. Perhaps 3380?
> I'd need to
> find a test device, which I could probably do ... I'll look more into this.
IIRC, z/VM can hand out FBA devices. I'm not sure if current storage
systems can emulate them.
>
>
> >> +
> >> /*
> >> * sense-id response buffer layout
> >> */
> >> @@ -205,6 +265,61 @@ typedef struct senseid {
> >> struct ciw ciw[62];
> >> } __attribute__ ((packed, aligned(4))) SenseId;
> >>
> >> +/* architected values for first sense byte */
> >> +#define SNS0_CMD_REJECT 0x80
> >> +#define SNS0_INTERVENTION_REQ 0x40
> >> +#define SNS0_BUS_OUT_CHECK 0x20
> >> +#define SNS0_EQUIPMENT_CHECK 0x10
> >> +#define SNS0_DATA_CHECK 0x08
> >> +#define SNS0_OVERRUN 0x04
> >> +#define SNS0_INCOMPL_DOMAIN 0x01
> >
> > IIRC, only byte 0 is device independent, and the others below are
> > (ECKD) dasd specific?
> >
> >> +
> >> +/* architectured values for second sense byte */
> >> +#define SNS1_PERM_ERR 0x80
> >> +#define SNS1_INV_TRACK_FORMAT 0x40
> >> +#define SNS1_EOC 0x20
> >> +#define SNS1_MESSAGE_TO_OPER 0x10
> >> +#define SNS1_NO_REC_FOUND 0x08
> >> +#define SNS1_FILE_PROTECTED 0x04
> >> +#define SNS1_WRITE_INHIBITED 0x02
> >> +#define SNS1_INPRECISE_END 0x01
> >> +
> >> +/* architectured values for third sense byte */
> >> +#define SNS2_REQ_INH_WRITE 0x80
> >> +#define SNS2_CORRECTABLE 0x40
> >> +#define SNS2_FIRST_LOG_ERR 0x20
> >> +#define SNS2_ENV_DATA_PRESENT 0x10
> >> +#define SNS2_INPRECISE_END 0x04
> >> +
> >> +/* 24-byte Sense fmt/msg codes */
> >> +#define SENSE24_FMT_PROG_SYS 0x0
> >> +#define SENSE24_FMT_EQUIPMENT 0x2
> >> +#define SENSE24_FMT_CONTROLLER 0x3
> >> +#define SENSE24_FMT_MISC 0xF
> >> +
> >> +#define SENSE24_FMT0_MSG_RESET_NOTIFICATION 0x16
> >> +
> >> +/* basic sense response buffer layout */
> >> +typedef struct senseData {
> >> + uint8_t status[3];
> >> + uint8_t res_count;
> >> + uint8_t phys_drive_id;
> >> + uint8_t low_cyl_addr;
> >> + uint8_t head_high_cyl_addr;
> >> + uint8_t fmt_msg;
> >> + uint64_t fmt_dependent_info[2];
> >> + uint8_t reserved;
> >> + uint8_t program_action_code;
> >> + uint16_t config_info;
> >> + uint8_t mcode_hicyl;
> >> + uint8_t cyl_head_addr[3];
> >> +} __attribute__ ((packed, aligned(4))) SenseData;
> >
> > And this looks _really_ dasd specific.
> >
>
> Yep, I glossed over those details while I was furiously tracking down the
> reset bug. I'll
> take a look at redesigning this.
Ok.
>
> >> +
> >> +#define SENSE24_GET_FMT(sd) (sd->fmt_msg & 0xF0 >> 4)
> >> +#define SENSE24_GET_MSG(sd) (sd->fmt_msg & 0x0F)
> >> +
> >> +#define unit_check(irb) ((irb)->scsw.dstat & SCSW_DSTAT_UCHK)
> >> +
> >> /* interruption response block */
> >> typedef struct irb {
> >> struct scsw scsw;
> >
> > (...)
> >
> >> diff --git a/pc-bios/s390-ccw/start.S b/pc-bios/s390-ccw/start.S
> >> index eb8d024..a48c38f 100644
> >> --- a/pc-bios/s390-ccw/start.S
> >> +++ b/pc-bios/s390-ccw/start.S
> >> @@ -65,12 +65,32 @@ consume_sclp_int:
> >> /* prepare external call handler */
> >> larl %r1, external_new_code
> >> stg %r1, 0x1b8
> >> - larl %r1, external_new_mask
> >> + larl %r1, int_new_mask
> >> mvc 0x1b0(8),0(%r1)
> >> /* load enabled wait PSW */
> >> larl %r1, enabled_wait_psw
> >> lpswe 0(%r1)
> >>
> >> +/*
> >> + * void consume_io_int(void)
> >> + *
> >> + * eats one I/O interrupt
> >
> > *nomnom*
> >
> >> + */
> >> + .globl consume_io_int
> >> +consume_io_int:
> >> + /* enable I/O interrupts in cr0 */
> >
> > cr6?
> >
> >> + stctg 6,6,0(15)
> >> + oi 4(15), 0xff
> >> + lctlg 6,6,0(15)
> >> + /* prepare external call handler */
> >
> > I/O call handler?
> >
>
> Both copy/paste errors. Thanks for catching these. :)
>
> >> + larl %r1, io_new_code
> >> + stg %r1, 0x1f8
> >> + larl %r1, int_new_mask
> >> + mvc 0x1f0(8),0(%r1)
> >> + /* load enabled wait PSW */
> >> + larl %r1, enabled_wait_psw
> >> + lpswe 0(%r1)
> >> +
> >> external_new_code:
> >> /* disable service interrupts in cr0 */
> >> stctg 0,0,0(15)
> >> @@ -78,10 +98,19 @@ external_new_code:
> >> lctlg 0,0,0(15)
> >> br 14
> >>
> >> +io_new_code:
> >> + /* disable I/O interrupts in cr6 */
> >> + stctg 6,6,0(15)
> >
> > I'm wondering why you are changing cr6 every time you wait for an I/O
> > interrupt. Just enable the isc(s) you want once, and disable them again
> > after you're done with all I/O? Simply disabling the I/O interrupts
> > should be enough to prevent further interrupts popping up. You maybe
> > want two enabled wait PSWs, one with I/O + external and one with
> > external only?
> >
>
> No real reason. We only come through here a hand full of times so performance
> is not a
> consideration. I guess my thought process was probably to keep the system is
> as close to
> initial state as possible through the ipl process. Eventually when we hand
> control to the
> guest OS we want the system as close to undisturbed as possible. If you think
> I should
> only be setting cr-6 once, it sounds reasonable.
It just looked a bit odd to me. But I agree that this isn't
performance-sensitive.
>
>
> >> + ni 4(15), 0x00
> >> + lctlg 6,6,0(15)
> >> + br 14
> >> +
> >> +
> >> +
> >> .align 8
> >> disabled_wait_psw:
> >> .quad 0x0002000180000000,0x0000000000000000
> >> enabled_wait_psw:
> >> .quad 0x0302000180000000,0x0000000000000000
> >> -external_new_mask:
> >> +int_new_mask:
> >> .quad 0x0000000180000000
> >
> >
>
>
Re: [Qemu-devel] [qemu-s390x] [PATCH 10/15] s390-bios: Support for running format-0/1 channel programs, Jason J. Herne, 2019/01/14