[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Qemu-devel] [PATCH 10/15] s390-bios: Support for running format-0/1
From: |
Jason J. Herne |
Subject: |
Re: [Qemu-devel] [PATCH 10/15] s390-bios: Support for running format-0/1 channel programs |
Date: |
Mon, 7 Jan 2019 14:02:45 -0500 |
User-agent: |
Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.2.1 |
On 12/13/18 12:21 PM, Cornelia Huck wrote:
On Wed, 12 Dec 2018 09:11:13 -0500
"Jason J. Herne" <address@hidden> wrote:
Add struct for format-0 ccws. Support executing format-0 channel
programs and waiting for their completion before continuing execution.
This will be used for real dasd ipl.
Add cu_type() to channel io library. This will be used to query control
unit type which is used to determine if we are booting a virtio device or a
real dasd device.
Signed-off-by: Jason J. Herne <address@hidden>
---
pc-bios/s390-ccw/cio.c | 108 ++++++++++++++++++++++++++++++++++++++
pc-bios/s390-ccw/cio.h | 124 ++++++++++++++++++++++++++++++++++++++++++--
pc-bios/s390-ccw/s390-ccw.h | 1 +
pc-bios/s390-ccw/start.S | 33 +++++++++++-
4 files changed, 261 insertions(+), 5 deletions(-)
(...)
+static bool irb_error(Irb *irb)
+{
+ /* We have to ignore Incorrect Length (cstat == 0x40) indicators because
+ * real devices expect a 24 byte SenseID buffer, and virtio devices expect
+ * a much larger buffer. Neither device type can tolerate a buffer size
+ * different from what they expect so they set this indicator.
Hm, can't you specify SLI for SenseID?
Yes, but this requires modifying run_ccw() in virtio.c to always specify the SLI flag. I'm
not sure that is the best choice? I suppose I could add an sli argument to run_ccw if
you'd prefer that.
+ */
+ if (irb->scsw.cstat != 0x00 && irb->scsw.cstat != 0x40) {
+ return true;
+ }
+ return irb->scsw.dstat != 0xc;
Also, shouldn't you actually use the #defines you introduce further
down?
Yep, I added the defines after I wrote this code. I'll fix that.
+}
+
+/* Executes a channel program at a given subchannel. The request to run the
+ * channel program is sent to the subchannel, we then wait for the interrupt
+ * singaling completion of the I/O operation(s) perfomed by the channel
+ * program. Lastly we verify that the i/o operation completed without error and
+ * that the interrupt we received was for the subchannel used to run the
+ * channel program.
+ *
+ * Note: This function assumes it is running in an environment where no other
+ * cpus are generating or receiving I/O interrupts. So either run it in a
+ * single-cpu environment or make sure all other cpus are not doing I/O and
+ * have I/O interrupts masked off.
Anything about iscs here (cr6)?
Those details are handled in the assembler code. Do you think I should mention something
about cr6 here?
+ */
+int do_cio(SubChannelId schid, uint32_t ccw_addr, int fmt)
+{
+ CmdOrb orb = {};
+ Irb irb = {};
+ SenseData sd;
+ int rc, retries = 0;
+
+ IPL_assert(fmt == 0 || fmt == 1, "Invalid ccw format");
+
+ /* ccw_addr must be <= 24 bits and point to at least one whole ccw. */
+ if (fmt == 0) {
+ IPL_assert(ccw_addr <= 0xFFFFFF - 8, "Invalid ccw address");
+ }
+
+ orb.fmt = fmt ;
+ orb.pfch = 1; /* QEMU's cio implementation requires prefetch */
+ orb.c64 = 1; /* QEMU's cio implementation requires 64-bit idaws */
+ orb.lpm = 0xFF; /* All paths allowed */
+ orb.cpa = ccw_addr;
+
+ while (true) {
+ rc = ssch(schid, &orb);
I think we can get here:
- cc 0 -> all ok
- cc 1 -> status pending; could that be an unsolicited interrupt from
the device? or would we always get a deferred cc 1 in that case?
- cc 2 -> another function pending; Should Not Happen
- cc 3 -> it's dead, Jim
So I'm wondering whether we should consume the status and retry for cc
1. The handling of the others is fine.
I took a look at css_do_ssch() in hw/s390x/css.c and it appears as though CC1 is a
possibility here. I'm not against taking action, but I suspect we would have to clear the
status with a basic sense (or something) before simply retrying... right?
Is it safe for us to just assume we can clear it and move on? It seems like an edge case
that we'd be better off failing on. Perhaps let the user try again which will redrive the
process?
+ if (rc) {
+ print_int("ssch failed with rc=", rc);
+ break;
+ }
+
+ consume_io_int();
+
+ /* Clear read */
I find that comment confusing. /* collect status */ maybe?
+ rc = tsch(schid, &irb);
Here we can get:
- cc 0 -> status pending, all ok
- cc 1 -> no status pending, Should Not Happen
- cc 3 -> it's dead, Jim
So this looks fine.
+ if (rc) {
+ print_int("tsch failed with rc=", rc);
+ break;
+ }
+
+ if (!irb_error(&irb)) {
+ break;
+ }
+
+ /* Unexpected unit check. Use sense to clear unit check then retry. */
The dasds still don't support concurrent sense, do they? Might also be
worth investigating whether some unit checks are more "recoverable"
than others.
I wasn't sure on concurrent sense. I'd bet there are situations or environments where it
won't be supported so it seems safest to assume we don't have it.
We already recover from the one unit check scenario I've discovered in practice (initial
reset). And the algorithm I chose is to simply retry a few times whenever we're presented
with unexpected unit check status. This is what the kernel does. It seems fairly robust.
I expect we simply want to ignore IFCCs? IIRC, the strategy for those
is "retry, in case it is transient"; but that may take some time. Or
was there some path handling to be considered? (i.e., retrying may
select another path, which may be fine.)
Currently we'll give up on IFCC. I think this is the right thing to do. A user can always
retry if they want. But in reality an IFCC very likely means faulty hardware IIUC.
I've not thought about path management much. I suspect paths changing isn't something we
should realistically see in the bios. Even still, a retry is really all we can do, so
assuming path changes result in a unit check then we should be okay there.
+ if (unit_check(&irb) && retries <= 2) {
+ basic_sense(schid, &sd);
+ retries++;
+ continue;
+ }
+
+ break;
+ }
+
+ return rc;
+}
(...)
@@ -190,6 +247,9 @@ struct ciw {
__u16 count;
};
+#define CU_TYPE_VIRTIO 0x3832
+#define CU_TYPE_DASD 0x3990
No other dasd types we want to support? :) (Not sure if others are out
in the wild. Maybe FBA?)
I have no idea. I assumed 3390 was the only thing we supported. Perhaps 3380? I'd need to
find a test device, which I could probably do ... I'll look more into this.
+
/*
* sense-id response buffer layout
*/
@@ -205,6 +265,61 @@ typedef struct senseid {
struct ciw ciw[62];
} __attribute__ ((packed, aligned(4))) SenseId;
+/* architected values for first sense byte */
+#define SNS0_CMD_REJECT 0x80
+#define SNS0_INTERVENTION_REQ 0x40
+#define SNS0_BUS_OUT_CHECK 0x20
+#define SNS0_EQUIPMENT_CHECK 0x10
+#define SNS0_DATA_CHECK 0x08
+#define SNS0_OVERRUN 0x04
+#define SNS0_INCOMPL_DOMAIN 0x01
IIRC, only byte 0 is device independent, and the others below are
(ECKD) dasd specific?
+
+/* architectured values for second sense byte */
+#define SNS1_PERM_ERR 0x80
+#define SNS1_INV_TRACK_FORMAT 0x40
+#define SNS1_EOC 0x20
+#define SNS1_MESSAGE_TO_OPER 0x10
+#define SNS1_NO_REC_FOUND 0x08
+#define SNS1_FILE_PROTECTED 0x04
+#define SNS1_WRITE_INHIBITED 0x02
+#define SNS1_INPRECISE_END 0x01
+
+/* architectured values for third sense byte */
+#define SNS2_REQ_INH_WRITE 0x80
+#define SNS2_CORRECTABLE 0x40
+#define SNS2_FIRST_LOG_ERR 0x20
+#define SNS2_ENV_DATA_PRESENT 0x10
+#define SNS2_INPRECISE_END 0x04
+
+/* 24-byte Sense fmt/msg codes */
+#define SENSE24_FMT_PROG_SYS 0x0
+#define SENSE24_FMT_EQUIPMENT 0x2
+#define SENSE24_FMT_CONTROLLER 0x3
+#define SENSE24_FMT_MISC 0xF
+
+#define SENSE24_FMT0_MSG_RESET_NOTIFICATION 0x16
+
+/* basic sense response buffer layout */
+typedef struct senseData {
+ uint8_t status[3];
+ uint8_t res_count;
+ uint8_t phys_drive_id;
+ uint8_t low_cyl_addr;
+ uint8_t head_high_cyl_addr;
+ uint8_t fmt_msg;
+ uint64_t fmt_dependent_info[2];
+ uint8_t reserved;
+ uint8_t program_action_code;
+ uint16_t config_info;
+ uint8_t mcode_hicyl;
+ uint8_t cyl_head_addr[3];
+} __attribute__ ((packed, aligned(4))) SenseData;
And this looks _really_ dasd specific.
Yep, I glossed over those details while I was furiously tracking down the reset bug. I'll
take a look at redesigning this.
+
+#define SENSE24_GET_FMT(sd) (sd->fmt_msg & 0xF0 >> 4)
+#define SENSE24_GET_MSG(sd) (sd->fmt_msg & 0x0F)
+
+#define unit_check(irb) ((irb)->scsw.dstat & SCSW_DSTAT_UCHK)
+
/* interruption response block */
typedef struct irb {
struct scsw scsw;
(...)
diff --git a/pc-bios/s390-ccw/start.S b/pc-bios/s390-ccw/start.S
index eb8d024..a48c38f 100644
--- a/pc-bios/s390-ccw/start.S
+++ b/pc-bios/s390-ccw/start.S
@@ -65,12 +65,32 @@ consume_sclp_int:
/* prepare external call handler */
larl %r1, external_new_code
stg %r1, 0x1b8
- larl %r1, external_new_mask
+ larl %r1, int_new_mask
mvc 0x1b0(8),0(%r1)
/* load enabled wait PSW */
larl %r1, enabled_wait_psw
lpswe 0(%r1)
+/*
+ * void consume_io_int(void)
+ *
+ * eats one I/O interrupt
*nomnom*
+ */
+ .globl consume_io_int
+consume_io_int:
+ /* enable I/O interrupts in cr0 */
cr6?
+ stctg 6,6,0(15)
+ oi 4(15), 0xff
+ lctlg 6,6,0(15)
+ /* prepare external call handler */
I/O call handler?
Both copy/paste errors. Thanks for catching these. :)
+ larl %r1, io_new_code
+ stg %r1, 0x1f8
+ larl %r1, int_new_mask
+ mvc 0x1f0(8),0(%r1)
+ /* load enabled wait PSW */
+ larl %r1, enabled_wait_psw
+ lpswe 0(%r1)
+
external_new_code:
/* disable service interrupts in cr0 */
stctg 0,0,0(15)
@@ -78,10 +98,19 @@ external_new_code:
lctlg 0,0,0(15)
br 14
+io_new_code:
+ /* disable I/O interrupts in cr6 */
+ stctg 6,6,0(15)
I'm wondering why you are changing cr6 every time you wait for an I/O
interrupt. Just enable the isc(s) you want once, and disable them again
after you're done with all I/O? Simply disabling the I/O interrupts
should be enough to prevent further interrupts popping up. You maybe
want two enabled wait PSWs, one with I/O + external and one with
external only?
No real reason. We only come through here a hand full of times so performance is not a
consideration. I guess my thought process was probably to keep the system is as close to
initial state as possible through the ipl process. Eventually when we hand control to the
guest OS we want the system as close to undisturbed as possible. If you think I should
only be setting cr-6 once, it sounds reasonable.
+ ni 4(15), 0x00
+ lctlg 6,6,0(15)
+ br 14
+
+
+
.align 8
disabled_wait_psw:
.quad 0x0002000180000000,0x0000000000000000
enabled_wait_psw:
.quad 0x0302000180000000,0x0000000000000000
-external_new_mask:
+int_new_mask:
.quad 0x0000000180000000
--
-- Jason J. Herne (address@hidden)
Re: [Qemu-devel] [qemu-s390x] [PATCH 10/15] s390-bios: Support for running format-0/1 channel programs, Jason J. Herne, 2019/01/14