Re: [Qemu-devel] [PATCH 15/15] s390-bios: Support booting from real dasd

From: Jason J. Herne
Subject: Re: [Qemu-devel] [PATCH 15/15] s390-bios: Support booting from real dasd device
Date: Tue, 19 Feb 2019 09:57:20 -0500
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.2.1

On 2/4/19 7:02 AM, Cornelia Huck wrote:
On Tue, 29 Jan 2019 08:29:22 -0500
"Jason J. Herne" <address@hidden> wrote:

Allows guest to boot from a vfio configured real dasd device.

Signed-off-by: Jason J. Herne <address@hidden>
  docs/devel/s390-dasd-ipl.txt | 132 +++++++++++++++++++++++
  pc-bios/s390-ccw/Makefile    |   2 +-
  pc-bios/s390-ccw/dasd-ipl.c  | 249 +++++++++++++++++++++++++++++++++++++++++++
  pc-bios/s390-ccw/dasd-ipl.h  |  16 +++
  pc-bios/s390-ccw/main.c      |   4 +
  pc-bios/s390-ccw/s390-arch.h |  13 +++
  6 files changed, 415 insertions(+), 1 deletion(-)
  create mode 100644 docs/devel/s390-dasd-ipl.txt
  create mode 100644 pc-bios/s390-ccw/dasd-ipl.c
  create mode 100644 pc-bios/s390-ccw/dasd-ipl.h

diff --git a/docs/devel/s390-dasd-ipl.txt b/docs/devel/s390-dasd-ipl.txt
new file mode 100644
index 0000000..84ec7b8
--- /dev/null
+++ b/docs/devel/s390-dasd-ipl.txt
@@ -0,0 +1,132 @@
+***** s390 hardware IPL *****
+The s390 hardware IPL process consists of the following steps.
+1. A READ IPL ccw is constructed in memory location 0x0.
+    This ccw, by definition, reads the IPL1 record which is located on the disk
+    at cylinder 0 track 0 record 1. Note that the chain flag is on in this ccw
+    so when it is complete another ccw will be fetched and executed from memory
+    location 0x08.
+2. Execute the Read IPL ccw at 0x00, thereby reading IPL1 data into 0x00.
+    IPL1 data is 24 bytes in length and consists of the following pieces of
+    information: [psw][read ccw][tic ccw]. When the machine executes the Read
+    IPL ccw, it reads the 24 bytes of IPL1 into memory starting at
+    location 0x0. Then the ccw program at 0x08 which consists of a read
+    ccw and a tic ccw is automatically executed because of the chain flag from
+    the original READ IPL ccw. The read ccw will read the IPL2 data into memory
+    and the TIC (Transfer In Channel) will transfer control to the channel
+    program contained in the IPL2 data. The TIC channel command is the
+    equivalent of a branch/jump/goto instruction for channel programs.
+    NOTE: The ccws in IPL1 are defined by the architecture to be format 0.
+3. Execute IPL2.
+    The TIC ccw instruction at the end of the IPL1 channel program will begin
+    the execution of the IPL2 channel program. IPL2 is stage-2 of the boot
+    process and will contain a larger channel program than IPL1. The point of
+    IPL2 is to find and load either the operating system or a small program that
+    loads the operating system from disk. At the end of this step all or some of
+    the real operating system is loaded into memory and we are ready to hand
+    control over to the guest operating system. At this point the guest
+    operating system is entirely responsible for loading any more data it might
+    need to function. NOTE: The IPL2 channel program might read data into 
+    location 0 thereby overwriting the IPL1 psw and channel program. This is ok
+    as long as the data placed in location 0 contains a psw whose instruction
+    address points to the guest operating system code to execute at the end of
+    the IPL/boot process.
+    NOTE: The ccws in IPL2 are defined by the architecture to be format 0.
+4. Start executing the guest operating system.
+    The psw that was loaded into memory location 0 as part of the ipl process
+    should contain the needed flags for the operating system we have loaded. The
+    psw's instruction address will point to the location in memory where we want
+    to start executing the operating system. This psw is loaded (via LPSW
+    instruction) causing control to be passed to the operating system code.
+In a non-virtualized environment this process, handled entirely by the hardware,
+is kicked off by the user initiating a "Load" procedure from the hardware
+management console. This "Load" procedure crafts a special "Read IPL" ccw in
+memory location 0x0 that reads IPL1. It then executes this ccw thereby kicking
+off the reading of IPL1 data. Since the channel program from IPL1 will be
+written immediately after the special "Read IPL" ccw, the IPL1 channel program
+will be executed immediately (the special read ccw has the chaining bit turned
+on). The TIC at the end of the IPL1 channel program will cause the IPL2 channel
+program to be executed automatically. After this sequence completes the "Load"
+procedure then loads the psw from 0x0.
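
The 24-byte IPL1 layout described above ([psw][read ccw][tic ccw]) can be sketched in C as follows. This is only an illustration: the names ccw0_sketch, ipl1_record and ccw0_data_addr are hypothetical and are not taken from the patch. The format-0 ccw field order (command code, 24-bit data address, flags, count) is assumed from the architecture definition.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CCW_SKETCH_CMD_TIC 0x08 /* Transfer In Channel command code */

/* Format-0 ccw, kept as raw bytes so host endianness does not matter. */
struct ccw0_sketch {
    uint8_t cmd_code;   /* channel command code */
    uint8_t cda[3];     /* 24-bit big-endian data address */
    uint8_t flags;      /* chaining and other flags */
    uint8_t reserved;
    uint8_t count[2];   /* big-endian byte count */
};

/* IPL1 data as it sits at location 0x0 after the Read IPL ccw completes. */
struct ipl1_record {
    uint8_t psw[8];                 /* loaded by LPSW at the end of IPL */
    struct ccw0_sketch read_ccw;    /* at 0x08: reads the IPL2 data */
    struct ccw0_sketch tic_ccw;     /* at 0x10: branches into IPL2 */
};

static_assert(sizeof(struct ccw0_sketch) == 8, "a format-0 ccw is 8 bytes");
static_assert(sizeof(struct ipl1_record) == 24, "IPL1 data is 24 bytes");
static_assert(offsetof(struct ipl1_record, read_ccw) == 0x08,
              "the IPL1 channel program starts at 0x08");

/* Assemble the 24-bit data address of a format-0 ccw. For a TIC ccw this
 * is the address of the next ccw to execute. */
static uint32_t ccw0_data_addr(const struct ccw0_sketch *ccw)
{
    return ((uint32_t)ccw->cda[0] << 16) |
           ((uint32_t)ccw->cda[1] << 8) |
           (uint32_t)ccw->cda[2];
}
```

Calling ccw0_data_addr() on the tic_ccw member yields the address at which the IPL2 channel program begins.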

Nice summary!

+***** How this all pertains to Qemu *****


(also below)


+In theory we should merely have to do the following to IPL/boot a guest
+operating system from a DASD device:
+1. Place a "Read IPL" ccw into memory location 0x0 with chaining bit on.
+2. Execute channel program at 0x0.
+3. LPSW 0x0.
+However, our emulation of the machine's channel program logic is missing one 
+feature that is required for this process to work: non-prefetch of ccw data.
+When we start a channel program we pass the channel subsystem parameters via an
+ORB (Operation Request Block). One of those parameters is a prefetch bit. If this
+bit is on then Qemu is allowed to read the entire channel program from guest
+memory before it starts executing it. This means that any channel commands that
+read additional channel commands will not work as expected because the newly
+read commands will only exist in guest memory and NOT within Qemu's channel
+subsystem memory. Qemu's channel subsystem's implementation currently requires

But isn't that the vfio-ccw backend, rather than the channel subsystem

Yep, you're right. I'll clarify this.

+this bit to be on for all channel programs. This is a problem because the IPL
+process consists of transferring control from the "Read IPL" ccw immediately to
+the IPL1 channel program that was read by "Read IPL".
+Not being able to turn off prefetch will also prevent the TIC at the end of the
+IPL1 channel program from transferring control to the IPL2 channel program.
+Lastly, in some cases (the zipl bootloader for example) the IPL2 program also
+transfers control to another channel program segment immediately after reading it
+from the disk. So we need to be able to handle this case.
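
The effect of prefetch on a channel program that reads its own continuation can be shown with a toy model. Everything below is hypothetical (the toy_* names, the three-field ccw, the command codes) and deliberately far simpler than real ccws or any real backend. It only models one point: with prefetch the executor works from a snapshot, so commands that a READ places into guest memory are never seen.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical toy commands, not real channel command codes. */
enum { TOY_CMD_NOP = 0, TOY_CMD_READ = 1, TOY_CMD_MARK = 2 };
#define TOY_FLAG_CHAIN 0x40
#define TOY_SLOTS 8

struct toy_ccw {
    uint8_t cmd;
    uint8_t flags;
    uint8_t target; /* for READ: guest slot that receives the record */
};

/*
 * Run the program that starts in guest slot 0. READ stores 'record' into
 * guest memory at slot 'target'. Returns how many MARK commands ran.
 * With prefetch, the whole program is copied before execution begins.
 */
static int toy_run(struct toy_ccw *guest, const struct toy_ccw *record,
                   bool prefetch)
{
    struct toy_ccw snapshot[TOY_SLOTS];
    const struct toy_ccw *prog = guest;
    int marks = 0;

    if (prefetch) {
        memcpy(snapshot, guest, sizeof(snapshot));
        prog = snapshot;
    }
    for (int i = 0; i < TOY_SLOTS; i++) {
        struct toy_ccw cur = prog[i];

        if (cur.cmd == TOY_CMD_READ && cur.target < TOY_SLOTS) {
            guest[cur.target] = *record; /* data always lands in guest memory */
        } else if (cur.cmd == TOY_CMD_MARK) {
            marks++;
        }
        if (!(cur.flags & TOY_FLAG_CHAIN)) {
            break; /* end of the chain */
        }
    }
    return marks;
}
```

With slot 0 holding a chained READ that targets slot 1 and a record containing a MARK, the non-prefetch run executes the MARK while the prefetch run does not. This mirrors why the chained "Read IPL" cannot hand over to the IPL1 channel program it has just read.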
+***** What Qemu does *****
+Since we are forced to live with prefetch we cannot use the very simple IPL
+procedure we defined in the preceding section. So we compensate by doing the
+following:
+1. Place "Read IPL" ccw into memory location 0x0, but turn off chaining bit.
+2. Execute "Read IPL" at 0x0.
+   So now IPL1's psw is at 0x0 and IPL1's channel program is at 0x08.
+4. Write a custom channel program that will seek to the IPL2 record and then
+   execute the READ and TIC ccws from IPL1.  Normally the seek is not required
+   because after reading the IPL1 record the disk is automatically positioned
+   to read the very next record which will be IPL2. But since we are not reading
+   both IPL1 and IPL2 as part of the same channel program we must manually set
+   the position.
+5. Grab the target address of the TIC instruction from the IPL1 channel program.
+   This address is where the IPL2 channel program starts.
+   Now IPL2 is loaded into memory somewhere, and we know the address.
+6. Execute the IPL2 channel program at the address obtained in step #5.
+   Because this channel program can be dynamic, we must use a special algorithm
+   that detects a READ immediately followed by a TIC and breaks the ccw chain
+   by turning off the chain bit in the READ ccw. When control is returned from
+   the kernel/hardware to the Qemu bios code we immediately issue another start
+   subchannel to execute the remaining TIC instruction. This causes the entire
+   channel program (starting from the TIC) and all needed data to be refetched
+   thereby stepping around the limitation that would otherwise prevent this
+   channel program from executing properly.
+   Now the operating system code is loaded somewhere in guest memory and the psw
+   in memory location 0x0 will point to entry code for the guest operating
+   system.
+7. LPSW 0x0.
+   LPSW transfers control to the guest operating system and we're done.
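
The chain-breaking idea from step 6 can be sketched as below. This is not the patch's dynamic_cp_fixup(): the sketch_* names are hypothetical, the program is walked as an array rather than through guest addresses, TICs in the middle of a chain are not followed, and treating any command code whose low two bits are binary 10 as a READ is a simplifying assumption.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_CMD_TIC    0x08 /* Transfer In Channel */
#define SKETCH_FLAG_CHAIN 0x40 /* command chaining flag */

/* Format-0 ccw kept as raw bytes (layout assumed from the architecture). */
struct sketch_ccw0 {
    uint8_t cmd_code;
    uint8_t cda[3];
    uint8_t flags;
    uint8_t reserved;
    uint8_t count[2];
};

/* Assumption: read-type commands have low command-code bits 10. */
static bool sketch_is_read(const struct sketch_ccw0 *ccw)
{
    return (ccw->cmd_code & 0x03) == 0x02;
}

/*
 * Walk the chain starting at prog[0]. At the first chained READ that is
 * immediately followed by a TIC, clear the READ's chain flag so execution
 * stops there, store the TIC's index in *next and return true. Return
 * false if the chain ends without such a pair.
 */
static bool sketch_cp_fixup(struct sketch_ccw0 *prog, size_t len, size_t *next)
{
    for (size_t i = 0; i < len; i++) {
        if (!(prog[i].flags & SKETCH_FLAG_CHAIN)) {
            return false; /* last ccw of the chain, nothing to split */
        }
        if (i + 1 < len && sketch_is_read(&prog[i]) &&
            prog[i + 1].cmd_code == SKETCH_CMD_TIC) {
            prog[i].flags &= (uint8_t)~SKETCH_FLAG_CHAIN;
            *next = i + 1;
            return true;
        }
    }
    return false;
}
```

A caller would loop: fix up the chain, start the subchannel at the current position, then continue from *next for as long as the fixup reports that it split the chain. Each restart makes the program be fetched again, including the ccws the READ has just placed in memory.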

Also a good explanation of the procedure here!


+static int run_dynamic_ccw_program(SubChannelId schid, uint32_t cpa)
+    bool has_next;
+    uint32_t next_cpa = 0;
+    int rc;
+    do {
+        has_next = dynamic_cp_fixup(cpa, &next_cpa);
+        print_int("executing ccw chain at ", cpa);

Do you want to keep the unconditional print here? Or make it a
debug_print_int, and maybe an unconditional print on error?

Personally, I like having this here unconditionally. If things hang up or go wrong this lets us know if it was before or after we jumped into actual guest OS code. I know I could make it debug only, but having it all the time means better first failure data capture.

-- Jason J. Herne (address@hidden)
