Booting from real channel-attached devices on s390x
===================================================

s390 hardware IPL
-----------------

The s390 hardware IPL process consists of the following steps.

1. A Read IPL ccw is constructed in memory location ``0x0``.
   This ccw, by definition, reads the IPL1 record, which is located on the disk
   at cylinder 0 track 0 record 1. Note that the chain flag is on in this ccw,
   so when it completes another ccw is fetched and executed from memory
   location ``0x08``.

2. Execute the Read IPL ccw at ``0x00``, thereby reading IPL1 data into ``0x00``.
   IPL1 data is 24 bytes in length and consists of the following pieces of
   information: ``[psw][read ccw][tic ccw]``. When the machine executes the Read
   IPL ccw, it reads the 24 bytes of IPL1 into memory starting at
   location ``0x0``. Then the ccw program at ``0x08``, which consists of a read
   ccw and a tic ccw, is automatically executed because of the chain flag from
   the original Read IPL ccw. The read ccw reads the IPL2 data into memory,
   and the TIC (Transfer In Channel) transfers control to the channel
   program contained in the IPL2 data. The TIC channel command is the
   equivalent of a branch/jump/goto instruction for channel programs.

   NOTE: The ccws in IPL1 are defined by the architecture to be format 0.

3. Execute IPL2.
   The TIC ccw instruction at the end of the IPL1 channel program begins
   the execution of the IPL2 channel program. IPL2 is stage 2 of the boot
   process and contains a larger channel program than IPL1. The point of
   IPL2 is to find and load either the operating system or a small program that
   loads the operating system from disk. At the end of this step all or some of
   the real operating system is loaded into memory and we are ready to hand
   control over to the guest operating system. From this point on the guest
   operating system is entirely responsible for loading any more data it might
   need to function.

   NOTE: The IPL2 channel program might read data into memory
   location ``0x0``, thereby overwriting the IPL1 psw and channel program. This is
   OK as long as the data placed in location ``0x0`` contains a psw whose
   instruction address points to the guest operating system code to execute at
   the end of the IPL/boot process.

   NOTE: The ccws in IPL2 are defined by the architecture to be format 0.

4. Start executing the guest operating system.
   The psw that was loaded into memory location ``0x0`` as part of the IPL process
   should contain the flags needed by the operating system we have loaded. The
   psw's instruction address points to the location in memory where we want
   to start executing the operating system. This psw is loaded (via the LPSW
   instruction), causing control to be passed to the operating system code.

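The 24-byte IPL1 layout described in step 2 can be sketched as follows. This is
an illustrative model only, not QEMU code: the names ``Ccw0``, ``split_ipl1``
and ``READ_IPL`` are hypothetical, and the Read IPL ccw is built with only the
chain flag set, which is all the steps above state about it. The field layout
(command byte, 24-bit data address, flag byte, unused byte, 16-bit count) is the
format-0 ccw layout.

```python
import struct
from typing import NamedTuple

CCW_CMD_READ_IPL = 0x02   # Read IPL command code
CCW_CMD_TIC = 0x08        # Transfer In Channel command code
CCW_FLAG_CC = 0x40        # chain-command flag ("chain flag" in the text)


class Ccw0(NamedTuple):
    """A format-0 ccw: command, 24-bit data address, flags, byte count."""
    cmd: int
    addr: int
    flags: int
    count: int

    def pack(self) -> bytes:
        # Byte 0: command; bytes 1-3: data address; byte 4: flags;
        # byte 5: unused; bytes 6-7: count. Everything is big-endian.
        word = (self.cmd << 24) | (self.addr & 0xFFFFFF)
        return struct.pack(">IBBH", word, self.flags, 0, self.count)

    @classmethod
    def unpack(cls, raw: bytes) -> "Ccw0":
        word, flags, _unused, count = struct.unpack(">IBBH", raw)
        return cls(word >> 24, word & 0xFFFFFF, flags, count)


def split_ipl1(record: bytes):
    """Split the 24-byte IPL1 record into (psw, read ccw, tic ccw)."""
    if len(record) != 24:
        raise ValueError("IPL1 must be exactly 24 bytes")
    psw = record[0:8]
    read_ccw = Ccw0.unpack(record[8:16])
    tic_ccw = Ccw0.unpack(record[16:24])
    return psw, read_ccw, tic_ccw


# The Read IPL ccw from step 1: read 24 bytes to address 0x0, chain flag on.
READ_IPL = Ccw0(CCW_CMD_READ_IPL, 0x0, CCW_FLAG_CC, 24)
```

Because IPL1 is read to address ``0x0``, the read ccw lands at ``0x08`` and the
tic ccw at ``0x10``; the ``addr`` field of the tic ccw is the address at which
the IPL2 channel program starts.
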
In a non-virtualized environment this process, handled entirely by the hardware,
is kicked off by the user initiating a "Load" procedure from the hardware
management console. This "Load" procedure crafts a special "Read IPL" ccw in
memory location ``0x0`` that reads IPL1. It then executes this ccw, thereby
kicking off the reading of IPL1 data. Since the channel program from IPL1 is
written immediately after the special "Read IPL" ccw, the IPL1 channel program
is executed immediately (the special read ccw has the chaining bit turned
on). The TIC at the end of the IPL1 channel program causes the IPL2 channel
program to be executed automatically. After this sequence completes, the "Load"
procedure loads the psw from ``0x0``.

How this all pertains to QEMU (and the kernel)
----------------------------------------------

In theory we should merely have to do the following to IPL/boot a guest
operating system from a DASD device:

1. Place a "Read IPL" ccw into memory location ``0x0`` with chaining bit on.
2. Execute channel program at ``0x0``.
3. LPSW ``0x0``.

However, our emulation of the machine's channel program logic within the kernel
is missing one key feature that is required for this process to work:
non-prefetch of ccw data.

When we start a channel program we pass the channel subsystem parameters via an
ORB (Operation Request Block). One of those parameters is a prefetch bit. If the
bit is on, the vfio-ccw kernel driver is allowed to read the entire channel
program from guest memory before it starts executing it. This means that any
channel commands that read additional channel commands will not work as expected,
because the newly read commands exist only in guest memory and NOT within
the kernel's channel subsystem memory. The kernel vfio-ccw driver currently
requires this bit to be on for all channel programs. This is a problem because
the IPL process consists of transferring control from the "Read IPL" ccw
immediately to the IPL1 channel program that was read by "Read IPL".

Not being able to turn off prefetch also prevents the TIC at the end of the
IPL1 channel program from transferring control to the IPL2 channel program.

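The effect of prefetch on a self-loading channel program can be illustrated with
a toy model. This is a deliberately simplified sketch and does not reflect how
the vfio-ccw driver is actually implemented: ``run``, ``demo``, the record
contents and the addresses ``0x40`` and ``0x80`` are all made up for the
example, and "prefetch" is modelled as taking a snapshot of guest memory before
the first ccw runs.

```python
import struct

CMD_READ, CMD_TIC, FLAG_CC = 0x02, 0x08, 0x40


def ccw(cmd, addr, flags, count):
    """Pack a format-0 ccw (command, 24-bit address, flags, count)."""
    return struct.pack(">IBBH", (cmd << 24) | addr, flags, 0, count)


def decode(raw):
    word, flags, _unused, count = struct.unpack(">IBBH", raw)
    return word >> 24, word & 0xFFFFFF, flags, count


def run(memory, start, records, prefetch):
    """Run a channel program; return the command codes that were executed.

    With prefetch=True the program is copied out of guest memory before
    the first ccw runs, so ccws written by a READ are never seen.
    """
    program = bytes(memory) if prefetch else memory
    executed, pc, records = [], start, list(records)
    while True:
        cmd, addr, flags, count = decode(program[pc:pc + 8])
        executed.append(cmd)
        if cmd == CMD_TIC:                 # branch to another ccw
            pc = addr
            continue
        if cmd == CMD_READ and records:
            data = records.pop(0)[:count]  # next record from the "disk"
            memory[addr:addr + len(data)] = data   # lands in guest memory only
        if not flags & FLAG_CC:            # chain ends here
            return executed
        pc += 8


def demo(prefetch):
    memory = bytearray(0x100)
    memory[0:8] = ccw(CMD_READ, 0x00, FLAG_CC, 24)   # Read IPL, chained
    ipl1 = (bytes(8)                                  # placeholder psw
            + ccw(CMD_READ, 0x40, FLAG_CC, 8)         # read IPL2 to 0x40
            + ccw(CMD_TIC, 0x40, 0, 0))               # jump into IPL2
    ipl2 = ccw(CMD_READ, 0x80, 0, 4)                  # one-ccw IPL2 program
    executed = run(memory, 0, [ipl1, ipl2, b"OS!!"], prefetch)
    return executed, bytes(memory[0x80:0x84])
```

In this model ``demo(False)`` walks through Read IPL, the IPL1 read, the TIC and
the IPL2 read, and the payload ends up at ``0x80``. ``demo(True)`` executes the
Read IPL ccw and then finds only zeroes at ``0x08`` in its snapshot, so the IPL1
channel program never runs.
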
Lastly, in some cases (the zipl bootloader, for example) the IPL2 program also
transfers control to another channel program segment immediately after reading
it from the disk. We need to be able to handle this case as well.

What QEMU does
--------------

Since we are forced to live with prefetch we cannot use the very simple IPL
procedure we defined in the preceding section. So we compensate by doing the
following.

1. Place "Read IPL" ccw into memory location ``0x0``, but turn off chaining bit.
2. Execute "Read IPL" at ``0x0``.

   So now IPL1's psw is at ``0x0`` and IPL1's channel program is at ``0x08``.

3. Write a custom channel program that will seek to the IPL2 record and then
   execute the READ and TIC ccws from IPL1. Normally the seek is not required
   because after reading the IPL1 record the disk is automatically positioned
   to read the very next record, which is IPL2. But since we are not reading
   both IPL1 and IPL2 as part of the same channel program, we must manually set
   the position.

4. Grab the target address of the TIC instruction from the IPL1 channel program.
   This address is where the IPL2 channel program starts.

   Now IPL2 is loaded into memory somewhere, and we know the address.

5. Execute the IPL2 channel program at the address obtained in step #4.

   Because this channel program can be dynamic, we must use a special algorithm
   that detects a READ immediately followed by a TIC and breaks the ccw chain
   by turning off the chain bit in the READ ccw. When control is returned from
   the kernel/hardware to the QEMU bios code, we immediately issue another start
   subchannel to execute the remaining TIC instruction. This causes the entire
   channel program (starting from the TIC) and all needed data to be refetched,
   thereby stepping around the limitation that would otherwise prevent this
   channel program from executing properly.

   Now the operating system code is loaded somewhere in guest memory and the psw
   in memory location ``0x0`` points to the entry code for the guest operating
   system.

6. LPSW ``0x0``

   LPSW transfers control to the guest operating system and we're done.
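
The chain-breaking idea from step 5 can be sketched as follows. This is a
simplified illustration under several assumptions, not the code used by the
QEMU bios: ``break_chain`` and ``run_dynamic`` are hypothetical names,
``start_subchannel`` is a caller-supplied stand-in for issuing a start
subchannel, and ``READ_CMDS`` is an assumed set of command codes treated as
reads. The sketch also resumes at the TIC's target address rather than at the
TIC ccw itself, which has the same effect in this model.

```python
import struct

CMD_TIC, FLAG_CC = 0x08, 0x40
READ_CMDS = {0x06, 0x86}   # assumption: DASD read and read-multitrack codes


def decode(memory, addr):
    """Decode the format-0 ccw at addr into (cmd, data_addr, flags)."""
    word, flags = struct.unpack_from(">IB", memory, addr)
    return word >> 24, word & 0xFFFFFF, flags


def break_chain(memory, start):
    """Clear the chain bit of the first READ that is followed by a TIC.

    Returns the TIC's target address, or None if the chain ends first.
    """
    addr = start
    while True:
        cmd, target, flags = decode(memory, addr)
        if cmd == CMD_TIC:                 # inline TIC: follow it
            addr = target
            continue
        if not flags & FLAG_CC:            # end of chain, nothing to break
            return None
        next_cmd, next_target, _ = decode(memory, addr + 8)
        if cmd in READ_CMDS and next_cmd == CMD_TIC:
            memory[addr + 4] = flags & ~FLAG_CC   # stop after this READ
            return next_target
        addr += 8


def run_dynamic(memory, start, start_subchannel):
    """Run a dynamic channel program one prefetch-safe piece at a time."""
    addr = start
    while addr is not None:
        resume = break_chain(memory, addr)
        start_subchannel(addr)             # hypothetical: runs until chain ends
        addr = resume
```

Each call to ``start_subchannel`` only ever sees ccws that were already in
guest memory when it was issued, so prefetching the piece is harmless; the ccws
loaded by the final READ of one piece are picked up by the next call.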