.. SPDX-License-Identifier: GPL-2.0

===========================
Hypercall Op-codes (hcalls)
===========================

Overview
========

Virtualization on 64-bit Power Book3S Platforms is based on the PAPR
specification [1]_, which describes the run-time environment for a guest
operating system and how it should interact with the hypervisor for
privileged operations. Currently there are two PAPR compliant hypervisors:

- **IBM PowerVM (PHYP)**: IBM's proprietary hypervisor that supports AIX,
  IBM-i and Linux as guests (termed Logical Partitions or LPARs). It
  supports the full PAPR specification.

- **Qemu/KVM**: Supports PPC64 Linux guests running on a PPC64 Linux host.
  It implements only a subset of the PAPR specification, called LoPAPR [2]_.

On the PPC64 arch, a guest kernel running on top of a PAPR hypervisor is called
a *pSeries guest*. A pseries guest runs in supervisor mode (HV=0) and must
issue hypercalls to the hypervisor whenever it needs to perform an action
that is hypervisor privileged [3]_ or needs other services managed by the
hypervisor.

Hence a Hypercall (hcall) is essentially a request by the pseries guest
asking the hypervisor to perform a privileged operation on behalf of the guest.
The guest issues the hcall with the necessary input operands. After performing
the privileged operation, the hypervisor returns a status code and output
operands back to the guest.

HCALL ABI
=========
The ABI specification for a hcall between a pseries guest and PAPR hypervisor
is covered in section 14.5.3 of ref [2]_. The switch to the hypervisor context
is done via the instruction **HVCS**, which expects the opcode for the hcall to
be set in *r3* and any in-arguments for the hcall to be provided in registers
*r4-r12*. If values have to be passed through a memory buffer, the data stored
in that buffer should be in Big-endian byte order.
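
The Big-endian requirement for buffer contents can be illustrated with a small,
self-contained sketch. The helper names **put_be64** and **get_be64** are
hypothetical and exist only for this example; kernel code would normally rely
on the existing **cpu_to_be64()** and **be64_to_cpu()** helpers instead.

.. code-block:: c

    #include <stddef.h>
    #include <stdint.h>

    /*
     * Hypothetical helper: store a 64-bit value into buf in Big-endian
     * byte order, independent of the endianness of the host.
     */
    static void put_be64(uint8_t *buf, uint64_t val)
    {
        for (size_t i = 0; i < 8; i++)
            buf[i] = (uint8_t)(val >> (56 - 8 * i));
    }

    /* Hypothetical helper: read back a Big-endian 64-bit value from buf. */
    static uint64_t get_be64(const uint8_t *buf)
    {
        uint64_t val = 0;

        for (size_t i = 0; i < 8; i++)
            val = (val << 8) | buf[i];

        return val;
    }

Writing the bytes one at a time keeps the sketch correct on both Little-endian
and Big-endian guests, since the most significant byte always lands at the
lowest buffer address.
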

Once control returns to the guest after the hypervisor has serviced the
'HVCS' instruction, the return value of the hcall is available in *r3* and any
out values are returned in registers *r4-r12*. As with in-arguments, any out
values stored in a memory buffer will be in Big-endian byte order.

Powerpc arch code provides convenient wrappers named **plpar_hcall_xxx**,
defined in an arch specific header [4]_, to issue hcalls from the Linux kernel
running as a pseries guest.

Register Conventions
====================

Any hcall should follow the same register conventions as described in section
2.2.1.1 of "64-Bit ELF V2 ABI Specification: Power Architecture" [5]_. The
table below summarizes these conventions:

+----------+----------+-------------------------------------------+
| Register | Volatile | Purpose                                   |
| Range    | (Y/N)    |                                           |
+==========+==========+===========================================+
| r0       | Y        | Optional-usage                            |
+----------+----------+-------------------------------------------+
| r1       | N        | Stack Pointer                             |
+----------+----------+-------------------------------------------+
| r2       | N        | TOC                                       |
+----------+----------+-------------------------------------------+
| r3       | Y        | hcall opcode/return value                 |
+----------+----------+-------------------------------------------+
| r4-r10   | Y        | in and out values                         |
+----------+----------+-------------------------------------------+
| r11      | Y        | Optional-usage/Environmental pointer      |
+----------+----------+-------------------------------------------+
| r12      | Y        | Optional-usage/Function entry address at  |
|          |          | global entry point                        |
+----------+----------+-------------------------------------------+
| r13      | N        | Thread-Pointer                            |
+----------+----------+-------------------------------------------+
| r14-r31  | N        | Local Variables                           |
+----------+----------+-------------------------------------------+
| LR       | Y        | Link Register                             |
+----------+----------+-------------------------------------------+
| CTR      | Y        | Loop Counter                              |
+----------+----------+-------------------------------------------+
| XER      | Y        | Fixed-point exception register.           |
+----------+----------+-------------------------------------------+
| CR0-1    | Y        | Condition register fields.                |
+----------+----------+-------------------------------------------+
| CR2-4    | N        | Condition register fields.                |
+----------+----------+-------------------------------------------+
| CR5-7    | Y        | Condition register fields.                |
+----------+----------+-------------------------------------------+
| Others   | N        |                                           |
+----------+----------+-------------------------------------------+

DRC & DRC Indexes
=================
::

     DR1                                  Guest
     +--+        +------------+         +---------+
     |  | <----> |            |         |  User   |
     +--+  DRC1  |            |   DRC   |  Space  |
                 |    PAPR    |  Index  +---------+
     DR2         | Hypervisor |         |         |
     +--+        |            | <-----> |  Kernel |
     |  | <----> |            |  Hcall  |         |
     +--+  DRC2  +------------+         +---------+

The PAPR hypervisor terms shared hardware resources like PCI devices, NVDIMMs
etc. that are available for use by LPARs as Dynamic Resources (DR). When a DR
is allocated to an LPAR, PHYP creates a data-structure called a Dynamic
Resource Connector (DRC) to manage LPAR access. An LPAR refers to a DRC via an
opaque 32-bit number called the DRC-Index. The DRC-index value is provided to
the LPAR via the device-tree, where it is present as an attribute in the device
tree node associated with the DR.

HCALL Return-values
===================

After servicing the hcall, the hypervisor sets the return-value in *r3*
indicating success or failure of the hcall. In case of a failure, an error code
indicates the cause of the error. These codes are defined and documented in the
arch specific header [4]_.

In some cases a hcall can potentially take a long time and needs to be issued
multiple times in order to be completely serviced.
These hcalls will usually
accept an opaque value *continue-token* within their argument list, and a
return value of *H_CONTINUE* indicates that the hypervisor has not yet finished
servicing the hcall.

To make such hcalls the guest needs to set *continue-token == 0* for the
initial call and use the hypervisor returned value of *continue-token*
for each subsequent hcall until the hypervisor returns a non *H_CONTINUE*
return value.

HCALL Op-codes
==============

Below is a partial list of HCALLs that are supported by PHYP. For the
corresponding opcode values please look into the arch specific header [4]_:

**H_SCM_READ_METADATA**

| Input: *drcIndex, offset, buffer-address, numBytesToRead*
| Out: *numBytesRead*
| Return Value: *H_Success, H_Parameter, H_P2, H_P3, H_Hardware*

Given a DRC Index of an NVDIMM, read N-bytes from the metadata area
associated with it, at a specified offset, and copy them to the provided
buffer. The metadata area stores configuration information such as label
information, bad-blocks etc. The metadata area is located out-of-band of the
NVDIMM storage area, hence separate access semantics are provided.

**H_SCM_WRITE_METADATA**

| Input: *drcIndex, offset, data, numBytesToWrite*
| Out: *None*
| Return Value: *H_Success, H_Parameter, H_P2, H_P4, H_Hardware*

Given a DRC Index of an NVDIMM, write N-bytes to the metadata area
associated with it, at the specified offset and from the provided buffer.
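
Several of the hcalls listed here take a *continue-token*. The protocol
described under HCALL Return-values can be sketched in userspace C as follows.
Everything in this sketch is an assumption made for illustration:
**mock_long_hcall** is a hypothetical stand-in for a long-running hcall, and
**SKETCH_H_SUCCESS** and **SKETCH_H_CONTINUE** are placeholder values rather
than the real status codes from the arch specific header [4]_.

.. code-block:: c

    #include <stdint.h>

    /* Placeholder status codes, valid for this sketch only. */
    #define SKETCH_H_SUCCESS   0
    #define SKETCH_H_CONTINUE  1

    /*
     * Hypothetical stand-in for a long-running hcall. It needs three
     * calls to finish and hands back a new continue-token each time.
     */
    static long mock_long_hcall(uint64_t token_in, uint64_t *token_out)
    {
        if (token_in < 2) {
            *token_out = token_in + 1;
            return SKETCH_H_CONTINUE;
        }

        *token_out = 0;
        return SKETCH_H_SUCCESS;
    }

    /*
     * Re-issue the hcall until the status is no longer the "continue"
     * code. Returns the final status and stores the number of calls made.
     */
    static long issue_until_done(unsigned int *calls)
    {
        uint64_t token = 0; /* continue-token == 0 for the initial call */
        long rc;

        *calls = 0;
        do {
            /* Feed the token returned by the previous call back in. */
            rc = mock_long_hcall(token, &token);
            (*calls)++;
        } while (rc == SKETCH_H_CONTINUE);

        return rc;
    }

With this mock, the loop issues three calls before a final status is returned.
Kernel code would issue the real hcall through one of the **plpar_hcall_xxx**
wrappers instead of the mock.
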

**H_SCM_BIND_MEM**

| Input: *drcIndex, startingScmBlockIndex, numScmBlocksToBind,*
|   *targetLogicalMemoryAddress, continue-token*
| Out: *continue-token, targetLogicalMemoryAddress, numScmBlocksToBound*
| Return Value: *H_Success, H_Parameter, H_P2, H_P3, H_P4, H_Overlap,*
|   *H_Too_Big, H_P5, H_Busy*

Given a DRC-Index of an NVDIMM, map a contiguous range of SCM blocks
*(startingScmBlockIndex, startingScmBlockIndex+numScmBlocksToBind)* to the guest
at *targetLogicalMemoryAddress* within the guest physical address space. If
*targetLogicalMemoryAddress == 0xFFFFFFFF_FFFFFFFF*, the hypervisor
assigns a target address to the guest. The HCALL can fail if the guest has
an active PTE entry to the SCM block being bound.

**H_SCM_UNBIND_MEM**

| Input: *drcIndex, startingScmLogicalMemoryAddress, numScmBlocksToUnbind*
| Out: *numScmBlocksUnbound*
| Return Value: *H_Success, H_Parameter, H_P2, H_P3, H_In_Use, H_Overlap,*
|   *H_Busy, H_LongBusyOrder1mSec, H_LongBusyOrder10mSec*

Given a DRC-Index of an NVDIMM, unmap *numScmBlocksToUnbind* SCM blocks starting
at *startingScmLogicalMemoryAddress* from the guest physical address space. The
HCALL can fail if the guest has an active PTE entry to the SCM block being
unbound.

**H_SCM_QUERY_BLOCK_MEM_BINDING**

| Input: *drcIndex, scmBlockIndex*
| Out: *Guest-Physical-Address*
| Return Value: *H_Success, H_Parameter, H_P2, H_NotFound*

Given a DRC-Index and an SCM block index, return the guest physical address to
which the SCM block is mapped.

**H_SCM_QUERY_LOGICAL_MEM_BINDING**

| Input: *Guest-Physical-Address*
| Out: *drcIndex, scmBlockIndex*
| Return Value: *H_Success, H_Parameter, H_P2, H_NotFound*

Given a guest physical address, return the DRC Index and SCM block that are
mapped to that address.

**H_SCM_UNBIND_ALL**

| Input: *scmTargetScope, drcIndex*
| Out: *None*
| Return Value: *H_Success, H_Parameter, H_P2, H_P3, H_In_Use, H_Busy,*
|   *H_LongBusyOrder1mSec, H_LongBusyOrder10mSec*

Depending on the target scope, unmap from the LPAR memory either all SCM blocks
belonging to all NVDIMMs or all SCM blocks belonging to a single NVDIMM
identified by its drcIndex.

**H_SCM_HEALTH**

| Input: *drcIndex*
| Out: *health-bitmap (r4), health-bit-valid-bitmap (r5)*
| Return Value: *H_Success, H_Parameter, H_Hardware*

Given a DRC Index, return information on predictive failure and overall health
of the PMEM device. The asserted bits in the health-bitmap indicate one or more
states (described in the table below) of the PMEM device, and
health-bit-valid-bitmap indicates which bits in health-bitmap are valid. The
bits are reported in reverse bit ordering; for example, a value of
0xC400000000000000 indicates that bits 0, 1, and 5 are valid.

Health Bitmap Flags:

+------+-----------------------------------------------------------------------+
| Bit  | Definition                                                            |
+======+=======================================================================+
|  00  | PMEM device is unable to persist memory contents.                     |
|      | If the system is powered down, nothing will be saved.                 |
+------+-----------------------------------------------------------------------+
|  01  | PMEM device failed to persist memory contents. Either contents were   |
|      | not saved successfully on power down or were not restored properly on |
|      | power up.                                                             |
+------+-----------------------------------------------------------------------+
|  02  | PMEM device contents are persisted from previous IPL. The data from   |
|      | the last boot were successfully restored.                             |
+------+-----------------------------------------------------------------------+
|  03  | PMEM device contents are not persisted from previous IPL. There was   |
|      | no data to restore from the last boot.                                |
+------+-----------------------------------------------------------------------+
|  04  | PMEM device memory life remaining is critically low                   |
+------+-----------------------------------------------------------------------+
|  05  | PMEM device will be garded off next IPL due to failure                |
+------+-----------------------------------------------------------------------+
|  06  | PMEM device contents cannot persist due to current platform health    |
|      | status. A hardware failure may prevent data from being saved or       |
|      | restored.                                                             |
+------+-----------------------------------------------------------------------+
|  07  | PMEM device is unable to persist memory contents in certain           |
|      | conditions                                                            |
+------+-----------------------------------------------------------------------+
|  08  | PMEM device is encrypted                                              |
+------+-----------------------------------------------------------------------+
|  09  | PMEM device has successfully completed a requested erase or secure    |
|      | erase procedure.                                                      |
+------+-----------------------------------------------------------------------+
|10:63 | Reserved / Unused                                                     |
+------+-----------------------------------------------------------------------+

**H_SCM_PERFORMANCE_STATS**

| Input: *drcIndex, resultBuffer Addr*
| Out: *None*
| Return Value: *H_Success, H_Parameter, H_Unsupported, H_Hardware, H_Authority, H_Privilege*

Given a DRC Index, collect the performance statistics for the NVDIMM and copy
them to the resultBuffer.

**H_SCM_FLUSH**

| Input: *drcIndex, continue-token*
| Out: *continue-token*
| Return Value: *H_SUCCESS, H_Parameter, H_P2, H_BUSY*

Given a DRC Index, flush the data to the backend NVDIMM device.

The hcall returns H_BUSY when the flush takes a long time and the hcall needs
to be issued multiple times in order to be completely serviced.
The
*continue-token* from the output must be passed in the argument list of
subsequent hcalls to the hypervisor until the hcall is completely serviced,
at which point H_SUCCESS or an error code is returned by the hypervisor.

**H_HTM**

| Input: *flags, target, operation (op), op-param1, op-param2, op-param3*
| Out: *dumphtmbufferdata*
| Return Value: *H_Success, H_Busy, H_LongBusyOrder, H_Partial, H_Parameter,*
|   *H_P2, H_P3, H_P4, H_P5, H_P6, H_State, H_Not_Available, H_Authority*

H_HTM supports setup, configuration, control and dumping of the Hardware Trace
Macro (HTM) function and its data. The HTM buffer stores tracing data for
functions like core instruction, core LLAT and nest.

References
==========
.. [1] "Power Architecture Platform Reference"
       https://en.wikipedia.org/wiki/Power_Architecture_Platform_Reference
.. [2] "Linux on Power Architecture Platform Reference"
       https://members.openpowerfoundation.org/document/dl/469
.. [3] "Definitions and Notation" Book III-Section 14.5.3
       https://openpowerfoundation.org/?resource_lib=power-isa-version-3-0
.. [4] arch/powerpc/include/asm/hvcall.h
.. [5] "64-Bit ELF V2 ABI Specification: Power Architecture"
       https://openpowerfoundation.org/?resource_lib=64-bit-elf-v2-abi-specification-power-architecture