APEI tables generating and CPER record
======================================

..
   Copyright (c) 2020 HUAWEI TECHNOLOGIES CO., LTD.

   This work is licensed under the terms of the GNU GPL, version 2 or later.
   See the COPYING file in the top-level directory.

Design Details
--------------

::

         etc/acpi/tables                         etc/hardware_errors
      ====================                   ===============================
  + +--------------------------+            +----------------------------+
  | | HEST                     | +--------->|    error_block_address1    |------+
  | +--------------------------+ |          +----------------------------+      |
  | | GHES1                    | | +------->|    error_block_address2    |------+-+
  | +--------------------------+ | |        +----------------------------+      | |
  | | .................        | | |        |      ..............        |      | |
  | | error_status_address-----+-+ |        -----------------------------+      | |
  | | .................        |   |   +--->|    error_block_addressN    |------+-+---+
  | | read_ack_register--------+-+ |   |    +----------------------------+      | |   |
  | | read_ack_preserve        | +-+---+--->|     read_ack_register1     |      | |   |
  | | read_ack_write           |   |   |    +----------------------------+      | |   |
  + +--------------------------+   | +-+--->|     read_ack_register2     |      | |   |
  | | GHES2                    |   | | |    +----------------------------+      | |   |
  + +--------------------------+   | | |    |       .............        |      | |   |
  | | .................        |   | | |    +----------------------------+      | |   |
  | | error_status_address-----+---+ | | +->|     read_ack_registerN     |      | |   |
  | | .................        |     | | |  +----------------------------+      | |   |
  | | read_ack_register--------+-----+ | |  |Generic Error Status Block 1|<-----+ |   |
  | | read_ack_preserve        |       | |  |-+------------------------+-+        |   |
  | | read_ack_write           |       | |  | |          CPER          | |        |   |
  + +--------------------------+       | |  | |          CPER          | |        |   |
  | | ...............          |       | |  | |          ....          | |        |   |
  + +--------------------------+       | |  | |          CPER          | |        |   |
  | | GHESN                    |       | |  |-+------------------------+-+        |   |
  + +--------------------------+       | |  |Generic Error Status Block 2|<-------+   |
  | | .................        |       | |  |-+------------------------+-+            |
  | | error_status_address-----+-------+ |  | |          CPER          | |            |
  | | .................        |         |  | |          CPER          | |            |
  | | read_ack_register--------+---------+  | |          ....          | |            |
  | | read_ack_preserve        |            | |          CPER          | |            |
  | | read_ack_write           |            +-+------------------------+-+            |
  + +--------------------------+            |         ..........         |            |
                                            |----------------------------+            |
                                            |Generic Error Status Block N |<----------+
                                            |-+-------------------------+-+
                                            | | CPER                    | |
                                            | | CPER                    | |
                                            | | ....                    | |
                                            | | CPER                    | |
                                            +-+-------------------------+-+


(1) QEMU generates the ACPI HEST table.
    This table goes in the existing "etc/acpi/tables" fw_cfg blob. Each
    error source can use a different notification type.

(2) A new fw_cfg blob called "etc/hardware_errors" is introduced, and QEMU
    also populates it. The "etc/hardware_errors" blob contains an address
    registers table and an Error Status Data Block table.

(3) The address registers table contains N Error Block Address entries
    followed by N Read Ack Register entries; each entry is 8 bytes. The
    Error Status Data Block table contains N Error Status Data Block
    entries. The size of each of these entries is defined in the source
    code as ACPI_GHES_MAX_RAW_DATA_LENGTH (currently 1024 bytes). The total
    size of the "etc/hardware_errors" fw_cfg blob is therefore
    (N * 8 * 2 + N * ACPI_GHES_MAX_RAW_DATA_LENGTH) bytes, where N is the
    number of supported hardware error sources.

(4) QEMU generates the ACPI linker/loader script for the firmware. The
    firmware pre-allocates memory for "etc/acpi/tables" and
    "etc/hardware_errors" and copies the blob contents there.

(5) QEMU generates N ADD_POINTER commands, which patch the
    "error_status_address" fields of the HEST table with pointers to the
    corresponding "error_block_address" entries in the address registers
    table of the "etc/hardware_errors" blob.

(6) QEMU generates N ADD_POINTER commands, which patch the
    "read_ack_register" fields of the HEST table with pointers to the
    corresponding "read_ack_register" entries in the "etc/hardware_errors"
    blob.

(7) QEMU generates N ADD_POINTER commands for the firmware, which patch the
    "error_block_address" entries with pointers to the respective
    "Error Status Data Block" entries in the "etc/hardware_errors" blob.

(8) QEMU defines a third, write-only fw_cfg blob called
    "etc/hardware_errors_addr", through which the firmware sends the
    guest-side allocation address back to QEMU. The
    "etc/hardware_errors_addr" blob contains a single 8-byte entry. QEMU
    generates a single WRITE_POINTER command for the firmware, which writes
    the start address of the "etc/hardware_errors" blob to the fw_cfg file
    "etc/hardware_errors_addr".

(9) When QEMU receives a SIGBUS from the kernel, it writes a CPER record
    into the corresponding "Error Status Data Block" in guest memory and
    then injects a platform-specific notification into the guest (on the
    arm/virt machine, a Synchronous External Abort).

(10) The guest kernel handles this (virtual hardware) notification. On
     receiving it, the guest APEI driver can read the CPER record and take
     appropriate action.

(11) kvm_arch_on_sigbus_vcpu() uses source_id as an index into
     "etc/hardware_errors" to find the "Error Status Data Block" entry that
     corresponds to an error source. Supported source_id values should
     therefore be assigned once and never changed afterwards, so that errors
     are still written into the expected "Error Status Data Block" after the
     guest is migrated to a newer QEMU.

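
The blob layout from items (3) and (11) can be illustrated with a small C sketch that computes byte offsets inside "etc/hardware_errors" for a given source_id. It assumes source_id is a zero-based index in the range [0, N) and uses the 1024-byte value of ACPI_GHES_MAX_RAW_DATA_LENGTH quoted above; the ghes_*() helper names and GHES_ADDR_ENTRY_SIZE are hypothetical and are not QEMU functions or macros.

```c
#include <assert.h>
#include <stdint.h>

/* Value quoted in this document: currently 1024 bytes per status block. */
#define ACPI_GHES_MAX_RAW_DATA_LENGTH 1024u
/* Hypothetical name: each Error Block Address / Read Ack Register entry is 8 bytes. */
#define GHES_ADDR_ENTRY_SIZE 8u

/*
 * Hypothetical helpers (not QEMU functions): byte offsets inside the
 * "etc/hardware_errors" blob for n error sources, assuming source_id is a
 * zero-based index in [0, n).
 */
static inline uint64_t ghes_error_block_address_off(uint64_t n, uint64_t source_id)
{
    assert(source_id < n);
    /* Table 1: n Error Block Address entries starting at offset 0. */
    return source_id * GHES_ADDR_ENTRY_SIZE;
}

static inline uint64_t ghes_read_ack_register_off(uint64_t n, uint64_t source_id)
{
    assert(source_id < n);
    /* Table 2: n Read Ack Register entries follow the n address entries. */
    return (n + source_id) * GHES_ADDR_ENTRY_SIZE;
}

static inline uint64_t ghes_error_status_block_off(uint64_t n, uint64_t source_id)
{
    assert(source_id < n);
    /* Error Status Data Blocks follow both 8-byte register tables. */
    return n * GHES_ADDR_ENTRY_SIZE * 2 +
           source_id * ACPI_GHES_MAX_RAW_DATA_LENGTH;
}

static inline uint64_t ghes_hardware_errors_size(uint64_t n)
{
    /* N * 8 * 2 + N * ACPI_GHES_MAX_RAW_DATA_LENGTH */
    return n * GHES_ADDR_ENTRY_SIZE * 2 + n * ACPI_GHES_MAX_RAW_DATA_LENGTH;
}
```

For example, with N = 2 the blob is 2 * 16 + 2 * 1024 = 2080 bytes, and the Error Status Data Block for source_id 1 starts at offset 32 + 1024 = 1056. Because every offset is derived from source_id, renumbering a source would move its entries, which is why item (11) asks for the assignments to stay fixed.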
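
Items (5) to (7) all rely on the firmware applying ADD_POINTER commands. The following C sketch is a simplified model, not QEMU's linker/loader code: it assumes the patched field is an 8-byte little-endian value that initially holds an offset into the source blob, to which the firmware adds the guest address where it placed that source blob. The helper names load_le64(), store_le64() and add_pointer() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read an 8-byte little-endian value from a blob. */
static inline uint64_t load_le64(const uint8_t *p)
{
    uint64_t v = 0;
    for (int i = 7; i >= 0; i--) {
        v = (v << 8) | p[i];
    }
    return v;
}

/* Write an 8-byte little-endian value into a blob. */
static inline void store_le64(uint8_t *p, uint64_t v)
{
    for (int i = 0; i < 8; i++) {
        p[i] = (uint8_t)(v & 0xff);
        v >>= 8;
    }
}

/*
 * Hypothetical model of ADD_POINTER: the 8-byte field at dst_off holds an
 * offset into the source blob; add the source blob's guest address to it.
 */
static inline void add_pointer(uint8_t *dst_blob, size_t dst_len, size_t dst_off,
                               uint64_t src_guest_addr)
{
    assert(dst_off <= dst_len && dst_len - dst_off >= 8);
    uint64_t field = load_le64(dst_blob + dst_off);
    store_le64(dst_blob + dst_off, field + src_guest_addr);
}
```

For item (7), source and destination are both "etc/hardware_errors": in this model, if the blob is placed at guest address B and an Error Block Address entry initially holds the offset of its Error Status Data Block, the patched entry holds B plus that offset. The WRITE_POINTER command of item (8) works in the opposite direction, reporting the blob's guest address back to QEMU through "etc/hardware_errors_addr".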