Revision tags: v6.16, v6.16-rc7 |
|
#
f10f46a0 |
| 17-Jul-2025 |
Shiju Jose <shiju.jose@huawei.com> |
cxl/events: Trace Memory Sparing Event Record
CXL rev 3.2 section 8.2.10.2.1.4 Table 8-60 defines the Memory Sparing Event Record.
Determine if the event read is memory sparing record and if so tra
cxl/events: Trace Memory Sparing Event Record
CXL rev 3.2 section 8.2.10.2.1.4 Table 8-60 defines the Memory Sparing Event Record.
Determine if the event read is memory sparing record and if so trace the record.
Memory device shall produce a memory sparing event record 1. After completion of a PPR maintenance operation if the memory sparing event record enable bit is set (Field: sPPR/hPPR Operation Mode in Table 8-128/Table 8-131). 2. In response to a query request by the host (see section 8.2.10.7.1.4) to determine the availability of sparing resources. The device shall report the resource availability by producing the Memory Sparing Event Record (see Table 8-60) in which the channel, rank, nibble mask, bank group, bank, row, column, sub-channel fields are a copy of the values specified in the request. If the controller does not support reporting whether a resource is available, and a perform maintenance operation for memory sparing is issued with query resources set to 1, the controller shall return invalid input.
Example trace log for produce memory sparing event record on completion of a soft PPR operation, cxl_memory_sparing: memdev=mem1 host=0000:0f:00.0 serial=3 log=Informational : time=55045163029 uuid=e71f3a40-2d29-4092-8a39-4d1c966c7c65 len=128 flags='0x1' handle=1 related_handle=0 maint_op_class=2 maint_op_sub_class=1 ld_id=0 head_id=0 : flags='' result=0 validity_flags='CHANNEL|RANK|NIBBLE|BANK GROUP|BANK|ROW|COLUMN' spare resource avail=1 channel=2 rank=5 nibble_mask=a59c bank_group=2 bank=4 row=13 column=23 sub_channel=0 comp_id=00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 comp_id_pldm_valid_flags='' pldm_entity_id=0x00 pldm_resource_id=0x00
Note: For memory sparing event record, fields 'maintenance operation class' and 'maintenance operation subclass' are defined twice, first in the common event record (Table 8-55) and second in the memory sparing event record (Table 8-60). Thus those in the sparing event record coded as reserved, to be removed when the spec is updated.
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Shiju Jose <shiju.jose@huawei.com> Link: https://patch.msgid.link/20250717101817.2104-5-shiju.jose@huawei.com Signed-off-by: Dave Jiang <dave.jiang@intel.com>
show more ...
|
#
1f4f8166 |
| 17-Jul-2025 |
Shiju Jose <shiju.jose@huawei.com> |
cxl/events: Update Common Event Record to CXL spec rev 3.2
CXL spec 3.2 section 8.2.10.2.1 Table 8-55, Common Event Record format defined new fields LD-ID and Head ID.
LD-ID: ID of logical device f
cxl/events: Update Common Event Record to CXL spec rev 3.2
CXL spec 3.2 section 8.2.10.2.1 Table 8-55, Common Event Record format defined new fields LD-ID and Head ID.
LD-ID: ID of logical device from where the event originated, which is valid only if LD-ID valid flag is set to 1. CXL spec 3.2 Section 2.4 describes, a Type 3 Multi-Logical Device (MLD) can partition its resources into up to 16 isolated Logical Devices. Each Logical Device is identified by a Logical Device Identifier (LD-ID) in CXL.mem and CXL.io protocols. LD-ID is a 16-bit Logical Device identifier applicable for CXL.io and CXL.mem requests and responses. CXL.mem supports only the lower 4 bits of LD-ID and therefore can support up to 16 unique LD-ID values over the link. Requests and responses forwarded over an MLD Port are tagged with LD-ID.
Head ID: ID of the device head, from where the event originated, which is valid only if head valid flag is set to 1.
Add updates for the above spec changes in the CXL events record and CXL common trace event implementation.
Reviewed-by: Dave Jiang <dave.jiang@intel.com> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Signed-off-by: Shiju Jose <shiju.jose@huawei.com> Link: https://patch.msgid.link/20250717101817.2104-2-shiju.jose@huawei.com Signed-off-by: Dave Jiang <dave.jiang@intel.com>
show more ...
|
Revision tags: v6.16-rc6, v6.16-rc5, v6.16-rc4, v6.16-rc3, v6.16-rc2, v6.16-rc1, v6.15, v6.15-rc7, v6.15-rc6, v6.15-rc5, v6.15-rc4, v6.15-rc3, v6.15-rc2, v6.15-rc1, v6.14, v6.14-rc7 |
|
#
36f257e3 |
| 10-Mar-2025 |
Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com> |
acpi/ghes, cxl/pci: Process CXL CPER Protocol Errors
When PCIe AER is in FW-First, OS should process CXL Protocol errors from CPER records. Introduce support for handling and logging CXL Protocol er
acpi/ghes, cxl/pci: Process CXL CPER Protocol Errors
When PCIe AER is in FW-First, OS should process CXL Protocol errors from CPER records. Introduce support for handling and logging CXL Protocol errors.
The defined trace events cxl_aer_uncorrectable_error and cxl_aer_correctable_error trace native CXL AER endpoint errors. Reuse them to trace FW-First Protocol errors.
Since the CXL code is required to be called from process context and GHES is in interrupt context, use workqueues for processing.
Similar to CXL CPER event handling, use kfifo to handle errors as it simplifies queue processing by providing lock free fifo operations.
Add the ability for the CXL sub-system to register a workqueue to process CXL CPER protocol errors.
[DJ: return cxl_cper_register_prot_err_work() directly in cxl_ras_init()]
Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com> Reviewed-by: Li Ming <ming.li@zohomail.com> Reviewed-by: Alison Schofield <alison.schofield@intel.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Tony Luck <tony.luck@intel.com> Link: https://patch.msgid.link/20250310223839.31342-2-Smita.KoralahalliChannabasappa@amd.com Signed-off-by: Dave Jiang <dave.jiang@intel.com>
show more ...
|
Revision tags: v6.14-rc6, v6.14-rc5, v6.14-rc4, v6.14-rc3, v6.14-rc2, v6.14-rc1 |
|
#
315c2f0b |
| 23-Jan-2025 |
Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com> |
acpi/ghes, cper: Recognize and cache CXL Protocol errors
Add support in GHES to detect and process CXL CPER Protocol errors, as defined in UEFI v2.10, section N.2.13.
Define struct cxl_cper_prot_er
acpi/ghes, cper: Recognize and cache CXL Protocol errors
Add support in GHES to detect and process CXL CPER Protocol errors, as defined in UEFI v2.10, section N.2.13.
Define struct cxl_cper_prot_err_work_data to cache CXL protocol error information, including RAS capabilities and severity, for further handling.
These cached CXL CPER records will later be processed by workqueues within the CXL subsystem.
Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Tony Luck <tony.luck@intel.com> Reviewed-by: Gregory Price <gourry@gourry.net> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Link: https://patch.msgid.link/20250123084421.127697-5-Smita.KoralahalliChannabasappa@amd.com Signed-off-by: Dave Jiang <dave.jiang@intel.com>
show more ...
|
#
958c3a67 |
| 23-Jan-2025 |
Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com> |
efi/cper, cxl: Make definitions and structures global
In preparation to add tracepoint support, move protocol error UUID definition to a common location, Also, make struct CXL RAS capability, cxl_cp
efi/cper, cxl: Make definitions and structures global
In preparation to add tracepoint support, move protocol error UUID definition to a common location, Also, make struct CXL RAS capability, cxl_cper_sec_prot_err and CPER validation flags global for use across different modules.
Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Reviewed-by: Fan Ni <fan.ni@samsung.com> Reviewed-by: Gregory Price <gourry@gourry.net> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Link: https://patch.msgid.link/20250123084421.127697-3-Smita.KoralahalliChannabasappa@amd.com Signed-off-by: Dave Jiang <dave.jiang@intel.com>
show more ...
|
Revision tags: v6.13, v6.13-rc7 |
|
#
4c6e20eb |
| 11-Jan-2025 |
Shiju Jose <shiju.jose@huawei.com> |
cxl/events: Update Memory Module Event Record to CXL spec rev 3.1
CXL spec 3.1 section 8.2.9.2.1.3 Table 8-47, Memory Module Event Record has updated with following new fields and new info for Devic
cxl/events: Update Memory Module Event Record to CXL spec rev 3.1
CXL spec 3.1 section 8.2.9.2.1.3 Table 8-47, Memory Module Event Record has updated with following new fields and new info for Device Event Type and Device Health Information fields. 1. Validity Flags 2. Component Identifier 3. Device Event Sub-Type
Update the Memory Module event record and Memory Module trace event for the above spec changes. The new fields are inserted in logical places.
Example trace print of cxl_memory_module trace event,
cxl_memory_module: memdev=mem3 host=0000:0f:00.0 serial=3 log=Fatal : \ time=371709344709 uuid=fe927475-dd59-4339-a586-79bab113b774 len=128 \ flags='0x1' handle=2 related_handle=0 maint_op_class=0 \ maint_op_sub_class=0 : event_type='Temperature Change' \ event_sub_type='Unsupported Config Data' \ health_status='MAINTENANCE_NEEDED|REPLACEMENT_NEEDED' \ media_status='All Data Loss in Event of Power Loss' as_life_used=0x3 \ as_dev_temp=Normal as_cor_vol_err_cnt=Normal as_cor_per_err_cnt=Normal \ life_used=8 device_temp=3 dirty_shutdown_cnt=33 cor_vol_err_cnt=25 \ cor_per_err_cnt=45 validity_flags='COMPONENT|COMPONENT PLDM FORMAT' \ comp_id=03 74 c5 08 9a 1a 0b fc d2 7e 2f 31 9b 3c 81 4d \ comp_id_pldm_valid_flags='Resource ID' \ pldm_entity_id=0x00 pldm_resource_id=fc d2 7e 2f
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Shiju Jose <shiju.jose@huawei.com> Link: https://patch.msgid.link/20250111091756.1682-6-shiju.jose@huawei.com Signed-off-by: Dave Jiang <dave.jiang@intel.com>
show more ...
|
#
24ec41f7 |
| 11-Jan-2025 |
Shiju Jose <shiju.jose@huawei.com> |
cxl/events: Update DRAM Event Record to CXL spec rev 3.1
CXL spec 3.1 section 8.2.9.2.1.2 Table 8-46, DRAM Event Record has updated with following new fields and new types for Memory Event Type, Tra
cxl/events: Update DRAM Event Record to CXL spec rev 3.1
CXL spec 3.1 section 8.2.9.2.1.2 Table 8-46, DRAM Event Record has updated with following new fields and new types for Memory Event Type, Transaction Type and Validity Flags fields. 1. Component Identifier 2. Sub-channel 3. Advanced Programmable Corrected Memory Error Threshold Event Flags 4. Corrected Memory Error Count at Event 5. Memory Event Sub-Type
Update DRAM events record and DRAM trace event for the above spec changes. The new fields are inserted in logical places. Includes trivial consistency of white space improvements.
Example trace print of cxl_dram trace event,
cxl_dram: memdev=mem0 host=0000:0f:00.0 serial=3 log=Informational : \ time=54799339519 uuid=601dcbb3-9c06-4eab-b8af-4e9bfb5c9624 len=128 \ flags='0x1' handle=1 related_handle=0 maint_op_class=1 \ maint_op_sub_class=3 : dpa=18680 dpa_flags='' \ descriptor='UNCORRECTABLE_EVENT|THRESHOLD_EVENT' type='Data Path Error' \ sub_type='Media Link CRC Error' transaction_type='Internal Media Scrub' \ channel=3 rank=17 nibble_mask=3b00b2 bank_group=7 bank=11 row=2 \ column=77 cor_mask=21 00 00 00 00 00 00 00 2c 00 00 00 00 00 00 00 37 00 \ 00 00 00 00 00 00 42 00 00 00 00 00 00 00 validity_flags='CHANNEL|RANK|NIBBLE|\ BANK GROUP|BANK|ROW|COLUMN|CORRECTION MASK|COMPONENT|COMPONENT PLDM FORMAT' \ comp_id=01 74 c5 08 9a 1a 0b fc d2 7e 2f 31 9b 3c 81 4d \ comp_id_pldm_valid_flags='PLDM Entity ID' pldm_entity_id=74 c5 08 9a 1a 0b \ pldm_resource_id=0x00 hpa=ffffffffffffffff region= \ region_uuid=00000000-0000-0000-0000-000000000000 sub_channel=5 \ cme_threshold_ev_flags='Corrected Memory Errors in Multiple Media Components|\ Exceeded Programmable Threshold' cvme_count=148
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Davidlohr Bueso <dave@stgolabs.net> Signed-off-by: Shiju Jose <shiju.jose@huawei.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Link: https://patch.msgid.link/20250111091756.1682-5-shiju.jose@huawei.com Signed-off-by: Dave Jiang <dave.jiang@intel.com>
show more ...
|
#
ae834131 |
| 11-Jan-2025 |
Shiju Jose <shiju.jose@huawei.com> |
cxl/events: Update General Media Event Record to CXL spec rev 3.1
CXL spec rev 3.1 section 8.2.9.2.1.1 Table 8-45, General Media Event Record has updated with following new fields and new types for
cxl/events: Update General Media Event Record to CXL spec rev 3.1
CXL spec rev 3.1 section 8.2.9.2.1.1 Table 8-45, General Media Event Record has updated with following new fields and new types for Memory Event Type and Transaction Type fields. 1. Advanced Programmable Corrected Memory Error Threshold Event Flags 2. Corrected Memory Error Count at Event 3. Memory Event Sub-Type
The format of component identifier has changed (CXL spec 3.1 section 8.2.9.2.1 Table 8-44).
Update the general media event record and general media trace event for the above spec changes. The new fields are inserted in logical places.
Example trace log of cxl_general_media trace event,
cxl_general_media: memdev=mem0 host=0000:0f:00.0 serial=3 log=Fatal : \ time=156831237413 uuid=fbcd0a77-c260-417f-85a9-088b1621eba6 len=128 \ flags='0x1' handle=1 related_handle=0 maint_op_class=2 \ maint_op_sub_class=4 : dpa=30d40 dpa_flags='' \ descriptor='UNCORRECTABLE_EVENT|THRESHOLD_EVENT|POISON_LIST_OVERFLOW' \ type='TE State Violation' sub_type='Media Link Command Training Error' \ transaction_type='Host Inject Poison' channel=3 rank=33 device=5 \ validity_flags='CHANNEL|RANK|DEVICE|COMPONENT|COMPONENT PLDM FORMAT' \ comp_id=03 74 c5 08 9a 1a 0b fc d2 7e 2f 31 9b 3c 81 4d \ comp_id_pldm_valid_flags='PLDM Entity ID | Resource ID' \ pldm_entity_id=74 c5 08 9a 1a 0b pldm_resource_id=fc d2 7e 2f \ hpa=ffffffffffffffff region= \ region_uuid=00000000-0000-0000-0000-000000000000 \ cme_threshold_ev_flags='Corrected Memory Errors in Multiple Media \ Components|Exceeded Programmable Threshold' cme_count=120
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Davidlohr Bueso <dave@stgolabs.net> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Shiju Jose <shiju.jose@huawei.com> Link: https://patch.msgid.link/20250111091756.1682-4-shiju.jose@huawei.com Signed-off-by: Dave Jiang <dave.jiang@intel.com>
show more ...
|
#
5e31e347 |
| 11-Jan-2025 |
Shiju Jose <shiju.jose@huawei.com> |
cxl/events: Update Common Event Record to CXL spec rev 3.1
CXL spec 3.1 section 8.2.9.2.1 Table 8-42, Common Event Record format has updated with Maintenance Operation Subclass information.
Add upd
cxl/events: Update Common Event Record to CXL spec rev 3.1
CXL spec 3.1 section 8.2.9.2.1 Table 8-42, Common Event Record format has updated with Maintenance Operation Subclass information.
Add updates for the above spec change in the CXL events record and CXL common trace event implementations.
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Davidlohr Bueso <dave@stgolabs.net> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Shiju Jose <shiju.jose@huawei.com> Reviewed-by: Fan Ni <fan.ni@samsung.com> Link: https://patch.msgid.link/20250111091756.1682-2-shiju.jose@huawei.com Signed-off-by: Dave Jiang <dave.jiang@intel.com>
show more ...
|
Revision tags: v6.13-rc6, v6.13-rc5, v6.13-rc4, v6.13-rc3, v6.13-rc2, v6.13-rc1, v6.12, v6.12-rc7, v6.12-rc6, v6.12-rc5, v6.12-rc4, v6.12-rc3, v6.12-rc2, v6.12-rc1, v6.11, v6.11-rc7 |
|
#
40a895fd |
| 05-Sep-2024 |
Dave Jiang <dave.jiang@intel.com> |
cxl: move cxl headers to new include/cxl/ directory
Group all cxl related kernel headers into include/cxl/ directory.
Reviewed-by: Alison Schofield <alison.schofield@intel.com> Reviewed-by: Ira Wei
cxl: move cxl headers to new include/cxl/ directory
Group all cxl related kernel headers into include/cxl/ directory.
Reviewed-by: Alison Schofield <alison.schofield@intel.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Link: https://patch.msgid.link/20240905223711.1990186-2-dave.jiang@intel.com Signed-off-by: Dave Jiang <dave.jiang@intel.com>
show more ...
|
Revision tags: v6.11-rc6, v6.11-rc5, v6.11-rc4, v6.11-rc3, v6.11-rc2, v6.11-rc1, v6.10, v6.10-rc7, v6.10-rc6, v6.10-rc5, v6.10-rc4, v6.10-rc3 |
|
#
675e979d |
| 07-Jun-2024 |
Fabio M. De Francesco <fabio.m.de.francesco@linux.intel.com> |
cxl/events: Use a common struct for DRAM and General Media events
cxl_event_common was an unfortunate naming choice and caused confusion with the existing Common Event Record. Furthermore, its field
cxl/events: Use a common struct for DRAM and General Media events
cxl_event_common was an unfortunate naming choice and caused confusion with the existing Common Event Record. Furthermore, its fields didn't map all the common information between DRAM and General Media Events.
Remove cxl_event_common and introduce cxl_event_media_hdr to record common information between DRAM and General Media events.
cxl_event_media_hdr, which is embedded in both cxl_event_gen_media and cxl_event_dram, leverages the commonalities between the two events to simplify their respective handling.
Suggested-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Alison Schofield <alison.schofield@intel.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Fabio M. De Francesco <fabio.m.de.francesco@linux.intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/20240607144423.48681-1-fabio.m.de.francesco@linux.intel.com Signed-off-by: Dave Jiang <dave.jiang@intel.com>
show more ...
|
Revision tags: v6.10-rc2, v6.10-rc1, v6.9, v6.9-rc7 |
|
#
55111470 |
| 01-May-2024 |
Ira Weiny <ira.weiny@intel.com> |
cxl/cper: Fix non-ACPI-APEI-GHES build
If ACPI_APEI_GHES is not configured the [un]register work functions are not properly declared.
0day notices that the cxl_cper_register_work() declaration in t
cxl/cper: Fix non-ACPI-APEI-GHES build
If ACPI_APEI_GHES is not configured the [un]register work functions are not properly declared.
0day notices that the cxl_cper_register_work() declaration in the CONFIG_ACPI_APEI_GHES=n is broken, fix it to be typical nop stub.
Reported-by: kernel test robot <lkp@intel.com> Closes: http://lore.kernel.org/r/202405012230.6kXItWen-lkp@intel.com Reviewed-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Ira Weiny <ira.weiny@intel.com> Link: https://lore.kernel.org/r/20240501-cper-fix-0day-v1-1-c0b0056eafbc@intel.com Signed-off-by: Dave Jiang <dave.jiang@intel.com>
show more ...
|
#
df2a8f4b |
| 01-May-2024 |
Dave Jiang <dave.jiang@intel.com> |
Merge remote-tracking branch 'cxl/for-6.10/cper' into cxl-for-next
Add support to send CPER records to CXL for more detailed parsing.
|
Revision tags: v6.9-rc6 |
|
#
5e4a264b |
| 27-Apr-2024 |
Ira Weiny <ira.weiny@intel.com> |
acpi/ghes: Process CXL Component Events
BIOS can configure memory devices as firmware first. This will send CXL events to the firmware instead of the OS. The firmware can then inform the OS of the
acpi/ghes: Process CXL Component Events
BIOS can configure memory devices as firmware first. This will send CXL events to the firmware instead of the OS. The firmware can then inform the OS of these events via UEFI.
UEFI v2.10 section N.2.14 defines a Common Platform Error Record (CPER) format for CXL Component Events. The format is mostly the same as the CXL Common Event Record Format. The difference lies in the use of a GUID as the CPER Section Type which matches the UUID defined in CXL 3.1 Table 8-43.
Currently a configuration such as this will trace a non standard event in the log omitting useful details of the event. In addition the CXL sub-system contains additional region and HPA information useful to the user.[0]
The CXL code is required to be called from process context as it needs to take a device lock. The GHES code may be in interrupt context. This complicated the use of a callback. Dan Williams suggested the use of work items as an atomic way of switching between the callback execution and a default handler.[1]
The use of a kfifo simplifies queue processing by providing lock free fifo operations. cxl_cper_kfifo_get() allows easier management of the kfifo between the ghes and cxl modules.
CXL 3.1 Table 8-127 requires a device to have a queue depth of 1 for each of the four event logs. A combined queue depth of 32 is chosen to provide room for 8 entries of each log type.
Add GHES support to detect CXL CPER records. Add the ability for the CXL sub-system to register a work queue to process the events.
This patch adds back the functionality which was removed to fix the report by Dan Carpenter[2].
Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Rafael J. Wysocki <rafael@kernel.org> Cc: Tony Luck <tony.luck@intel.com> Cc: Borislav Petkov <bp@alien8.de> Suggested-by: Dan Carpenter <dan.carpenter@linaro.org> Suggested-by: Dan Williams <dan.j.williams@intel.com> Link: http://lore.kernel.org/r/cover.1711598777.git.alison.schofield@intel.com [0] Link: http://lore.kernel.org/r/65d111eb87115_6c745294ac@dwillia2-xfh.jf.intel.com.notmuch [1] Link: http://lore.kernel.org/r/b963c490-2c13-4b79-bbe7-34c6568423c7@moroto.mountain [2] Reviewed-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Tony Luck <tony.luck@intel.com> Tested-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com> Link: https://lore.kernel.org/r/20240426-cxl-cper3-v4-1-58076cce1624@intel.com Signed-off-by: Dave Jiang <dave.jiang@intel.com>
show more ...
|
#
660c0a86 |
| 30-Apr-2024 |
Dave Jiang <dave.jiang@intel.com> |
Merge remote-tracking branch 'cxl/for-6.10/dpa-to-hpa' into cxl-for-next
Support for HPA to DPA translation for CXL events cxl_dram and cxl_general_media.
|
#
6aec0013 |
| 30-Apr-2024 |
Alison Schofield <alison.schofield@intel.com> |
cxl/core: Add region info to cxl_general_media and cxl_dram events
User space may need to know which region, if any, maps the DPAs (device physical addresses) reported in a cxl_general_media or cxl_
cxl/core: Add region info to cxl_general_media and cxl_dram events
User space may need to know which region, if any, maps the DPAs (device physical addresses) reported in a cxl_general_media or cxl_dram event. Since the mapping can change, the kernel provides this information at the time the event occurs. This informs user space that at event <timestamp> this <region> mapped this <DPA> to this <HPA>.
Add the same region info that is included in the cxl_poison trace event: the DPA->HPA translation, region name, and region uuid.
The new fields are inserted in the trace event and no existing fields are modified. If the DPA is not mapped, user will see: hpa=ULLONG_MAX, region="", and uuid=0
This work must be protected by dpa_rwsem & region_rwsem since it is looking up region mappings.
Signed-off-by: Alison Schofield <alison.schofield@intel.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/dd8d708b7a7ebfb64a27020a5eb338091336b34d.1714496730.git.alison.schofield@intel.com Signed-off-by: Dave Jiang <dave.jiang@intel.com>
show more ...
|
Revision tags: v6.9-rc5 |
|
#
0e081a0e |
| 17-Apr-2024 |
Sangyun Kim <sangyun.kim@snu.ac.kr> |
cxl/cxl-event: include missing <linux/types.h> and <linux/uuid.h>
The linux/cxl-event.h header file uses the u8, u16, and uuid_t types, but it doesn't include the necessary header files, <linux/type
cxl/cxl-event: include missing <linux/types.h> and <linux/uuid.h>
The linux/cxl-event.h header file uses the u8, u16, and uuid_t types, but it doesn't include the necessary header files, <linux/types.h> and <linux/uuid.h>.
Currently, cxl-event.h is only used by drivers/cxl/cxlmem.h, and it doesn't cause any errors because cxlmem.h indirectly includes the required types.
However, cxl-event.h may be used by other CXL-related code in the future, so it's important to fix this issue by including the missing header files directly in cxl-event.h.
Signed-off-by: Sangyun Kim <sangyun.kim@snu.ac.kr> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Link: https://lore.kernel.org/r/20240417035043.2791431-1-sangyun.kim@snu.ac.kr Signed-off-by: Dave Jiang <dave.jiang@intel.com>
show more ...
|
Revision tags: v6.9-rc4, v6.9-rc3, v6.9-rc2, v6.9-rc1, v6.8, v6.8-rc7, v6.8-rc6 |
|
#
f3e6b3ae |
| 18-Feb-2024 |
Dan Williams <dan.j.williams@intel.com> |
acpi/ghes: Remove CXL CPER notifications
Initial tests with the CXL CPER implementation identified that error reports were being duplicated in the log and the trace event [1]. Then it was discovere
acpi/ghes: Remove CXL CPER notifications
Initial tests with the CXL CPER implementation identified that error reports were being duplicated in the log and the trace event [1]. Then it was discovered that the notification handler took sleeping locks while the GHES event handling runs in spin_lock_irqsave() context [2]
While the duplicate reporting was fixed in v6.8-rc4, the fix for the sleeping-lock-vs-atomic collision would enjoy more time to settle and gain some test cycles. Given how late it is in the development cycle, remove the CXL hookup for now and try again during the next merge window.
Note that end result is that v6.8 does not emit CXL CPER payloads to the kernel log, but this is in line with the CXL trend to move error reporting to trace events instead of the kernel log.
Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Rafael J. Wysocki <rafael@kernel.org> Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Link: http://lore.kernel.org/r/20240108165855.00002f5a@Huawei.com [1] Closes: http://lore.kernel.org/r/b963c490-2c13-4b79-bbe7-34c6568423c7@moroto.mountain [2] Signed-off-by: Dan Williams <dan.j.williams@intel.com>
show more ...
|
Revision tags: v6.8-rc5, v6.8-rc4, v6.8-rc3, v6.8-rc2, v6.8-rc1, v6.7, v6.7-rc8, v6.7-rc7 |
|
#
dc97f634 |
| 21-Dec-2023 |
Ira Weiny <ira.weiny@intel.com> |
cxl/pci: Register for and process CPER events
If the firmware has configured CXL event support to be firmware first the OS can process those events through CPER records. The CXL layer has unique DP
cxl/pci: Register for and process CPER events
If the firmware has configured CXL event support to be firmware first the OS can process those events through CPER records. The CXL layer has unique DPA to HPA knowledge and standard event trace parsing in place.
CPER records contain Bus, Device, Function information which can be used to identify the PCI device which is sending the event.
Change the PCI driver registration to include registration of a CXL CPER callback to process events through the trace subsystem.
Use new scoped based management to simplify the handling of the PCI device object.
Tested-by: Smita-Koralahalli <Smita.KoralahalliChannabasappa@amd.com> Reviewed-by: Smita-Koralahalli <Smita.KoralahalliChannabasappa@amd.com> Link: https://lore.kernel.org/r/20231220-cxl-cper-v5-9-1bb8a4ca2c7a@intel.com Signed-off-by: Ira Weiny <ira.weiny@intel.com> [djbw: use new pci_dev guard, flip init order] Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Acked-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
show more ...
|
#
671a794c |
| 21-Dec-2023 |
Ira Weiny <ira.weiny@intel.com> |
acpi/ghes: Process CXL Component Events
BIOS can configure memory devices as firmware first. This will send CXL events to the firmware instead of the OS. The firmware can then send these events to
acpi/ghes: Process CXL Component Events
BIOS can configure memory devices as firmware first. This will send CXL events to the firmware instead of the OS. The firmware can then send these events to the OS via UEFI.
UEFI v2.10 section N.2.14 defines a Common Platform Error Record (CPER) format for CXL Component Events. The format is mostly the same as the CXL Common Event Record Format. The difference is the use of a GUID in the Section Type rather than a UUID as part of the event itself.
Add GHES support to detect CXL CPER records and call a registered callback with the event.
A notifier chain was considered for the callback but the complexity did not justify the use case as only the CXL subsystem requires this event. Enforce that only one callback can be registered at any time.
Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Rafael J. Wysocki <rafael@kernel.org> Signed-off-by: Ira Weiny <ira.weiny@intel.com> Link: https://lore.kernel.org/r/20231220-cxl-cper-v5-7-1bb8a4ca2c7a@intel.com [djbw: fixup checkpatch errors] Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Acked-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
show more ...
|
#
f9c68338 |
| 21-Dec-2023 |
Ira Weiny <ira.weiny@intel.com> |
cxl/events: Create a CXL event union
The CXL CPER and event log records share everything but a UUID/GUID in their structures.
Define a cxl_event union without the UUID/GUID to be shared between the
cxl/events: Create a CXL event union
The CXL CPER and event log records share everything but a UUID/GUID in their structures.
Define a cxl_event union without the UUID/GUID to be shared between the CPER and event log record formats. Adjust the code to use this union.
Signed-off-by: Ira Weiny <ira.weiny@intel.com> Link: https://lore.kernel.org/r/20231220-cxl-cper-v5-6-1bb8a4ca2c7a@intel.com Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Acked-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
show more ...
|
#
6eade110 |
| 21-Dec-2023 |
Ira Weiny <ira.weiny@intel.com> |
cxl/events: Separate UUID from event structures
The UEFI CXL CPER structure does not include the UUID. Now that the UUID is passed separately to the trace event there is no need to have the UUID in
cxl/events: Separate UUID from event structures
The UEFI CXL CPER structure does not include the UUID. Now that the UUID is passed separately to the trace event there is no need to have the UUID in those structures.
Move UUID from the event record header to the raw structures. Adjust cxl-test to Create dummy structures for creating test records.
Signed-off-by: Ira Weiny <ira.weiny@intel.com> Link: https://lore.kernel.org/r/20231220-cxl-cper-v5-5-1bb8a4ca2c7a@intel.com Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Acked-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
show more ...
|
#
26a1a86d |
| 21-Dec-2023 |
Ira Weiny <ira.weiny@intel.com> |
cxl/events: Promote CXL event structures to a core header
UEFI code can process CXL events through CPER records. Those records use almost the same format as the CXL events.
Lift the CXL event stru
cxl/events: Promote CXL event structures to a core header
UEFI code can process CXL events through CPER records. Those records use almost the same format as the CXL events.
Lift the CXL event structures to a core header to be shared in later patches.
[jic123: drop "CXL rev 3.0" mention]
Signed-off-by: Ira Weiny <ira.weiny@intel.com> Link: https://lore.kernel.org/r/20231220-cxl-cper-v5-2-1bb8a4ca2c7a@intel.com [djbw: add F: entry to maintainers for include/linux/cxl-event.h] Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Acked-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
show more ...
|