| #
521c1fe0
|
| 13-Jan-2025 |
Robert Clausecker <fuz@FreeBSD.org> |
libc/aarch64: fix strlen() when flush-to-zero is set
Our SIMD-enhanced strlen() implementation for AArch64 uses a floating-point comparison to compare a bit mask to zero. This works fine under normal circumstances, but fails if the FZ (flush-to-zero) flag is set in FPCR (the floating-point control register) as then the CPU no longer distinguishes denormals from zero.
This was not caught during testing; this flag is rarely set and programs that do so rarely perform string manipulation.
Avoid this problem by using an integer comparison instead. The performance impact appears small (about 0.5%) on the Windows 2023 Dev Kit, but more significant (up to around 19%) on the RPi 5.
Reviewed by: getz
Fixes: 3863fec1ce2dc6033f094a085118605ea89db9e2
Differential Revision: https://reviews.freebsd.org/D48442
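To illustrate the failure mode described in this commit, here is a minimal C sketch. It is not the libc assembly: the helper names (`mask_as_double`, `fp_nonzero_with_fz`, `int_nonzero`) are hypothetical, and the effect of the FZ flag is only modelled in software by treating denormal inputs as zero, since setting FPCR is not portable. It assumes IEEE 754 binary64 doubles.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret a 64-bit match mask as a double, which is how a
 * floating-point compare would see the same register bits. */
static double mask_as_double(uint64_t mask)
{
    double d;
    memcpy(&d, &mask, sizeof d);
    return d;
}

/* Hypothetical software model of "compare against zero with FZ set":
 * denormal inputs are flushed to zero before the comparison. */
static int fp_nonzero_with_fz(uint64_t mask)
{
    double d = mask_as_double(mask);
    if (fpclassify(d) == FP_SUBNORMAL)
        return 0;               /* flushed: looks like "no match" */
    return d != 0.0;
}

/* Integer comparison: independent of any floating-point mode. */
static int int_nonzero(uint64_t mask)
{
    return mask != 0;
}

/* A mask with only low bits set has an all-zero exponent field,
 * so it is a denormal when viewed as a double. */
static void self_check(void)
{
    uint64_t low_match = 0x80;
    assert(fpclassify(mask_as_double(low_match)) == FP_SUBNORMAL);
    assert(int_nonzero(low_match));
    assert(!fp_nonzero_with_fz(low_match));
}
```

Calling `self_check()` exercises the case the commit describes: a nonzero mask whose set bits all fall in the mantissa is seen as zero by the modelled flush-to-zero compare, while the integer compare still reports a match.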
|
| #
3863fec1
|
| 26-Aug-2024 |
Getz Mikalsen <getz@FreeBSD.org> |
lib/libc/aarch64/string: add strlen SIMD implementation
Adds a SIMD-enhanced strlen for AArch64. It takes inspiration from the amd64 implementation, but I struggled to get the performance I had hoped for on cores like the Graviton3 when compared to the existing implementation from Arm Optimized Routines.
See the DR for benchmark results.
Tested by: fuz (exprun)
Reviewed by: fuz, emaste
Sponsored by: Google LLC (GSoC 2024)
PR: 281175
Differential Revision: https://reviews.freebsd.org/D45623
|