| #
2f833192
|
| 18-Dec-2025 |
Robert Clausecker <fuz@FreeBSD.org> |
libc/amd64: fix stpncpy.S again
The previous fix introduced a regression on machines without the BMI1 instruction set extension. The TZCNT instruction used in this function behaves differently on old machines when the source operand is zero, but the code was originally designed to never trigger this case. The bug fix made this case possible, leading to a regression on sufficiently old hardware.
Fix the code by messing with things such that the source operand is never zero.
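The commit message does not say how the zero operand is avoided. As a rough illustration only: on CPUs without BMI1 the TZCNT encoding executes as BSF, which leaves its destination undefined for a zero source, whereas TZCNT proper returns the operand width. One common way to sidestep the difference is to OR a sentinel bit into the mask just above the valid range. The C sketch below assumes a 16-bit match mask and uses the GCC/Clang builtin `__builtin_ctz()`, which is likewise undefined for zero; the helper name `first_match()` is hypothetical and not taken from stpncpy.S.

```c
#include <stdint.h>

/*
 * Hypothetical helper, not the code from stpncpy.S: return the index of
 * the lowest set bit in a 16-bit match mask, or 16 if no bit is set.
 * The sentinel at bit 16 guarantees the operand of the bit scan is never
 * zero, so the result does not depend on how the CPU treats a zero source.
 */
static unsigned
first_match(uint32_t mask16)
{
	uint32_t guarded = (mask16 & 0xffffu) | (UINT32_C(1) << 16);

	return ((unsigned)__builtin_ctz(guarded));
}
```

With the sentinel in place, an empty mask yields 16 ("no match in this chunk") on every CPU, with or without BMI1.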
PR: 291720
Fixes: 66eb78377bf109af1d9e25626bf254b4369436ec
Tested by: cy
Approved by: markj (mentor)
Differential Revision: https://reviews.freebsd.org/D54303
|
| #
ce9557d4
|
| 16-Dec-2025 |
Dag-Erling Smørgrav <des@FreeBSD.org> |
libc/amd64: Disable baseline version of stpncpy()
This implementation appears to be broken on some CPUs. Disable it until the issue can be investigated and fixed.
PR: 291720
Fixes: 66eb78377bf1 ("libc/amd64: fix overread conditions in stpncpy()")
Fixes: 90253d49db09 ("lib/libc/amd64/string: add stpncpy scalar, baseline implementation")
|
| #
66eb7837
|
| 10-Dec-2025 |
Robert Clausecker <fuz@FreeBSD.org> |
libc/amd64: fix overread conditions in stpncpy()
Due to incorrect unit test design, two overread conditions went undetected in the amd64 baseline stpncpy() implementation. For buffers of 1--16 and 32 bytes that do not contain nul bytes and end exactly at a page boundary, the code would incorrectly read 16 bytes from the next page, possibly crossing into an unmapped page and crashing the program. If the next page was mapped, the code would then proceed with the expected behaviour of the stpncpy() function.
Three changes were made to fix the bug:
- an off-by-one error is fixed in the code deciding whether to enter the runt case or not, entering it for 0<n<=32 bytes instead of 0<n<32 bytes as it was before.
- in the runt case, the logic to skip reading a second 16-byte chunk if the buffer ends in the first chunk was fixed to account for buffers that end at a 16-byte boundary but do not hold a nul byte.
- in the runt case, the logic to transform the location of the end of the input buffer into a bit mask was fixed to allow the case of n==32, which was previously impossible due to the incorrect logic for entering said case.
The performance impact should be minimal.
PR: 291359
See also: D54169
Reported by: Collin Funk <collin.funk1@gmail.com>
Reviewed by: getz
Approved by: markj (mentor)
MFC after: 1 week
Fixes: 90253d49db09a9b1490c448d05314f3e4bbfa468 (D42519)
Differential Revision: https://reviews.freebsd.org/D54170
|
| #
90253d49
|
| 30-Oct-2023 |
Robert Clausecker <fuz@FreeBSD.org> |
lib/libc/amd64/string: add stpncpy scalar, baseline implementation
This was surprisingly annoying to get right, despite being such a simple function. A scalar implementation is also provided; it just calls into our optimised memchr(), memcpy(), and memset() routines to carry out its job.
I'm quite happy with the performance. glibc only beats us for very long strings, likely due to the use of AVX-512. The scalar implementation just calls into our optimised memchr(), memcpy(), and memset() routines, so it has a high overhead to begin with but then performs ok for the amount of effort that went into it. Still beats the old C code, except for very short strings.
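The scalar approach described above can be sketched in portable C as follows. This is an illustration of the memchr()/memcpy()/memset() composition only, not the code that was committed; the function name `stpncpy_sketch()` is hypothetical.

```c
#include <string.h>

/*
 * Hypothetical sketch of a stpncpy() built from memchr(), memcpy(), and
 * memset().  Copies at most n bytes of src to dst, pads the remainder of
 * dst with nul bytes, and returns a pointer to the first nul written, or
 * dst + n if src did not fit.
 */
static char *
stpncpy_sketch(char *restrict dst, const char *restrict src, size_t n)
{
	/* memchr() stops at the first nul, so it never reads past the string */
	const char *nul = memchr(src, '\0', n);
	size_t len = nul != NULL ? (size_t)(nul - src) : n;

	memcpy(dst, src, len);
	memset(dst + len, '\0', n - len);

	return (dst + len);
}
```

Three library calls per invocation explain the high fixed overhead mentioned above: for very short strings the call overhead dominates, while for longer ones the optimised routines do most of the work.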
Sponsored by: The FreeBSD Foundation
Tested by: developers@, exp-run
Approved by: mjg
MFC after: 1 month
MFC to: stable/14
PR: 275785
Differential Revision: https://reviews.freebsd.org/D42519
|