History log of /src/lib/libc/amd64/string/stpncpy.S (Results 1 – 12 of 12)
Revision Date Author Comments
# 2f833192 18-Dec-2025 Robert Clausecker <fuz@FreeBSD.org>

libc/amd64: fix stpncpy.S again

The previous fix introduced a regression on machines without the BMI1
instruction set extension. The TZCNT instruction used in this function
behaves differently on old machines when the source operand is zero, but
the code was originally designed to never trigger this case. The bug
fix made this case possible, leading to a regression on sufficiently
old hardware.

Fix the code by adjusting it so that the source operand is never zero.

PR: 291720
Fixes: 66eb78377bf109af1d9e25626bf254b4369436ec
Tested by: cy
Approved by: markj (mentor)
Differential Revision: https://reviews.freebsd.org/D54303
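For illustration only: TZCNT is encoded as BSF with a REP prefix, so a CPU without BMI1 executes it as plain BSF. For a nonzero source both return the index of the lowest set bit, but for a zero source TZCNT returns the operand width while BSF leaves the destination undefined. The commit does not say how the operand was made nonzero. The C sketch below shows one common technique, OR-ing a sentinel bit just above a 16-bit nul-byte mask; first_nul_index() is a hypothetical helper, and it uses the GCC/Clang builtin __builtin_ctz(), which is likewise undefined for 0.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper, not taken from stpncpy.S.
 * nul_mask has bit i set when byte i of a 16-byte chunk is nul
 * (as produced by PCMPEQB + PMOVMSKB).  __builtin_ctz(0) is undefined,
 * just as BSF with a zero source is on pre-BMI1 CPUs.  Setting bit 16
 * keeps the operand nonzero, so "no nul in this chunk" yields 16.
 */
static unsigned
first_nul_index(uint16_t nul_mask)
{
	return ((unsigned)__builtin_ctz((uint32_t)nul_mask | (UINT32_C(1) << 16)));
}
```

With the sentinel in place, a chunk without a nul byte produces 16 on every CPU, so the result no longer depends on whether TZCNT or BSF actually executes.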


# ce9557d4 16-Dec-2025 Dag-Erling Smørgrav <des@FreeBSD.org>

libc/amd64: Disable baseline version of stpncpy()

This implementation appears to be broken on some CPUs. Disable it
until the issue can be investigated and fixed.

PR: 291720
Fixes: 66eb78377bf1 ("libc/amd64: fix overread conditions in stpncpy()")
Fixes: 90253d49db09 ("lib/libc/amd64/string: add stpncpy scalar, baseline implementation")


# 66eb7837 10-Dec-2025 Robert Clausecker <fuz@FreeBSD.org>

libc/amd64: fix overread conditions in stpncpy()

Due to incorrect unit test design, two overread conditions went
undetected in the amd64 baseline stpncpy() implementation.
For buffers of 1--16 and 32 bytes that do not contain nul bytes
and end exactly at a page boundary, the code would incorrectly
read 16 bytes from the next page, possibly crossing into an
unmapped page and crashing the program. If the next page was
mapped, the code would then proceed with the expected behaviour
of the stpncpy() function.

Three changes were made to fix the bug:

- an off-by-one error is fixed in the code deciding whether to
enter the runt case or not, entering it for 0<n<=32 bytes
instead of 0<n<32 bytes as it was before.
- in the runt case, the logic to skip reading a second 16-byte
chunk if the buffer ends in the first chunk was fixed to
account for buffers that end at a 16-byte boundary but do not
hold a nul byte.
- in the runt case, the logic to transform the location of the
end of the input buffer into a bit mask was fixed to allow
the case of n==32, which was previously impossible due to the
incorrect logic for entering said case.
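The boundary conditions in the list above can be made concrete with two small C helpers. Both are hypothetical illustrations, not code from stpncpy.S: chunks_touched() counts the 16-byte-aligned chunks that n bytes starting at src really cover, so reading any further chunk is an overread, and valid_byte_mask() builds a byte mask for a 32-byte window that stays correct when the length is exactly 32. The log does not say how the original mask was computed; the shift-count pitfall noted in the comments is a general one, used here only as an example of why n == 32 needs care.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical: number of 16-byte-aligned chunks touched by the n bytes
 * starting at src.  A buffer ending exactly at a 16-byte boundary touches
 * no byte of the following chunk, so loading that chunk is an overread
 * and may fault if it lies on an unmapped page.
 */
static size_t
chunks_touched(uintptr_t src, size_t n)
{
	if (n == 0)
		return (0);
	uintptr_t first = src & ~(uintptr_t)15;		/* chunk of src[0] */
	uintptr_t last = (src + n - 1) & ~(uintptr_t)15;	/* chunk of src[n - 1] */
	return ((size_t)((last - first) / 16) + 1);
}

/*
 * Hypothetical: mask with bit i set for each of the first len bytes of a
 * 32-byte window, 0 <= len <= 32.  (1u << len) - 1 is undefined behaviour
 * in C for len == 32, and x86 SHL masks a 32-bit shift count to 5 bits,
 * turning it into a shift by 0; doing the shift in 64 bits avoids both.
 */
static uint32_t
valid_byte_mask(unsigned len)
{
	assert(len <= 32);
	return ((uint32_t)((UINT64_C(1) << len) - 1));
}
```

For example, 16 bytes starting at the aligned address 0x1000 touch one chunk, while 16 bytes starting at 0x0ff8 touch two.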

The performance impact should be minimal.

PR: 291359
See also: D54169
Reported by: Collin Funk <collin.funk1@gmail.com>
Reviewed by: getz
Approved by: markj (mentor)
MFC after: 1 week
Fixes: 90253d49db09a9b1490c448d05314f3e4bbfa468 (D42519)
Differential Revision: https://reviews.freebsd.org/D54170
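The commit attributes the missed overreads to unit test design. A test that places a nul-free source so that it ends exactly at a page boundary, followed by an inaccessible page, turns any overread into a crash instead of a silent pass. The sketch below is an assumed POSIX harness built on mmap(), mprotect() and sysconf(); it is not the actual FreeBSD test, and check_stpncpy_at_page_end() is a hypothetical name.

```c
#define _DEFAULT_SOURCE		/* glibc needs this for stpncpy() and MAP_ANON */
#include <assert.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical regression check: the n-byte source contains no nul byte
 * and src[n - 1] is the last byte before a PROT_NONE guard page, so an
 * implementation that reads past the end dies with SIGSEGV.
 * Returns 0 on success, 1 on a wrong result, -1 if setup fails.
 */
static int
check_stpncpy_at_page_end(size_t n)
{
	char dst[64];
	long ps = sysconf(_SC_PAGESIZE);

	assert(n >= 1 && n <= sizeof(dst));
	if (ps <= 0)
		return (-1);
	size_t pagesize = (size_t)ps;

	char *map = mmap(NULL, 2 * pagesize, PROT_READ | PROT_WRITE,
	    MAP_PRIVATE | MAP_ANON, -1, 0);
	if (map == MAP_FAILED)
		return (-1);
	/* Make the second page inaccessible. */
	if (mprotect(map + pagesize, pagesize, PROT_NONE) != 0) {
		munmap(map, 2 * pagesize);
		return (-1);
	}

	char *src = map + pagesize - n;	/* ends exactly at the page boundary */
	memset(src, 'a', n);		/* no nul byte in the source */

	/* With no nul in the first n bytes, stpncpy() must return dst + n. */
	char *end = stpncpy(dst, src, n);
	int ok = end == dst + n && memcmp(dst, src, n) == 0;

	munmap(map, 2 * pagesize);
	return (ok ? 0 : 1);
}
```

Calling it for every n from 1 to 64 covers the 1--16 and 32-byte cases named in the commit message.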


# 90253d49 30-Oct-2023 Robert Clausecker <fuz@FreeBSD.org>

lib/libc/amd64/string: add stpncpy scalar, baseline implementation

This was surprisingly annoying to get right, despite being such a simple
function. A scalar implementation is also provided; it just calls into
our optimised memchr(), memcpy(), and memset() routines to carry out its
job.

I'm quite happy with the performance. glibc only beats us for very long
strings, likely due to the use of AVX-512. Because the scalar
implementation only calls into these routines, it has a high overhead
to begin with, but then performs OK for the amount of effort that went
into it. It still beats the old C code, except for very short strings.

Sponsored by: The FreeBSD Foundation
Tested by: developers@, exp-run
Approved by: mjg
MFC after: 1 month
MFC to: stable/14
PR: 275785
Differential Revision: https://reviews.freebsd.org/D42519
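The scalar approach described above can be sketched in C. stpncpy_sketch() is a hypothetical name and the body is an assumption about how the three routines combine, not the committed assembly: memchr() finds the string length capped at n, memcpy() copies that many bytes, and memset() pads the rest of the destination with nul bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical scalar stpncpy() built from memchr(), memcpy() and memset().
 * Returns a pointer to the first nul byte written to dst, or dst + n if
 * src has no nul byte within its first n bytes.
 */
static char *
stpncpy_sketch(char *restrict dst, const char *restrict src, size_t n)
{
	/* memchr() stops at the first match, so it never reads past the nul. */
	const char *nul = memchr(src, '\0', n);
	size_t len = nul != NULL ? (size_t)(nul - src) : n;

	memcpy(dst, src, len);			/* copy the string proper */
	memset(dst + len, '\0', n - len);	/* pad the remainder with nul bytes */

	return (dst + len);
}
```

Three library calls per invocation plausibly account for the high fixed overhead mentioned above, which matters most for very short strings.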
