#
0a91336e |
| 02-Aug-2025 |
Huacai Chen <chenhuacai@loongson.cn> |
Merge tag 'bpf-next-6.17' into loongarch-next
LoongArch architecture changes for 6.17 include many BPF features, such as trampoline support, so merge 'bpf-next-6.17' to create a base on which BPF works well.
|
#
d9104cec |
| 30-Jul-2025 |
Linus Torvalds <torvalds@linux-foundation.org> |
Merge tag 'bpf-next-6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Pull bpf updates from Alexei Starovoitov:
- Remove usermode driver (UMD) framework (Thomas Weißschuh)
- Introduce Strongly Connected Components (SCC) in the verifier to detect loops and refine register liveness (Eduard Zingerman)
- Allow 'void *' cast using bpf_rdonly_cast() and corresponding '__arg_untrusted' for global function parameters (Eduard Zingerman)
- Improve precision for BPF_ADD and BPF_SUB operations in the verifier (Harishankar Vishwanathan)
- Teach the verifier that constant pointer to a map cannot be NULL (Ihor Solodrai)
- Introduce BPF streams for error reporting of various conditions detected by BPF runtime (Kumar Kartikeya Dwivedi)
- Teach the verifier to insert a runtime speculation barrier (lfence on x86) to mitigate speculative execution, instead of rejecting the programs (Luis Gerhorst)
- Various improvements for 'veristat' (Mykyta Yatsenko)
- When CONFIG_DEBUG_KERNEL is set, warn on internal verifier errors to improve bug detection by syzbot (Paul Chaignon)
- Support BPF private stack on arm64 (Puranjay Mohan)
- Introduce bpf_cgroup_read_xattr() kfunc to read xattr of cgroup's node (Song Liu)
- Introduce kfuncs for read-only string operations (Viktor Malik)
- Implement show_fdinfo() for bpf_links (Tao Chen)
- Reduce verifier's stack consumption (Yonghong Song)
- Implement mprog API for cgroup-bpf programs (Yonghong Song)
* tag 'bpf-next-6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (192 commits)
  selftests/bpf: Migrate fexit_noreturns case into tracing_failure test suite
  selftests/bpf: Add selftest for attaching tracing programs to functions in deny list
  bpf: Add log for attaching tracing programs to functions in deny list
  bpf: Show precise rejected function when attaching fexit/fmod_ret to __noreturn functions
  bpf: Fix various typos in verifier.c comments
  bpf: Add third round of bounds deduction
  selftests/bpf: Test invariants on JSLT crossing sign
  selftests/bpf: Test cross-sign 64bits range refinement
  selftests/bpf: Update reg_bound range refinement logic
  bpf: Improve bounds when s64 crosses sign boundary
  bpf: Simplify bounds refinement from s32
  selftests/bpf: Enable private stack tests for arm64
  bpf, arm64: JIT support for private stack
  bpf: Move bpf_jit_get_prog_name() to core.c
  bpf, arm64: Fix fp initialization for exception boundary
  umd: Remove usermode driver framework
  bpf/preload: Don't select USERMODE_DRIVER
  selftests/bpf: Fix test dynptr/test_dynptr_memset_xdp_chunks failure
  selftests/bpf: Fix test dynptr/test_dynptr_copy_xdp failure
  selftests/bpf: Increase xdp data size for arm64 64K page size
  ...
|
#
a9f8d8ad |
| 28-Jul-2025 |
Alexei Starovoitov <ast@kernel.org> |
Merge branch 'bpf-improve-64bits-bounds-refinement'
Paul Chaignon says:
==================== bpf: Improve 64bits bounds refinement
This patchset improves the 64bits bounds refinement when the s64 range crosses the sign boundary. The first patch explains the small addition to __reg64_deduce_bounds. The last one explains why we need a third round of __reg_deduce_bounds. The third patch adds a selftest with a more complete example of the impact on verification. The second and fourth patches update the existing selftests to take the new refinement into account.
This patchset should reduce the number of kernel warnings hit by syzkaller due to invariant violations [1]. It was also tested with Agni [2] (and Cilium's CI for good measure).
Link: https://syzkaller.appspot.com/bug?extid=c711ce17dd78e5d4fdcf [1]
Link: https://github.com/bpfverif/agni [2]
Changes in v4:
- Fixed outdated test comment, noticed by Eduard.
- Rebased.

Changes in v3:
- Added a 5th patch to call __reg_deduce_bounds a third time in reg_bounds_sync following tests from Eduard.
- Fixed broken indentations in the first patch.

Changes in v2 (all on Eduard's suggestions):
- Added two tests to ensure we cover all cases of u64/s64 overlap.
- Improved tests to check deduced ranges with __msg.
- Improved code comments.
====================
Link: https://patch.msgid.link/cover.1753695655.git.paul.chaignon@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
#
5dbb19b1 |
| 28-Jul-2025 |
Paul Chaignon <paul.chaignon@gmail.com> |
bpf: Add third round of bounds deduction
Commit d7f008738171 ("bpf: try harder to deduce register bounds from different numeric domains") added a second call to __reg_deduce_bounds in reg_bounds_sync because a single call wasn't enough to converge to a fixed point in terms of register bounds.
With patch "bpf: Improve bounds when s64 crosses sign boundary" from this series, Eduard noticed that calling __reg_deduce_bounds twice isn't enough anymore to converge. The first selftest added in "selftests/bpf: Test cross-sign 64bits range refinement" highlights the need for a third call to __reg_deduce_bounds. After instruction 7, reg_bounds_sync performs the following bounds deduction:
reg_bounds_sync entry: scalar(smin=-655,smax=0xeffffeee,smin32=-783,smax32=-146)
__update_reg_bounds: scalar(smin=-655,smax=0xeffffeee,smin32=-783,smax32=-146)
__reg_deduce_bounds:
  __reg32_deduce_bounds: scalar(smin=-655,smax=0xeffffeee,smin32=-783,smax32=-146,umin32=0xfffffcf1,umax32=0xffffff6e)
  __reg64_deduce_bounds: scalar(smin=-655,smax=0xeffffeee,smin32=-783,smax32=-146,umin32=0xfffffcf1,umax32=0xffffff6e)
  __reg_deduce_mixed_bounds: scalar(smin=-655,smax=0xeffffeee,umin=umin32=0xfffffcf1,umax=0xffffffffffffff6e,smin32=-783,smax32=-146,umax32=0xffffff6e)
__reg_deduce_bounds:
  __reg32_deduce_bounds: scalar(smin=-655,smax=0xeffffeee,umin=umin32=0xfffffcf1,umax=0xffffffffffffff6e,smin32=-783,smax32=-146,umax32=0xffffff6e)
  __reg64_deduce_bounds: scalar(smin=-655,smax=smax32=-146,umin=0xfffffffffffffd71,umax=0xffffffffffffff6e,smin32=-783,umin32=0xfffffcf1,umax32=0xffffff6e)
  __reg_deduce_mixed_bounds: scalar(smin=-655,smax=smax32=-146,umin=0xfffffffffffffd71,umax=0xffffffffffffff6e,smin32=-783,umin32=0xfffffcf1,umax32=0xffffff6e)
__reg_bound_offset: scalar(smin=-655,smax=smax32=-146,umin=0xfffffffffffffd71,umax=0xffffffffffffff6e,smin32=-783,umin32=0xfffffcf1,umax32=0xffffff6e,var_off=(0xfffffffffffffc00; 0x3ff))
__update_reg_bounds: scalar(smin=-655,smax=smax32=-146,umin=0xfffffffffffffd71,umax=0xffffffffffffff6e,smin32=-783,umin32=0xfffffcf1,umax32=0xffffff6e,var_off=(0xfffffffffffffc00; 0x3ff))
In particular, notice how:

1. In the first call to __reg_deduce_bounds, __reg32_deduce_bounds learns new u32 bounds.
2. __reg64_deduce_bounds is unable to improve bounds at this point.
3. __reg_deduce_mixed_bounds derives new u64 bounds from the u32 bounds.
4. In the second call to __reg_deduce_bounds, __reg64_deduce_bounds improves the smax and umin bounds thanks to patch "bpf: Improve bounds when s64 crosses sign boundary" from this series.
5. Subsequent functions are unable to improve the ranges further (only tnums). Yet, a better smin32 bound could be learned from the smin bound.
__reg32_deduce_bounds is able to improve smin32 from smin, but for that we need a third call to __reg_deduce_bounds.
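As a sketch, the resulting shape of reg_bounds_sync would be roughly the following (an approximation for illustration, not a verbatim copy of verifier.c):

    /* Sketch: three deduction rounds are needed to reach a fixed
     * point in the worst case, as the trace above shows.
     */
    static void reg_bounds_sync(struct bpf_reg_state *reg)
    {
        /* Learn u64/u32 bounds from var_off */
        __update_reg_bounds(reg);
        /* Round 1: u32 bounds -> mixed deduction -> u64 bounds */
        __reg_deduce_bounds(reg);
        /* Round 2: cross-sign s64/u64 refinement can now apply */
        __reg_deduce_bounds(reg);
        /* Round 3: propagate the refined smin back to smin32 */
        __reg_deduce_bounds(reg);
        /* Learn bits from the bounds, then bounds from the bits */
        __reg_bound_offset(reg);
        __update_reg_bounds(reg);
    }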
As discussed in [1], there may be a better way to organize the deduction rules to learn the same information with fewer calls to the same functions. Such an optimization requires further analysis and is orthogonal to the present patchset.
Link: https://lore.kernel.org/bpf/aIKtSK9LjQXB8FLY@mail.gmail.com/ [1]
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Co-developed-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
Link: https://lore.kernel.org/r/79619d3b42e5525e0e174ed534b75879a5ba15de.1753695655.git.paul.chaignon@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
#
f96841bb |
| 28-Jul-2025 |
Paul Chaignon <paul.chaignon@gmail.com> |
selftests/bpf: Test invariants on JSLT crossing sign
The improved u64/s64 range refinement fixed the invariant violation that was happening on this test for BPF_JSLT when crossing the sign boundary.

After this patch, we have one test remaining with a known invariant violation. It's the same test as the one fixed here, but for 32-bit ranges.
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
Link: https://lore.kernel.org/r/ad046fb0016428f1a33c3b81617aabf31b51183f.1753695655.git.paul.chaignon@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
#
26e5e346 |
| 28-Jul-2025 |
Paul Chaignon <paul.chaignon@gmail.com> |
selftests/bpf: Test cross-sign 64bits range refinement
This patch adds coverage for the new cross-sign 64bits range refinement logic. The three tests cover the cases when the u64 and s64 ranges overlap (1) in the negative portion of s64, (2) in the positive portion of s64, and (3) in both portions.
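As an illustration of the refinement these tests exercise (a sketch of the idea, not the kernel's literal __reg64_deduce_bounds code): when [smin, smax] crosses the sign boundary, the value lies, in u64 terms, in either the positive half [0, (u64)smax] or the negative half [(u64)smin, U64_MAX], and the u64 range can rule one half out.

    /* Sketch, assuming smin < 0 <= smax (the s64 range crosses the
     * sign boundary); names mirror the verifier's fields.
     */
    if (umin > (u64)smax) {
        /* No overlap with the positive half: the value is
         * negative, case (1) above.
         */
        umin = max(umin, (u64)smin);
        smax = min(smax, (s64)umax);
    } else if (umax < (u64)smin) {
        /* No overlap with the negative half: the value is
         * non-negative, case (2) above.
         */
        smin = max(smin, (s64)umin);
        smax = min(smax, (s64)umax);
    }
    /* Otherwise the u64 range overlaps both halves, case (3),
     * and nothing new can be deduced.
     */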
The first test is a simplified version of a BPF program generated by syzkaller that caused an invariant violation [1]. It looks like syzkaller could not extract the reproducer itself (and therefore didn't report it to the mailing list), but I was able to extract it from the console logs of a crash.
The principle is similar to the invariant violation described in commit 6279846b9b25 ("bpf: Forget ranges when refining tnum after JSET"): the verifier walks a dead branch, uses the condition to refine ranges, and ends up with inconsistent ranges. In this case, the dead branch is when we fallthrough on both jumps. The new refinement logic improves the bounds such that the second jump is properly detected as always-taken and the verifier doesn't end up walking a dead branch.
The second and third tests are inspired by the first, but rely on condition jumps to prepare the bounds instead of ALU instructions. An R10 write is used to trigger a verifier error when the bounds can't be refined.
Link: https://syzkaller.appspot.com/bug?extid=c711ce17dd78e5d4fdcf [1]
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
Link: https://lore.kernel.org/r/a0e17b00dab8dabcfa6f8384e7e151186efedfdd.1753695655.git.paul.chaignon@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
Revision tags: v6.16, v6.16-rc7, v6.16-rc6 |
|
#
d81526a6 |
| 10-Jul-2025 |
Paul Chaignon <paul.chaignon@gmail.com> |
selftests/bpf: Range analysis test case for JSET
This patch adds coverage for the warning detected by syzkaller and fixed in the previous patch. Without the previous patch, this test fails with:
verifier bug: REG INVARIANTS VIOLATION (false_reg1): range bounds violation u64=[0x0, 0x0] s64=[0x0, 0x0] u32=[0x1, 0x0] s32=[0x0, 0x0] var_off=(0x0, 0x0)(1)
Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/c7893be1170fdbcf64e0200c110cdbd360ce7086.1752171365.git.paul.chaignon@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
Revision tags: v6.16-rc5 |
|
#
1230be82 |
| 30-Jun-2025 |
Colin Ian King <colin.i.king@gmail.com> |
selftests/bpf: Fix spelling mistake "subtration" -> "subtraction"
There are spelling mistakes in description text. Fix these.
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250630125528.563077-1-colin.i.king@gmail.com
|
Revision tags: v6.16-rc4 |
|
#
3713b584 |
| 25-Jun-2025 |
Alexei Starovoitov <ast@kernel.org> |
Merge branch 'bpf-verifier-improve-precision-of-bpf_add-and-bpf_sub'
Harishankar Vishwanathan says:
==================== bpf, verifier: Improve precision of BPF_ADD and BPF_SUB
This patchset improves the precision of BPF_ADD and BPF_SUB range tracking. It also adds selftests that exercise the cases where precision improvement occurs, and selftests for the cases where precise bounds cannot be computed and the output register state values are set to unbounded.
Changelog:
v3:
* Improve readability in selftests and commit message by using more readable constants (suggested by Eduard Zingerman).
* Add four new selftests for the cases where precise output register state bounds cannot be computed in scalar(32)_min_max_add/sub, so the output register state must be set to unbounded, i.e., [0, U64_MAX] or [0, U32_MAX].
* Add suggested-by Eduard tag to commit message for changes to verifier_bounds.c

v2:
* Add clearer example of precision improvement in the commit message for verifier.c changes.
* Add selftests that exercise the precision improvement to verifier_bounds.c (suggested by Eduard Zingerman).

v1: https://lore.kernel.org/bpf/20250610221356.2663491-1-harishankar.vishwanathan@gmail.com/
====================
Link: https://patch.msgid.link/20250623040359.343235-1-harishankar.vishwanathan@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
#
e1d79454 |
| 23-Jun-2025 |
Harishankar Vishwanathan <harishankar.vishwanathan@gmail.com> |
selftests/bpf: Add testcases for BPF_ADD and BPF_SUB
The previous commit improves the precision in scalar(32)_min_max_add and scalar(32)_min_max_sub. The improvement in precision occurs in cases when all outcomes overflow or underflow, respectively.
This commit adds selftests that exercise those cases.
This commit also adds selftests for cases where the output register state bounds for u(32)_min/u(32)_max are conservatively set to unbounded (when there is partial overflow or underflow).
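For illustration, the u64 half of the improved rule can be sketched as follows (an approximation of the scalar_min_max_add() logic described above, not the kernel's verbatim code): the wrapped endpoints remain a valid interval when both endpoint additions overflow or neither does; only a partial overflow forces the conservative [0, U64_MAX].

    /* Sketch: check_add_overflow() stores the (wrapped) sum and
     * returns true if the addition overflowed.
     */
    bool min_ovf = check_add_overflow(dst_umin, src_umin, &dst_umin);
    bool max_ovf = check_add_overflow(dst_umax, src_umax, &dst_umax);

    if (min_ovf != max_ovf) {
        /* Partial overflow: only the upper endpoint wrapped, so
         * the result set is no longer a single interval.
         */
        dst_umin = 0;
        dst_umax = U64_MAX;
    }
    /* Both-overflow and no-overflow cases keep precise bounds: this
     * is the precision gain. BPF_SUB is symmetric, with underflow
     * checked via check_sub_overflow().
     */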
Signed-off-by: Harishankar Vishwanathan <harishankar.vishwanathan@gmail.com>
Co-developed-by: Matan Shachnai <m.shachnai@rutgers.edu>
Signed-off-by: Matan Shachnai <m.shachnai@rutgers.edu>
Suggested-by: Eduard Zingerman <eddyz87@gmail.com>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20250623040359.343235-3-harishankar.vishwanathan@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
Revision tags: v6.16-rc3, v6.16-rc2 |
|
#
5fcf896e |
| 10-Jun-2025 |
Alexei Starovoitov <ast@kernel.org> |
Merge branch 'bpf-mitigate-spectre-v1-using-barriers'
Luis Gerhorst says:
==================== This improves the expressiveness of unprivileged BPF by inserting speculation barriers instead of rejecting the programs.
The approach was previously presented at LPC'24 [1] and RAID'24 [2].
To mitigate the Spectre v1 (PHT) vulnerability, the kernel rejects potentially-dangerous unprivileged BPF programs as of commit 9183671af6db ("bpf: Fix leakage under speculation on mispredicted branches"). In [2], we have analyzed 364 object files from open source projects (Linux Samples and Selftests, BCC, Loxilb, Cilium, libbpf Examples, Parca, and Prevail) and found that this affects 31% to 54% of programs.
To resolve this in the majority of cases, this patchset adds a fall-back for mitigating Spectre v1 using speculation barriers. The kernel still optimistically attempts to verify all speculative paths but uses speculation barriers against v1 when unsafe behavior is detected. This allows more programs to be accepted without disabling the BPF Spectre mitigations (e.g., by setting cpu_mitigations_off()).
For this, it relies on the fact that speculation barriers generally prevent all later instructions from executing if the speculation was not correct (not only loads). See patch 7 ("bpf: Fall back to nospec for Spectre v1") for a detailed description and references to the relevant vendor documentation (AMD and Intel x86-64, ARM64, and PowerPC).
In [1] we have measured the overhead of this approach relative to having mitigations off and including the upstream Spectre v4 mitigations. For event tracing and stack-sampling profilers, we found that mitigations increase BPF program execution time by 0% to 62%. For the Loxilb network load balancer, we have measured a 14% slowdown in SCTP performance but no significant slowdown for TCP. This overhead only applies to programs that were previously rejected.
I reran the expressiveness-evaluation with v6.14 and made sure the main results still match those from [1] and [2] (which used v6.5).
Main design decisions are:
* Do not use separate bytecode insns for v1 and v4 barriers (inspired by Daniel Borkmann's question at LPC). This simplifies the verifier significantly and has the only downside that performance on PowerPC is not as high as it could be.
* Allow archs to still disable v1/v4 mitigations separately by setting bpf_jit_bypass_spec_v1/v4() (a minimal sketch follows this list). This way, archs can benefit from improved BPF expressiveness / performance if they are not vulnerable (e.g., ARM64 for v4 in the kernel).
* Do not remove the empty BPF_NOSPEC implementation for backends for which it is unknown whether they are vulnerable to Spectre v1.
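A minimal sketch of that arch opt-out, assuming bpf_jit_bypass_spec_v1/v4() are weak functions an arch JIT can override (an illustration of the mechanism, not the series' verbatim code):

    /* kernel/bpf/core.c (sketch): default is "keep mitigating" */
    bool __weak bpf_jit_bypass_spec_v1(void)
    {
        return false;
    }

    /* In the JIT of an arch known not to be vulnerable to
     * Spectre v1 (hypothetical example):
     */
    bool bpf_jit_bypass_spec_v1(void)
    {
        return true;    /* verifier skips v1 speculative-path checks */
    }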
[1] https://lpc.events/event/18/contributions/1954/ ("Mitigating Spectre-PHT using Speculation Barriers in Linux eBPF") [2] https://arxiv.org/pdf/2405.00078 ("VeriFence: Lightweight and Precise Spectre Defenses for Untrusted Linux Kernel Extensions")
Changes:
* v3 -> v4:
  - Remove insn parameter from do_check_insn() and extract process_bpf_exit_full as a function as requested by Eduard
  - Investigate apparent sanitize_check_bounds() bug reported by Kartikeya (does appear to not be a bug but only confusing code), sent separate patch to document it and add an assert
  - Remove already-merged commit 1 ("selftests/bpf: Fix caps for __xlated/jited_unpriv")
  - Drop former commit 10 ("bpf: Allow nospec-protected var-offset stack access") as it did not include a test and there are other places where var-off is rejected. Also, none of the tested real-world programs used var-off in the paper. Therefore keep the old behavior for now and potentially prepare a patch that converts all cases later if required.
  - Add link to AMD lfence and PowerPC speculation barrier (ori 31,31,0) documentation
  - Move detailed barrier documentation to commit 7 ("bpf: Fall back to nospec for Spectre v1")
  - Link to v3: https://lore.kernel.org/all/20250501073603.1402960-1-luis.gerhorst@fau.de/

* v2 -> v3:
  - Fix https://lore.kernel.org/oe-kbuild-all/202504212030.IF1SLhz6-lkp@intel.com/ and similar by moving the bpf_jit_bypass_spec_v1/v4() prototypes out of the #ifdef CONFIG_BPF_SYSCALL. Decided not to move them to filter.h (where similar bpf_jit_*() prototypes live) as they would still have to be duplicated in bpf.h to be usable to bpf_bypass_spec_v1/v4() (unless including filter.h in bpf.h is an option).
  - Fix https://lore.kernel.org/oe-kbuild-all/202504220035.SoGveGpj-lkp@intel.com/ by moving the variable declarations out of the switch-case.
  - Build touched C files with W=2 and bpf config on x86 to check that there are no other warnings introduced.
  - Found 3 more checkpatch warnings that can be fixed without degrading readability.
  - Rebase to bpf-next 2025-05-01
  - Link to v2: https://lore.kernel.org/bpf/20250421091802.3234859-1-luis.gerhorst@fau.de/

* v1 -> v2:
  - Drop former commits 9 ("bpf: Return PTR_ERR from push_stack()") and 11 ("bpf: Fall back to nospec for spec path verification") as suggested by Alexei. This series therefore no longer changes push_stack() to return PTR_ERR.
  - Add detailed explanation of how lfence works internally and how it affects the algorithm.
  - Add tests checking that nospec instructions are inserted in expected locations using __xlated_unpriv as suggested by Eduard (also, include a fix for __xlated_unpriv)
  - Add a test for the mitigations from the description of commit 9183671af6db ("bpf: Fix leakage under speculation on mispredicted branches")
  - Remove unused variables from do_check[_insn]() as suggested by Eduard.
  - Remove INSN_IDX_MODIFIED to improve readability as suggested by Eduard. This also causes the nospec_result-check to run (and fail) for jumping-ops. Add a warning to assert that this check must never succeed in that case.
  - Add details on the safety of patch 10 ("bpf: Allow nospec-protected var-offset stack access") based on the feedback on v1.
  - Rebase to bpf-next-250420
  - Link to v1: https://lore.kernel.org/all/20250313172127.1098195-1-luis.gerhorst@fau.de/

* RFC -> v1:
  - rebase to bpf-next-250313
  - tests: mark expected successes/new errors
  - add bpf_jit_bypass_spec_v1/v4() to avoid #ifdef in bpf_bypass_spec_v1/v4()
  - ensure that nospec with v1-support is implemented for archs for which GCC supports speculation barriers, except for MIPS
  - arm64: emit speculation barrier
  - powerpc: change nospec to include v1 barrier
  - discuss potential security (archs that do not impl. BPF nospec) and performance (only PowerPC) regressions
  - Link to RFC: https://lore.kernel.org/bpf/20250224203619.594724-1-luis.gerhorst@fau.de/
====================
Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://patch.msgid.link/20250603205800.334980-1-luis.gerhorst@fau.de
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
Revision tags: v6.16-rc1 |
|
#
d6f1c85f |
| 03-Jun-2025 |
Luis Gerhorst <luis.gerhorst@fau.de> |
bpf: Fall back to nospec for Spectre v1
This implements the core of the series and causes the verifier to fall back to mitigating Spectre v1 using speculation barriers. The approach was presented at LPC'24 [1] and RAID'24 [2].
If we find any forbidden behavior on a speculative path, we insert a nospec (e.g., lfence speculation barrier on x86) before the instruction and stop verifying the path. While verifying a speculative path, we can furthermore stop verification of that path whenever we encounter a nospec instruction.
A minimal example program would look as follows:
    A = true
    B = true
    if A goto e
    f()
    if B goto e
    unsafe()
    e: exit
There are the following speculative and non-speculative paths (`cur->speculative` and `speculative` referring to the value of the push_stack() parameters):
- A = true
- B = true
- if A goto e
  - A && !cur->speculative && !speculative
    - exit
  - !A && !cur->speculative && speculative
    - f()
    - if B goto e
      - B && cur->speculative && !speculative
        - exit
      - !B && cur->speculative && speculative
        - unsafe()
If f() contains any unsafe behavior under Spectre v1 and the unsafe behavior matches `state->speculative && error_recoverable_with_nospec(err)`, do_check() will now add a nospec before f() instead of rejecting the program:
    A = true
    B = true
    if A goto e
    nospec
    f()
    if B goto e
    unsafe()
    e: exit
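The fall-back itself can be sketched as follows (an approximation of the do_check() error path described above, not the verbatim kernel code; the recoverable errno set is the one named later in this message):

    /* Sketch: downgrade a speculative-path error to a barrier. */
    static bool error_recoverable_with_nospec(int err)
    {
        return err == -EPERM || err == -EACCES || err == -EINVAL;
    }

    /* In do_check(), when verifying an instruction fails: */
    if (state->speculative && error_recoverable_with_nospec(err)) {
        /* Emit a barrier before this insn; JITs lower BPF_NOSPEC
         * to lfence (x86), sb (arm64), or ori 31,31,0 (powerpc).
         */
        cur_aux(env)->nospec = true;
        /* Abandon this speculative path instead of rejecting. */
        goto process_bpf_exit;
    }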
Alternatively, the algorithm also takes advantage of nospec instructions inserted for other reasons (e.g., Spectre v4). Taking the program above as an example, speculative path exploration can stop before f() if a nospec was inserted there because of Spectre v4 sanitization.
In this example, all instructions after the nospec are dead code (and with the nospec they are also dead code speculatively).
For this, it relies on the fact that speculation barriers generally prevent all later instructions from executing if the speculation was not correct:
* On Intel x86_64, lfence acts as full speculation barrier, not only as a load fence [3]:
An LFENCE instruction or a serializing instruction will ensure that no later instructions execute, even speculatively, until all prior instructions complete locally. [...] Inserting an LFENCE instruction after a bounds check prevents later operations from executing before the bound check completes.
This was experimentally confirmed in [4].
* On AMD x86_64, lfence is dispatch-serializing [5] (requires MSR C001_1029[1] to be set if the MSR is supported, this happens in init_amd()). AMD further specifies "A dispatch serializing instruction forces the processor to retire the serializing instruction and all previous instructions before the next instruction is executed" [8]. As dispatch is not specific to memory loads or branches, lfence therefore also affects all instructions there. Also, if retiring a branch means its PC change becomes architectural (as it should), this means any "wrong" speculation is aborted as required for this series.
* ARM's SB speculation barrier instruction also affects "any instruction that appears later in the program order than the barrier" [6].
* PowerPC's barrier also affects all subsequent instructions [7]:
[...] executing an ori R31,R31,0 instruction ensures that all instructions preceding the ori R31,R31,0 instruction have completed before the ori R31,R31,0 instruction completes, and that no subsequent instructions are initiated, even out-of-order, until after the ori R31,R31,0 instruction completes. The ori R31,R31,0 instruction may complete before storage accesses associated with instructions preceding the ori R31,R31,0 instruction have been performed
Regarding the example, this implies that `if B goto e` will not execute before `if A goto e` completes. Once `if A goto e` completes, the CPU should find that the speculation was wrong and continue with `exit`.
If there is any other path that leads to `if B goto e` (and therefore `unsafe()`) without going through `if A goto e`, then a nospec will still be needed there. However, this patch assumes this other path will be explored separately and therefore be discovered by the verifier even if the exploration discussed here stops at the nospec.
This patch furthermore has the unfortunate consequence that Spectre v1 mitigations now only support architectures which implement BPF_NOSPEC. Before this commit, Spectre v1 mitigations prevented exploits by rejecting the programs on all architectures. Because some JITs do not implement BPF_NOSPEC, this patch therefore may regress unpriv BPF's security to a limited extent:
* The regression is limited to systems vulnerable to Spectre v1, have unprivileged BPF enabled, and do NOT emit insns for BPF_NOSPEC. The latter is not the case for x86 64- and 32-bit, arm64, and powerpc 64-bit and they are therefore not affected by the regression. According to commit a6f6a95f2580 ("LoongArch, bpf: Fix jit to skip speculation barrier opcode"), LoongArch is not vulnerable to Spectre v1 and therefore also not affected by the regression.
* To the best of my knowledge this regression may therefore only affect MIPS. This is deemed acceptable because unpriv BPF is still disabled there by default. As stated in a previous commit, BPF_NOSPEC could be implemented for MIPS based on GCC's speculation_barrier implementation.
* It is unclear which other architectures (besides x86 64- and 32-bit, ARM64, PowerPC 64-bit, LoongArch, and MIPS) supported by the kernel are vulnerable to Spectre v1. Also, it is not clear if barriers are available on these architectures. Implementing BPF_NOSPEC on these architectures therefore is non-trivial. Searching GCC and the kernel for speculation barrier implementations for these architectures yielded no result.
* If any of those regressed systems is also vulnerable to Spectre v4, the system was already vulnerable to Spectre v4 attacks based on unpriv BPF before this patch and the impact is therefore further limited.
As an alternative to regressing security, one could still reject programs if the architecture does not emit BPF_NOSPEC (e.g., by removing the empty BPF_NOSPEC-case from all JITs except for LoongArch where it appears justified). However, this will cause rejections on these archs that are likely unfounded in the vast majority of cases.
In the tests, some are now successful where we previously had a false-positive (i.e., rejection). Change them to reflect where the nospec should be inserted (using __xlated_unpriv) and modify the error message if the nospec is able to mitigate a problem that previously shadowed another problem (in that case __xlated_unpriv does not work, therefore just add a comment).
To avoid duplicating the ifdef whenever we check for nospec insns using __xlated_unpriv, define SPEC_V1 here once. This also improves readability. PowerPC can probably also be added here; however, omit it for now because the BPF CI currently does not include a test.
Limit the recoverable errors to EPERM, EACCES, and EINVAL (and not everything except for EFAULT and ENOMEM), as this already has the desired effect for most real-world programs. Briefly went through all the occurrences of EPERM, EINVAL, and EACCES in verifier.c to validate that catching them like this makes sense.
Thanks to Dustin for their help in checking the vendor documentation.
[1] https://lpc.events/event/18/contributions/1954/ ("Mitigating Spectre-PHT using Speculation Barriers in Linux eBPF")
[2] https://arxiv.org/pdf/2405.00078 ("VeriFence: Lightweight and Precise Spectre Defenses for Untrusted Linux Kernel Extensions")
[3] https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/technical-documentation/runtime-speculative-side-channel-mitigations.html ("Managed Runtime Speculative Execution Side Channel Mitigations")
[4] https://dl.acm.org/doi/pdf/10.1145/3359789.3359837 ("Speculator: a tool to analyze speculative execution attacks and mitigations" - Section 4.6 "Stopping Speculative Execution")
[5] https://www.amd.com/content/dam/amd/en/documents/processor-tech-docs/programmer-references/software-techniques-for-managing-speculation.pdf ("White Paper - SOFTWARE TECHNIQUES FOR MANAGING SPECULATION ON AMD PROCESSORS - REVISION 5.09.23")
[6] https://developer.arm.com/documentation/ddi0597/2020-12/Base-Instructions/SB--Speculation-Barrier- ("SB - Speculation Barrier - Arm Armv8-A A32/T32 Instruction Set Architecture (2020-12)")
[7] https://wiki.raptorcs.com/w/images/5/5f/OPF_PowerISA_v3.1C.pdf ("Power ISA™ - Version 3.1C - May 26, 2024 - Section 9.2.1 of Book III")
[8] https://www.amd.com/content/dam/amd/en/documents/processor-tech-docs/programmer-references/40332.pdf ("AMD64 Architecture Programmer’s Manual Volumes 1–5 - Revision 4.08 - April 2024 - 7.6.4 Serializing Instructions")
Signed-off-by: Luis Gerhorst <luis.gerhorst@fau.de>
Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Acked-by: Henriette Herzog <henriette.herzog@rub.de>
Cc: Dustin Nguyen <nguyen@cs.fau.de>
Cc: Maximilian Ott <ott@cs.fau.de>
Cc: Milan Stephan <milan.stephan@fau.de>
Link: https://lore.kernel.org/r/20250603212428.338473-1-luis.gerhorst@fau.de
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
Revision tags: v6.15, v6.15-rc7, v6.15-rc6, v6.15-rc5, v6.15-rc4, v6.15-rc3, v6.15-rc2 |
|
#
1260ed77 |
| 08-Apr-2025 |
Thomas Zimmermann <tzimmermann@suse.de> |
Merge drm/drm-fixes into drm-misc-fixes
Backmerging to get updates from v6.15-rc1.
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
|
Revision tags: v6.15-rc1 |
|
#
946661e3 |
| 05-Apr-2025 |
Dmitry Torokhov <dmitry.torokhov@gmail.com> |
Merge branch 'next' into for-linus
Prepare input updates for 6.15 merge window.
|
#
b3cc7428 |
| 26-Mar-2025 |
Jiri Kosina <jkosina@suse.com> |
Merge branch 'for-6.15/amd_sfh' into for-linus
From: Mario Limonciello <mario.limonciello@amd.com>
Some platforms include a human presence detection (HPD) sensor. When enabled and a user is detected, a wake event will be emitted from the sensor fusion hub that software can react to.
Example use cases are "wake from suspend on approach" or to "lock when leaving".
This is currently enabled by default on supported systems, but users can't control it. This essentially means that wake on approach is enabled, which is really surprising behavior to users who don't expect it.
Instead of defaulting to enabled, add a sysfs knob that users can use to enable the feature if desired, and set it to disabled by default.
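A minimal sketch of such a knob (the attribute name, strings, and state variable are illustrative assumptions, not the amd_sfh driver's actual code):

    /* Sketch: a sysfs attribute toggling HPD between
     * "enabled" and "disabled".
     */
    static bool hpd_user_enabled;   /* hypothetical driver state */

    static ssize_t hpd_show(struct device *dev,
                            struct device_attribute *attr, char *buf)
    {
        return sysfs_emit(buf, "%s\n",
                          hpd_user_enabled ? "enabled" : "disabled");
    }

    static ssize_t hpd_store(struct device *dev,
                             struct device_attribute *attr,
                             const char *buf, size_t count)
    {
        bool enable = sysfs_streq(buf, "enabled");

        if (!enable && !sysfs_streq(buf, "disabled"))
            return -EINVAL;
        hpd_user_enabled = enable;  /* driver would (dis)arm the sensor here */
        return count;
    }
    static DEVICE_ATTR_RW(hpd);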
|
Revision tags: v6.14, v6.14-rc7, v6.14-rc6, v6.14-rc5 |
|
#
0410c612 |
| 28-Feb-2025 |
Lucas De Marchi <lucas.demarchi@intel.com> |
Merge drm/drm-next into drm-xe-next
Sync to fix conflicts between drm-xe-next and drm-intel-next.
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
|
#
0b119045 |
| 26-Feb-2025 |
Dmitry Torokhov <dmitry.torokhov@gmail.com> |
Merge tag 'v6.14-rc4' into next
Sync up with the mainline.
|
Revision tags: v6.14-rc4, v6.14-rc3, v6.14-rc2 |
|
#
93c7dd1b |
| 06-Feb-2025 |
Maxime Ripard <mripard@kernel.org> |
Merge drm/drm-next into drm-misc-next
Bring rc1 to start the new release dev.
Signed-off-by: Maxime Ripard <mripard@kernel.org>
|
#
9e676a02 |
| 05-Feb-2025 |
Namhyung Kim <namhyung@kernel.org> |
Merge tag 'v6.14-rc1' into perf-tools-next
To get the various fixes in the current master.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
#
ea9f8f2b |
| 05-Feb-2025 |
Jani Nikula <jani.nikula@intel.com> |
Merge drm/drm-next into drm-intel-next
Sync with v6.14-rc1.
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
|
#
c771600c |
| 05-Feb-2025 |
Tvrtko Ursulin <tursulin@ursulin.net> |
Merge drm/drm-next into drm-intel-gt-next
We need 4ba4f1afb6a9 ("perf: Generic hotplug support for a PMU with a scope") in order to land an i915 PMU simplification and a fix. That landed in 6.12 and we are stuck at 6.9, so let's bump things forward.
Signed-off-by: Tvrtko Ursulin <tursulin@ursulin.net>
|
Revision tags: v6.14-rc1 |
|
#
d0d106a2 |
| 23-Jan-2025 |
Linus Torvalds <torvalds@linux-foundation.org> |
Merge tag 'bpf-next-6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Pull bpf updates from Alexei Starovoitov: "A smaller than usual release cycle.
The main changes are:
- Prepare selftest to run with GCC-BPF backend (Ihor Solodrai)
In addition to LLVM-BPF runs, the BPF CI now runs GCC-BPF in compile-only mode. Half of the tests are failing, since support for btf_decl_tag is still WIP, but this is a great milestone.
- Convert various samples/bpf to selftests/bpf/test_progs format (Alexis Lothoré and Bastien Curutchet)
- Teach the verifier to recognize that an array lookup with a constant in-range index will always succeed (Daniel Xu) (see the illustrative example after this list)
- Cleanup migrate disable scope in BPF maps (Hou Tao)
- Fix bpf_timer destroy path in PREEMPT_RT (Hou Tao)
- Always use bpf_mem_alloc in bpf_local_storage in PREEMPT_RT (Martin KaFai Lau)
- Refactor verifier lock support (Kumar Kartikeya Dwivedi)
This is a prerequisite for upcoming resilient spin lock.
- Remove excessive 'may_goto +0' instructions in the verifier that LLVM leaves behind when it unrolls loops (Yonghong Song)
- Remove unhelpful bpf_probe_write_user() warning message (Marco Elver)
- Add fd_array_cnt attribute for prog_load command (Anton Protopopov)
This is a prerequisite for upcoming support for static_branch"
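To illustrate the nullness-elision item above, here is a hedged BPF-C example (the map and program are hypothetical; the elision conditions are those of the cited patch, e.g., a constant in-range key on an ARRAY map):

    #include <linux/bpf.h>
    #include <bpf/bpf_helpers.h>

    struct {
        __uint(type, BPF_MAP_TYPE_ARRAY);
        __uint(max_entries, 16);
        __type(key, __u32);
        __type(value, __u64);
    } counters SEC(".maps");

    SEC("xdp")
    int count_packets(struct xdp_md *ctx)
    {
        __u32 key = 3;  /* constant and < max_entries */
        __u64 *val = bpf_map_lookup_elem(&counters, &key);

        /* With nullness elision, the verifier can prove this
         * lookup cannot fail, so the usual
         * `if (!val) return XDP_PASS;` check may be omitted.
         */
        __sync_fetch_and_add(val, 1);
        return XDP_PASS;
    }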
* tag 'bpf-next-6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (125 commits)
  selftests/bpf: Add some tests related to 'may_goto 0' insns
  bpf: Remove 'may_goto 0' instruction in opt_remove_nops()
  bpf: Allow 'may_goto 0' instruction in verifier
  selftests/bpf: Add test case for the freeing of bpf_timer
  bpf: Cancel the running bpf_timer through kworker for PREEMPT_RT
  bpf: Free element after unlock in __htab_map_lookup_and_delete_elem()
  bpf: Bail out early in __htab_map_lookup_and_delete_elem()
  bpf: Free special fields after unlock in htab_lru_map_delete_node()
  tools: Sync if_xdp.h uapi tooling header
  libbpf: Work around kernel inconsistently stripping '.llvm.' suffix
  bpf: selftests: verifier: Add nullness elision tests
  bpf: verifier: Support eliding map lookup nullness
  bpf: verifier: Refactor helper access type tracking
  bpf: tcp: Mark bpf_load_hdr_opt() arg2 as read-write
  bpf: verifier: Add missing newline on verbose() call
  selftests/bpf: Add distilled BTF test about marking BTF_IS_EMBEDDED
  libbpf: Fix incorrect traversal end type ID when marking BTF_IS_EMBEDDED
  libbpf: Fix return zero when elf_begin failed
  selftests/bpf: Fix btf leak on new btf alloc failure in btf_distill test
  veristat: Load struct_ops programs only once
  ...
|
Revision tags: v6.13, v6.13-rc7, v6.13-rc6 |
|
#
34ea973d |
| 30-Dec-2024 |
Alexei Starovoitov <ast@kernel.org> |
Merge branch 'bpf-verifier-improve-precision-of-bpf_mul'
Matan Shachnai says:
==================== This patch-set aims to improve precision of BPF_MUL and add testcases to illustrate precision gains using signed and unsigned bounds.
Changes from v1:
- Fixed typo made in patch.

Changes from v2:
- Added signed multiplication to BPF_MUL.
- Added test cases to exercise BPF_MUL.
- Reordered patches in the series.

Changes from v3:
- Coding style fixes.
====================
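As an illustration of the signed part of this improvement (a sketch approximating the scalar_min_max_mul() approach, not the verbatim merged code): multiply the four corner pairs with overflow checks; any possible overflow forces unbounded bounds, otherwise the corners bound the product.

    s64 p[4];

    /* Sketch: check_mul_overflow() returns true if the s64
     * multiplication overflowed.
     */
    if (check_mul_overflow(dst_smin, src_smin, &p[0]) ||
        check_mul_overflow(dst_smin, src_smax, &p[1]) ||
        check_mul_overflow(dst_smax, src_smin, &p[2]) ||
        check_mul_overflow(dst_smax, src_smax, &p[3])) {
        /* Possible overflow: we know nothing about the result. */
        dst_smin = S64_MIN;
        dst_smax = S64_MAX;
    } else {
        /* The product of two intervals is bounded by its corners. */
        dst_smin = min(min(p[0], p[1]), min(p[2], p[3]));
        dst_smax = max(max(p[0], p[1]), max(p[2], p[3]));
    }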
Link: https://patch.msgid.link/20241218032337.12214-1-m.shachnai@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
Revision tags: v6.13-rc5, v6.13-rc4 |
|
#
75137d9e |
| 18-Dec-2024 |
Matan Shachnai <m.shachnai@gmail.com> |
selftests/bpf: Add testcases for BPF_MUL
The previous commit improves the precision of BPF_MUL. Add tests to exercise the updated BPF_MUL.
Signed-off-by: Matan Shachnai <m.shachnai@gmail.com>
Link: https://lore.kernel.org/r/20241218032337.12214-3-m.shachnai@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
Revision tags: v6.13-rc3, v6.13-rc2, v6.13-rc1, v6.12, v6.12-rc7, v6.12-rc6, v6.12-rc5, v6.12-rc4, v6.12-rc3, v6.12-rc2, v6.12-rc1 |
|
#
36ec807b |
| 20-Sep-2024 |
Dmitry Torokhov <dmitry.torokhov@gmail.com> |
Merge branch 'next' into for-linus
Prepare input updates for 6.12 merge window.
|