Revision tags: v6.3, v6.3-rc7 |
|
#
c2865b11 |
| 13-Apr-2023 |
Jakub Kicinski <kuba@kernel.org> |
Daniel Borkmann says:
==================== pull-request: bpf-next 2023-04-13
We've added 260 non-merge commits during the last 36 day(s) which contain a total of 356 files changed, 21786 insertions
Daniel Borkmann says:
==================== pull-request: bpf-next 2023-04-13
We've added 260 non-merge commits during the last 36 day(s) which contain a total of 356 files changed, 21786 insertions(+), 11275 deletions(-).
The main changes are:
1) Rework BPF verifier log behavior and implement it as a rotating log by default with the option to retain old-style fixed log behavior, from Andrii Nakryiko.
2) Adds support for using {FOU,GUE} encap with an ipip device operating in collect_md mode and add a set of BPF kfuncs for controlling encap params, from Christian Ehrig.
3) Allow BPF programs to detect at load time whether a particular kfunc exists or not, and also add support for this in light skeleton, from Alexei Starovoitov.
4) Optimize hashmap lookups when key size is multiple of 4, from Anton Protopopov.
5) Enable RCU semantics for task BPF kptrs and allow referenced kptr tasks to be stored in BPF maps, from David Vernet.
6) Add support for stashing local BPF kptr into a map value via bpf_kptr_xchg(). This is useful e.g. for rbtree node creation for new cgroups, from Dave Marchevsky.
7) Fix BTF handling of is_int_ptr to skip modifiers to work around tracing issues where a program cannot be attached, from Feng Zhou.
8) Migrate a big portion of test_verifier unit tests over to test_progs -a verifier_* via inline asm to ease {read,debug}ability, from Eduard Zingerman.
9) Several updates to the instruction-set.rst documentation which is subject to future IETF standardization (https://lwn.net/Articles/926882/), from Dave Thaler.
10) Fix BPF verifier in the __reg_bound_offset's 64->32 tnum sub-register known bits information propagation, from Daniel Borkmann.
11) Add skb bitfield compaction work related to BPF with the overall goal to make more of the sk_buff bits optional, from Jakub Kicinski.
12) BPF selftest cleanups for build id extraction which stand on its own from the upcoming integration work of build id into struct file object, from Jiri Olsa.
13) Add fixes and optimizations for xsk descriptor validation and several selftest improvements for xsk sockets, from Kal Conley.
14) Add BPF links for struct_ops and enable switching implementations of BPF TCP cong-ctls under a given name by replacing backing struct_ops map, from Kui-Feng Lee.
15) Remove a misleading BPF verifier env->bypass_spec_v1 check on variable offset stack read as earlier Spectre checks cover this, from Luis Gerhorst.
16) Fix issues in copy_from_user_nofault() for BPF and other tracers to resemble copy_from_user_nmi() from safety PoV, from Florian Lehner and Alexei Starovoitov.
17) Add --json-summary option to test_progs in order for CI tooling to ease parsing of test results, from Manu Bretelle.
18) Batch of improvements and refactoring to prep for upcoming bpf_local_storage conversion to bpf_mem_cache_{alloc,free} allocator, from Martin KaFai Lau.
19) Improve bpftool's visual program dump which produces the control flow graph in a DOT format by adding C source inline annotations, from Quentin Monnet.
20) Fix attaching fentry/fexit/fmod_ret/lsm to modules by extracting the module name from BTF of the target and searching kallsyms of the correct module, from Viktor Malik.
21) Improve BPF verifier handling of '<const> <cond> <non_const>' to better detect whether in particular jmp32 branches are taken, from Yonghong Song.
22) Allow BPF TCP cong-ctls to write app_limited of struct tcp_sock. A built-in cc or one from a kernel module is already able to write to app_limited, from Yixin Shen.
Conflicts:
Documentation/bpf/bpf_devel_QA.rst b7abcd9c656b ("bpf, doc: Link to submitting-patches.rst for general patch submission info") 0f10f647f455 ("bpf, docs: Use internal linking for link to netdev subsystem doc") https://lore.kernel.org/all/20230307095812.236eb1be@canb.auug.org.au/
include/net/ip_tunnels.h bc9d003dc48c3 ("ip_tunnel: Preserve pointer const in ip_tunnel_info_opts") ac931d4cdec3d ("ipip,ip_tunnel,sit: Add FOU support for externally controlled ipip devices") https://lore.kernel.org/all/20230413161235.4093777-1-broonie@kernel.org/
net/bpf/test_run.c e5995bc7e2ba ("bpf, test_run: fix crashes due to XDP frame overwriting/corruption") 294635a8165a ("bpf, test_run: fix &xdp_frame misplacement for LIVE_FRAMES") https://lore.kernel.org/all/20230320102619.05b80a98@canb.auug.org.au/ ====================
Link: https://lore.kernel.org/r/20230413191525.7295-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
show more ...
|
Revision tags: v6.3-rc6 |
|
#
e8f59d84 |
| 04-Apr-2023 |
Andrii Nakryiko <andrii@kernel.org> |
Merge branch 'bpf: Follow up to RCU enforcement in the verifier.'
Alexei Starovoitov says:
====================
From: Alexei Starovoitov <ast@kernel.org>
The patch set is addressing a fallout fro
Merge branch 'bpf: Follow up to RCU enforcement in the verifier.'
Alexei Starovoitov says:
====================
From: Alexei Starovoitov <ast@kernel.org>
The patch set is addressing a fallout from commit 6fcd486b3a0a ("bpf: Refactor RCU enforcement in the verifier.") It was too aggressive with PTR_UNTRUSTED marks. Patches 1-6 are cleanup and adding verifier smartness to address real use cases in bpf programs that broke with too aggressive PTR_UNTRUSTED. The partial revert is done in patch 7 anyway. ====================
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
show more ...
|
#
91571a51 |
| 04-Apr-2023 |
Alexei Starovoitov <ast@kernel.org> |
bpf: Teach verifier that certain helpers accept NULL pointer.
bpf_[sk|inode|task|cgrp]_storage_[get|delete]() and bpf_get_socket_cookie() helpers perform run-time check that sk|inode|task|cgrp point
bpf: Teach verifier that certain helpers accept NULL pointer.
bpf_[sk|inode|task|cgrp]_storage_[get|delete]() and bpf_get_socket_cookie() helpers perform run-time check that sk|inode|task|cgrp pointer != NULL. Teach verifier about this fact and allow bpf programs to pass PTR_TO_BTF_ID | PTR_MAYBE_NULL into such helpers. It will be used in the subsequent patch that will do bpf_sk_storage_get(.., skb->sk, ...); Even when 'skb' pointer is trusted the 'sk' pointer may be NULL.
Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: David Vernet <void@manifault.com> Link: https://lore.kernel.org/bpf/20230404045029.82870-5-alexei.starovoitov@gmail.com
show more ...
|
Revision tags: v6.3-rc5, v6.3-rc4 |
|
#
8d275960 |
| 26-Mar-2023 |
Alexei Starovoitov <ast@kernel.org> |
Merge branch 'bpf: Use bpf_mem_cache_alloc/free in bpf_local_storage'
Martin KaFai Lau says:
====================
From: Martin KaFai Lau <martin.lau@kernel.org>
This set is a continuation of the
Merge branch 'bpf: Use bpf_mem_cache_alloc/free in bpf_local_storage'
Martin KaFai Lau says:
====================
From: Martin KaFai Lau <martin.lau@kernel.org>
This set is a continuation of the effort in using bpf_mem_cache_alloc/free in bpf_local_storage [1]
Major change is only using bpf_mem_alloc for task and cgrp storage while sk and inode stay with kzalloc/kfree. The details is in patch 2.
[1]: https://lore.kernel.org/bpf/20230308065936.1550103-1-martin.lau@linux.dev/
v3: - Only use bpf_mem_alloc for task and cgrp storage. - sk and inode storage stay with kzalloc/kfree. - Check NULL and add comments in bpf_mem_cache_raw_free() in patch 1. - Added test and benchmark for task storage.
v2: - Added bpf_mem_cache_alloc_flags() and bpf_mem_cache_raw_free() to hide the internal data structure of the bpf allocator. - Fixed a typo bug in bpf_selem_free() - Simplified the test_local_storage test by directly using err returned from libbpf ====================
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
show more ...
|
#
08a7ce38 |
| 22-Mar-2023 |
Martin KaFai Lau <martin.lau@kernel.org> |
bpf: Use bpf_mem_cache_alloc/free in bpf_local_storage_elem
This patch uses bpf_mem_alloc for the task and cgroup local storage that the bpf prog can easily get a hold of the storage owner's PTR_TO_
bpf: Use bpf_mem_cache_alloc/free in bpf_local_storage_elem
This patch uses bpf_mem_alloc for the task and cgroup local storage that the bpf prog can easily get a hold of the storage owner's PTR_TO_BTF_ID. eg. bpf_get_current_task_btf() can be used in some of the kmalloc code path which will cause deadlock/recursion. bpf_mem_cache_alloc is deadlock free and will solve a legit use case in [1].
For sk storage, its batch creation benchmark shows a few percent regression when the sk create/destroy batch size is larger than 32. The sk creation/destruction happens much more often and depends on external traffic. Considering it is hypothetical to be able to cause deadlock with sk storage, it can cross the bridge to use bpf_mem_alloc till a legit (ie. useful) use case comes up.
For inode storage, bpf_local_storage_destroy() is called before waiting for a rcu gp and its memory cannot be reused immediately. inode stays with kmalloc/kfree after the rcu [or tasks_trace] gp.
A 'bool bpf_ma' argument is added to bpf_local_storage_map_alloc(). Only task and cgroup storage have 'bpf_ma == true' which means to use bpf_mem_cache_alloc/free(). This patch only changes selem to use bpf_mem_alloc for task and cgroup. The next patch will change the local_storage to use bpf_mem_alloc also for task and cgroup.
Here is some more details on the changes:
* memory allocation: After bpf_mem_cache_alloc(), the SDATA(selem)->data is zero-ed because bpf_mem_cache_alloc() could return a reused selem. It is to keep the existing bpf_map_kzalloc() behavior. Only SDATA(selem)->data is zero-ed. SDATA(selem)->data is the visible part to the bpf prog. No need to use zero_map_value() to do the zeroing because bpf_selem_free(..., reuse_now = true) ensures no bpf prog is using the selem before returning the selem through bpf_mem_cache_free(). For the internal fields of selem, they will be initialized when linking to the new smap and the new local_storage.
When 'bpf_ma == false', nothing changes in this patch. It will stay with the bpf_map_kzalloc().
* memory free: The bpf_selem_free() and bpf_selem_free_rcu() are modified to handle the bpf_ma == true case.
For the common selem free path where its owner is also being destroyed, the mem is freed in bpf_local_storage_destroy(), the owner (task and cgroup) has gone through a rcu gp. The memory can be reused immediately, so bpf_local_storage_destroy() will call bpf_selem_free(..., reuse_now = true) which will do bpf_mem_cache_free() for immediate reuse consideration.
An exception is the delete elem code path. The delete elem code path is called from the helper bpf_*_storage_delete() and the syscall bpf_map_delete_elem(). This path is an unusual case for local storage because the common use case is to have the local storage staying with its owner life time so that the bpf prog and the user space does not have to monitor the owner's destruction. For the delete elem path, the selem cannot be reused immediately because there could be bpf prog using it. It will call bpf_selem_free(..., reuse_now = false) and it will wait for a rcu tasks trace gp before freeing the elem. The rcu callback is changed to do bpf_mem_cache_raw_free() instead of kfree().
When 'bpf_ma == false', it should be the same as before. __bpf_selem_free() is added to do the kfree_rcu and call_tasks_trace_rcu(). A few words on the 'reuse_now == true'. When 'reuse_now == true', it is still racing with bpf_local_storage_map_free which is under rcu protection, so it still needs to wait for a rcu gp instead of kfree(). Otherwise, the selem may be reused by slab for a totally different struct while the bpf_local_storage_map_free() is still using it (as a rcu reader). For the inode case, there may be other rcu readers also. In short, when bpf_ma == false and reuse_now == true => vanilla rcu.
[1]: https://lore.kernel.org/bpf/20221118190109.1512674-1-namhyung@kernel.org/
Cc: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://lore.kernel.org/r/20230322215246.1675516-3-martin.lau@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>
show more ...
|
#
02adf9e9 |
| 22-Mar-2023 |
Alexei Starovoitov <ast@kernel.org> |
Merge branch 'error checking where helpers call bpf_map_ops'
JP Kobryn says:
====================
Within bpf programs, the bpf helper functions can make inline calls to kernel functions. In this s
Merge branch 'error checking where helpers call bpf_map_ops'
JP Kobryn says:
====================
Within bpf programs, the bpf helper functions can make inline calls to kernel functions. In this scenario there can be a disconnect between the register the kernel function writes a return value to and the register the bpf program uses to evaluate that return value.
As an example, this bpf code:
long err = bpf_map_update_elem(...); if (err && err != -EEXIST) // got some error other than -EEXIST
...can result in the bpf assembly:
; err = bpf_map_update_elem(&mymap, &key, &val, BPF_NOEXIST); 37: movabs $0xffff976a10730400,%rdi 41: mov $0x1,%ecx 46: call 0xffffffffe103291c ; htab_map_update_elem ; if (err && err != -EEXIST) { 4b: cmp $0xffffffffffffffef,%rax ; cmp -EEXIST,%rax 4f: je 0x000000000000008e 51: test %rax,%rax 54: je 0x000000000000008e
The compare operation here evaluates %rax, while in the preceding call to htab_map_update_elem the corresponding assembly returns -EEXIST via %eax (the lower 32 bits of %rax):
movl $0xffffffef, %r9d ... movl %r9d, %eax
...since it's returning int (32-bit). So the resulting comparison becomes:
cmp $0xffffffffffffffef, $0x00000000ffffffef
...making it not possible to check for negative errors or specific errors, since the sign value is left at the 32nd bit. It means in the original example, the conditional branch will be entered even when the error is -EEXIST, which was not intended.
The selftests added cover these cases for the different bpf_map_ops functions. When the second patch is applied, changing the return type of those functions to long, the comparison works as intended and the tests pass. ====================
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
show more ...
|
#
d7ba4cc9 |
| 22-Mar-2023 |
JP Kobryn <inwardvessel@gmail.com> |
bpf: return long from bpf_map_ops funcs
This patch changes the return types of bpf_map_ops functions to long, where previously int was returned. Using long allows for bpf programs to maintain the si
bpf: return long from bpf_map_ops funcs
This patch changes the return types of bpf_map_ops functions to long, where previously int was returned. Using long allows for bpf programs to maintain the sign bit in the absence of sign extension during situations where inlined bpf helper funcs make calls to the bpf_map_ops funcs and a negative error is returned.
The definitions of the helper funcs are generated from comments in the bpf uapi header at `include/uapi/linux/bpf.h`. The return type of these helpers was previously changed from int to long in commit bdb7b79b4ce8. For any case where one of the map helpers call the bpf_map_ops funcs that are still returning 32-bit int, a compiler might not include sign extension instructions to properly convert the 32-bit negative value a 64-bit negative value.
For example: bpf assembly excerpt of an inlined helper calling a kernel function and checking for a specific error:
; err = bpf_map_update_elem(&mymap, &key, &val, BPF_NOEXIST); ... 46: call 0xffffffffe103291c ; htab_map_update_elem ; if (err && err != -EEXIST) { 4b: cmp $0xffffffffffffffef,%rax ; cmp -EEXIST,%rax
kernel function assembly excerpt of return value from `htab_map_update_elem` returning 32-bit int:
movl $0xffffffef, %r9d ... movl %r9d, %eax
...results in the comparison: cmp $0xffffffffffffffef, $0x00000000ffffffef
Fixes: bdb7b79b4ce8 ("bpf: Switch most helper return values from 32-bit int to 64-bit long") Tested-by: Eduard Zingerman <eddyz87@gmail.com> Signed-off-by: JP Kobryn <inwardvessel@gmail.com> Link: https://lore.kernel.org/r/20230322194754.185781-3-inwardvessel@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
show more ...
|
Revision tags: v6.3-rc3, v6.3-rc2 |
|
#
a47eabf2 |
| 08-Mar-2023 |
Martin KaFai Lau <martin.lau@kernel.org> |
bpf: Repurpose use_trace_rcu to reuse_now in bpf_local_storage
This patch re-purpose the use_trace_rcu to mean if the freed memory can be reused immediately or not. The use_trace_rcu is renamed to r
bpf: Repurpose use_trace_rcu to reuse_now in bpf_local_storage
This patch re-purpose the use_trace_rcu to mean if the freed memory can be reused immediately or not. The use_trace_rcu is renamed to reuse_now. Other than the boolean test is reversed, it should be a no-op.
The following explains the reason for the rename and how it will be used in a later patch.
In a later patch, bpf_mem_cache_alloc/free will be used in the bpf_local_storage. The bpf mem allocator will reuse the freed memory immediately. Some of the free paths in bpf_local_storage does not support memory to be reused immediately. These paths are the "delete" elem cases from the bpf_*_storage_delete() helper and the map_delete_elem() syscall. Note that "delete" elem before the owner's (sk/task/cgrp/inode) lifetime ended is not the common usage for the local storage.
The common free path, bpf_local_storage_destroy(), can reuse the memory immediately. This common path means the storage stays with its owner until the owner is destroyed.
The above mentioned "delete" elem paths that cannot reuse immediately always has the 'use_trace_rcu == true'. The cases that is safe for immediate reuse always have 'use_trace_rcu == false'. Instead of adding another arg in a later patch, this patch re-purpose this arg to reuse_now and have the test logic reversed.
In a later patch, 'reuse_now == true' will free to the bpf_mem_cache_free() where the memory can be reused immediately. 'reuse_now == false' will go through the call_rcu_tasks_trace().
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://lore.kernel.org/r/20230308065936.1550103-7-martin.lau@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>
show more ...
|
#
2ffcb6fc |
| 08-Mar-2023 |
Martin KaFai Lau <martin.lau@kernel.org> |
bpf: Refactor codes into bpf_local_storage_destroy
This patch first renames bpf_local_storage_unlink_nolock to bpf_local_storage_destroy(). It better reflects that it is only used when the storage's
bpf: Refactor codes into bpf_local_storage_destroy
This patch first renames bpf_local_storage_unlink_nolock to bpf_local_storage_destroy(). It better reflects that it is only used when the storage's owner (sk/task/cgrp/inode) is being kfree().
All bpf_local_storage_destroy's caller is taking the spin lock and then free the storage. This patch also moves these two steps into the bpf_local_storage_destroy.
This is a preparation work for a later patch that uses bpf_mem_cache_alloc/free in the bpf_local_storage.
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://lore.kernel.org/r/20230308065936.1550103-3-martin.lau@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>
show more ...
|
#
ed69e066 |
| 08-Mar-2023 |
Jakub Kicinski <kuba@kernel.org> |
Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Andrii Nakryiko says:
==================== pull-request: bpf-next 2023-03-08
We've added 23 non-merge commits during the last 2 d
Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Andrii Nakryiko says:
==================== pull-request: bpf-next 2023-03-08
We've added 23 non-merge commits during the last 2 day(s) which contain a total of 28 files changed, 414 insertions(+), 104 deletions(-).
The main changes are:
1) Add more precise memory usage reporting for all BPF map types, from Yafang Shao.
2) Add ARM32 USDT support to libbpf, from Puranjay Mohan.
3) Fix BTF_ID_LIST size causing problems in !CONFIG_DEBUG_INFO_BTF, from Nathan Chancellor.
4) IMA selftests fix, from Roberto Sassu.
5) libbpf fix in APK support code, from Daniel Müller.
* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (23 commits) selftests/bpf: Fix IMA test libbpf: USDT arm arg parsing support libbpf: Refactor parse_usdt_arg() to re-use code libbpf: Fix theoretical u32 underflow in find_cd() function bpf: enforce all maps having memory usage callback bpf: offload map memory usage bpf, net: xskmap memory usage bpf, net: sock_map memory usage bpf, net: bpf_local_storage memory usage bpf: local_storage memory usage bpf: bpf_struct_ops memory usage bpf: queue_stack_maps memory usage bpf: devmap memory usage bpf: cpumap memory usage bpf: bloom_filter memory usage bpf: ringbuf memory usage bpf: reuseport_array memory usage bpf: stackmap memory usage bpf: arraymap memory usage bpf: hashtab memory usage ... ====================
Link: https://lore.kernel.org/r/20230308193533.1671597-1-andrii@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
show more ...
|
#
a73dc912 |
| 07-Mar-2023 |
Alexei Starovoitov <ast@kernel.org> |
Merge branch 'bpf: bpf memory usage'
Yafang Shao says:
====================
Currently we can't get bpf memory usage reliably either from memcg or from bpftool.
In memcg, there's not a 'bpf' item
Merge branch 'bpf: bpf memory usage'
Yafang Shao says:
====================
Currently we can't get bpf memory usage reliably either from memcg or from bpftool.
In memcg, there's not a 'bpf' item in memory.stat, but only 'kernel', 'sock', 'vmalloc' and 'percpu' which may related to bpf memory. With these items we still can't get the bpf memory usage, because bpf memory usage may far less than the kmem in a memcg, for example, the dentry may consume lots of kmem.
bpftool now shows the bpf memory footprint, which is difference with bpf memory usage. The difference can be quite great in some cases, for example,
- non-preallocated bpf map The non-preallocated bpf map memory usage is dynamically changed. The allocated elements count can be from 0 to the max entries. But the memory footprint in bpftool only shows a fixed number.
- bpf metadata consumes more memory than bpf element In some corner cases, the bpf metadata can consumes a lot more memory than bpf element consumes. For example, it can happen when the element size is quite small.
- some maps don't have key, value or max_entries For example the key_size and value_size of ringbuf is 0, so its memlock is always 0.
We need a way to show the bpf memory usage especially there will be more and more bpf programs running on the production environment and thus the bpf memory usage is not trivial.
This patchset introduces a new map ops ->map_mem_usage to calculate the memory usage. Note that we don't intend to make the memory usage 100% accurate, while our goal is to make sure there is only a small difference between what bpftool reports and the real memory. That small difference can be ignored compared to the total usage. That is enough to monitor the bpf memory usage. For example, the user can rely on this value to monitor the trend of bpf memory usage, compare the difference in bpf memory usage between different bpf program versions, figure out which maps consume large memory, and etc.
This patchset implements the bpf memory usage for all maps, and yet there's still work to do. We don't want to introduce runtime overhead in the element update and delete path, but we have to do it for some non-preallocated maps, - devmap, xskmap When we update or delete an element, it will allocate or free memory. In order to track this dynamic memory, we have to track the count in element update and delete path.
- cpumap The element size of each cpumap element is not determinated. If we want to track the usage, we have to count the size of all elements in the element update and delete path. So I just put it aside currently.
- local_storage, bpf_local_storage When we attach or detach a cgroup, it will allocate or free memory. If we want to track the dynamic memory, we also need to do something in the update and delete path. So I just put it aside currently.
- offload map The element update and delete of offload map is via the netdev dev_ops, in which it may dynamically allocate or free memory, but this dynamic memory isn't counted in offload map memory usage currently.
The result of each map can be found in the individual patch.
We may also need to track per-container bpf memory usage, that will be addressed by a different patchset.
Changes: v3->v4: code improvement on ringbuf (Andrii) use READ_ONCE() to read lpm_trie (Tao) explain why we can't get bpf memory usage from memcg. v2->v3: check callback at map creation time and avoid warning (Alexei) fix build error under CONFIG_BPF=n (lkp@intel.com) v1->v2: calculate the memory usage within bpf (Alexei) - [v1] bpf, mm: bpf memory usage https://lwn.net/Articles/921991/ - [RFC PATCH v2] mm, bpf: Add BPF into /proc/meminfo https://lwn.net/Articles/919848/ - [RFC PATCH v1] mm, bpf: Add BPF into /proc/meminfo https://lwn.net/Articles/917647/ - [RFC PATCH] bpf, mm: Add a new item bpf into memory.stat https://lore.kernel.org/bpf/20220921170002.29557-1-laoar.shao@gmail].com/ ====================
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
show more ...
|
Revision tags: v6.3-rc1 |
|
#
7490b7f1 |
| 05-Mar-2023 |
Yafang Shao <laoar.shao@gmail.com> |
bpf, net: bpf_local_storage memory usage
A new helper is introduced into bpf_local_storage map to calculate the memory usage. This helper is also used by other maps like bpf_cgrp_storage, bpf_inode_
bpf, net: bpf_local_storage memory usage
A new helper is introduced into bpf_local_storage map to calculate the memory usage. This helper is also used by other maps like bpf_cgrp_storage, bpf_inode_storage, bpf_task_storage and etc.
Note that currently the dynamically allocated storage elements are not counted in the usage, since it will take extra runtime overhead in the elements update or delete path. So let's put it aside now, and implement it in the future when someone really need it.
Signed-off-by: Yafang Shao <laoar.shao@gmail.com> Link: https://lore.kernel.org/r/20230305124615.12358-15-laoar.shao@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
show more ...
|
#
3e7aeb78 |
| 11-Jan-2024 |
Linus Torvalds <torvalds@linux-foundation.org> |
Merge tag 'net-next-6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from Paolo Abeni: "The most interesting thing is probably the networking structs
Merge tag 'net-next-6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from Paolo Abeni: "The most interesting thing is probably the networking structs reorganization and a significant amount of changes is around self-tests.
Core & protocols:
- Analyze and reorganize core networking structs (socks, netdev, netns, mibs) to optimize cacheline consumption and set up build time warnings to safeguard against future header changes
This improves TCP performances with many concurrent connections up to 40%
- Add page-pool netlink-based introspection, exposing the memory usage and recycling stats. This helps indentify bad PP users and possible leaks
- Refine TCP/DCCP source port selection to no longer favor even source port at connect() time when IP_LOCAL_PORT_RANGE is set. This lowers the time taken by connect() for hosts having many active connections to the same destination
- Refactor the TCP bind conflict code, shrinking related socket structs
- Refactor TCP SYN-Cookie handling, as a preparation step to allow arbitrary SYN-Cookie processing via eBPF
- Tune optmem_max for 0-copy usage, increasing the default value to 128KB and namespecifying it
- Allow coalescing for cloned skbs coming from page pools, improving RX performances with some common configurations
- Reduce extension header parsing overhead at GRO time
- Add bridge MDB bulk deletion support, allowing user-space to request the deletion of matching entries
- Reorder nftables struct members, to keep data accessed by the datapath first
- Introduce TC block ports tracking and use. This allows supporting multicast-like behavior at the TC layer
- Remove UAPI support for retired TC qdiscs (dsmark, CBQ and ATM) and classifiers (RSVP and tcindex)
- More data-race annotations
- Extend the diag interface to dump TCP bound-only sockets
- Conditional notification of events for TC qdisc class and actions
- Support for WPAN dynamic associations with nearby devices, to form a sub-network using a specific PAN ID
- Implement SMCv2.1 virtual ISM device support
- Add support for Batman-avd mulicast packet type
BPF:
- Tons of verifier improvements: - BPF register bounds logic and range support along with a large test suite - log improvements - complete precision tracking support for register spills - track aligned STACK_ZERO cases as imprecise spilled registers. This improves the verifier "instructions processed" metric from single digit to 50-60% for some programs - support for user's global BPF subprogram arguments with few commonly requested annotations for a better developer experience - support tracking of BPF_JNE which helps cases when the compiler transforms (unsigned) "a > 0" into "if a == 0 goto xxx" and the like - several fixes
- Add initial TX metadata implementation for AF_XDP with support in mlx5 and stmmac drivers. Two types of offloads are supported right now, that is, TX timestamp and TX checksum offload
- Fix kCFI bugs in BPF all forms of indirect calls from BPF into kernel and from kernel into BPF work with CFI enabled. This allows BPF to work with CONFIG_FINEIBT=y
- Change BPF verifier logic to validate global subprograms lazily instead of unconditionally before the main program, so they can be guarded using BPF CO-RE techniques
- Support uid/gid options when mounting bpffs
- Add a new kfunc which acquires the associated cgroup of a task within a specific cgroup v1 hierarchy where the latter is identified by its id
- Extend verifier to allow bpf_refcount_acquire() of a map value field obtained via direct load which is a use-case needed in sched_ext
- Add BPF link_info support for uprobe multi link along with bpftool integration for the latter
- Support for VLAN tag in XDP hints
- Remove deprecated bpfilter kernel leftovers given the project is developed in user-space (https://github.com/facebook/bpfilter)
Misc:
- Support for parellel TC self-tests execution
- Increase MPTCP self-tests coverage
- Updated the bridge documentation, including several so-far undocumented features
- Convert all the net self-tests to run in unique netns, to avoid random failures due to conflict and allow concurrent runs
- Add TCP-AO self-tests
- Add kunit tests for both cfg80211 and mac80211
- Autogenerate Netlink families documentation from YAML spec
- Add yml-gen support for fixed headers and recursive nests, the tool can now generate user-space code for all genetlink families for which we have specs
- A bunch of additional module descriptions fixes
- Catch incorrect freeing of pages belonging to a page pool
Driver API:
- Rust abstractions for network PHY drivers; do not cover yet the full C API, but already allow implementing functional PHY drivers in rust
- Introduce queue and NAPI support in the netdev Netlink interface, allowing complete access to the device <> NAPIs <> queues relationship
- Introduce notifications filtering for devlink to allow control application scale to thousands of instances
- Improve PHY validation, requesting rate matching information for each ethtool link mode supported by both the PHY and host
- Add support for ethtool symmetric-xor RSS hash
- ACPI based Wifi band RFI (WBRF) mitigation feature for the AMD platform
- Expose pin fractional frequency offset value over new DPLL generic netlink attribute
- Convert older drivers to platform remove callback returning void
- Add support for PHY package MMD read/write
New hardware / drivers:
- Ethernet: - Octeon CN10K devices - Broadcom 5760X P7 - Qualcomm SM8550 SoC - Texas Instrument DP83TG720S PHY
- Bluetooth: - IMC Networks Bluetooth radio
Removed:
- WiFi: - libertas 16-bit PCMCIA support - Atmel at76c50x drivers - HostAP ISA/PCMCIA style 802.11b driver - zd1201 802.11b USB dongles - Orinoco ISA/PCMCIA 802.11b driver - Aviator/Raytheon driver - Planet WL3501 driver - RNDIS USB 802.11b driver
Driver updates:
- Ethernet high-speed NICs: - Intel (100G, ice, idpf): - allow one by one port representors creation and removal - add temperature and clock information reporting - add get/set for ethtool's header split ringparam - add again FW logging - adds support switchdev hardware packet mirroring - iavf: implement symmetric-xor RSS hash - igc: add support for concurrent physical and free-running timers - i40e: increase the allowable descriptors - nVidia/Mellanox: - Preparation for Socket-Direct multi-dev netdev. That will allow in future releases combining multiple PFs devices attached to different NUMA nodes under the same netdev - Broadcom (bnxt): - TX completion handling improvements - add basic ntuple filter support - reduce MSIX vectors usage for MQPRIO offload - add VXLAN support, USO offload and TX coalesce completion for P7 - Marvell Octeon EP: - xmit-more support - add PF-VF mailbox support and use it for FW notifications for VFs - Wangxun (ngbe/txgbe): - implement ethtool functions to operate pause param, ring param, coalesce channel number and msglevel - Netronome/Corigine (nfp): - add flow-steering support - support UDP segmentation offload
- Ethernet NICs embedded, slower, virtual: - Xilinx AXI: remove duplicate DMA code adopting the dma engine driver - stmmac: add support for HW-accelerated VLAN stripping - TI AM654x sw: add mqprio, frame preemption & coalescing - gve: add support for non-4k page sizes. - virtio-net: support dynamic coalescing moderation
- nVidia/Mellanox Ethernet datacenter switches: - allow firmware upgrade without a reboot - more flexible support for bridge flooding via the compressed FID flooding mode
- Ethernet embedded switches: - Microchip: - fine-tune flow control and speed configurations in KSZ8xxx - KSZ88X3: enable setting rmii reference - Renesas: - add jumbo frames support - Marvell: - 88E6xxx: add "eth-mac" and "rmon" stats support
- Ethernet PHYs: - aquantia: add firmware load support - at803x: refactor the driver to simplify adding support for more chip variants - NXP C45 TJA11xx: Add MACsec offload support
- Wifi: - MediaTek (mt76): - NVMEM EEPROM improvements - mt7996 Extremely High Throughput (EHT) improvements - mt7996 Wireless Ethernet Dispatcher (WED) support - mt7996 36-bit DMA support - Qualcomm (ath12k): - support for a single MSI vector - WCN7850: support AP mode - Intel (iwlwifi): - new debugfs file fw_dbg_clear - allow concurrent P2P operation on DFS channels
- Bluetooth: - QCA2066: support HFP offload - ISO: more broadcast-related improvements - NXP: better recovery in case receiver/transmitter get out of sync"
* tag 'net-next-6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1714 commits) lan78xx: remove redundant statement in lan78xx_get_eee lan743x: remove redundant statement in lan743x_ethtool_get_eee bnxt_en: Fix RCU locking for ntuple filters in bnxt_rx_flow_steer() bnxt_en: Fix RCU locking for ntuple filters in bnxt_srxclsrldel() bnxt_en: Remove unneeded variable in bnxt_hwrm_clear_vnic_filter() tcp: Revert no longer abort SYN_SENT when receiving some ICMP Revert "mlx5 updates 2023-12-20" Revert "net: stmmac: Enable Per DMA Channel interrupt" ipvlan: Remove usage of the deprecated ida_simple_xx() API ipvlan: Fix a typo in a comment net/sched: Remove ipt action tests net: stmmac: Use interrupt mode INTM=1 for per channel irq net: stmmac: Add support for TX/RX channel interrupt net: stmmac: Make MSI interrupt routine generic dt-bindings: net: snps,dwmac: per channel irq net: phy: at803x: make read_status more generic net: phy: at803x: add support for cdt cross short test for qca808x net: phy: at803x: refactor qca808x cable test get status function net: phy: at803x: generalize cdt fault length function net: ethernet: cortina: Drop TSO support ...
show more ...
|
Revision tags: v6.7, v6.7-rc8, v6.7-rc7 |
|
#
c49b292d |
| 19-Dec-2023 |
Jakub Kicinski <kuba@kernel.org> |
Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Alexei Starovoitov says:
==================== pull-request: bpf-next 2023-12-18
This PR is larger than usual
Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Alexei Starovoitov says:
==================== pull-request: bpf-next 2023-12-18
This PR is larger than usual and contains changes in various parts of the kernel.
The main changes are:
1) Fix kCFI bugs in BPF, from Peter Zijlstra.
End result: all forms of indirect calls from BPF into kernel and from kernel into BPF work with CFI enabled. This allows BPF to work with CONFIG_FINEIBT=y.
2) Introduce BPF token object, from Andrii Nakryiko.
It adds an ability to delegate a subset of BPF features from privileged daemon (e.g., systemd) through special mount options for userns-bound BPF FS to a trusted unprivileged application. The design accommodates suggestions from Christian Brauner and Paul Moore.
Example: $ sudo mkdir -p /sys/fs/bpf/token $ sudo mount -t bpf bpffs /sys/fs/bpf/token \ -o delegate_cmds=prog_load:MAP_CREATE \ -o delegate_progs=kprobe \ -o delegate_attachs=xdp
3) Various verifier improvements and fixes, from Andrii Nakryiko, Andrei Matei.
- Complete precision tracking support for register spills - Fix verification of possibly-zero-sized stack accesses - Fix access to uninit stack slots - Track aligned STACK_ZERO cases as imprecise spilled registers. It improves the verifier "instructions processed" metric from single digit to 50-60% for some programs. - Fix verifier retval logic
4) Support for VLAN tag in XDP hints, from Larysa Zaremba.
5) Allocate BPF trampoline via bpf_prog_pack mechanism, from Song Liu.
End result: better memory utilization and lower I$ miss for calls to BPF via BPF trampoline.
6) Fix race between BPF prog accessing inner map and parallel delete, from Hou Tao.
7) Add bpf_xdp_get_xfrm_state() kfunc, from Daniel Xu.
It allows BPF interact with IPSEC infra. The intent is to support software RSS (via XDP) for the upcoming ipsec pcpu work. Experiments on AWS demonstrate single tunnel pcpu ipsec reaching line rate on 100G ENA nics.
8) Expand bpf_cgrp_storage to support cgroup1 non-attach, from Yafang Shao.
9) BPF file verification via fsverity, from Song Liu.
It allows BPF progs get fsverity digest.
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (164 commits) bpf: Ensure precise is reset to false in __mark_reg_const_zero() selftests/bpf: Add more uprobe multi fail tests bpf: Fail uprobe multi link with negative offset selftests/bpf: Test the release of map btf s390/bpf: Fix indirect trampoline generation selftests/bpf: Temporarily disable dummy_struct_ops test on s390 x86/cfi,bpf: Fix bpf_exception_cb() signature bpf: Fix dtor CFI cfi: Add CFI_NOSEAL() x86/cfi,bpf: Fix bpf_struct_ops CFI x86/cfi,bpf: Fix bpf_callback_t CFI x86/cfi,bpf: Fix BPF JIT call cfi: Flip headers selftests/bpf: Add test for abnormal cnt during multi-kprobe attachment selftests/bpf: Don't use libbpf_get_error() in kprobe_multi_test selftests/bpf: Add test for abnormal cnt during multi-uprobe attachment bpf: Limit the number of kprobes when attaching program to multiple kprobes bpf: Limit the number of uprobes when attaching program to multiple uprobes bpf: xdp: Register generic_kfunc_set with XDP programs selftests/bpf: utilize string values for delegate_xxx mount options ... ====================
Link: https://lore.kernel.org/r/20231219000520.34178-1-alexei.starovoitov@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
show more ...
|
Revision tags: v6.7-rc6, v6.7-rc5 |
|
#
09115c33 |
| 09-Dec-2023 |
Martin KaFai Lau <martin.lau@kernel.org> |
Merge branch 'bpf: Expand bpf_cgrp_storage to support cgroup1 non-attach case'
Yafang Shao says:
==================== In the current cgroup1 environment, associating operations between a cgroup and
Merge branch 'bpf: Expand bpf_cgrp_storage to support cgroup1 non-attach case'
Yafang Shao says:
==================== In the current cgroup1 environment, associating operations between a cgroup and applications in a BPF program requires storing a mapping of cgroup_id to application either in a hash map or maintaining it in userspace. However, by enabling bpf_cgrp_storage for cgroup1, it becomes possible to conveniently store application-specific information in cgroup-local storage and utilize it within BPF programs. Furthermore, enabling this feature for cgroup1 involves minor modifications for the non-attach case, streamlining the process.
However, when it comes to enabling this functionality for the cgroup1 attach case, it presents challenges. Therefore, the decision is to focus on enabling it solely for the cgroup1 non-attach case at present. If attempting to attach to a cgroup1 fd, the operation will simply fail with the error code -EBADF.
Changes: - RFC -> v1: - Collect acked-by - Avoid unnecessary is_cgroup1 check (Yonghong) - Keep the code patterns consistent (Yonghong) ====================
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
show more ...
|
#
73d9eb34 |
| 06-Dec-2023 |
Yafang Shao <laoar.shao@gmail.com> |
bpf: Enable bpf_cgrp_storage for cgroup1 non-attach case
In the current cgroup1 environment, associating operations between cgroups and applications in a BPF program requires storing a mapping of cg
bpf: Enable bpf_cgrp_storage for cgroup1 non-attach case
In the current cgroup1 environment, associating operations between cgroups and applications in a BPF program requires storing a mapping of cgroup_id to application either in a hash map or maintaining it in userspace. However, by enabling bpf_cgrp_storage for cgroup1, it becomes possible to conveniently store application-specific information in cgroup-local storage and utilize it within BPF programs. Furthermore, enabling this feature for cgroup1 involves minor modifications for the non-attach case, streamlining the process.
However, when it comes to enabling this functionality for the cgroup1 attach case, it presents challenges. Therefore, the decision is to focus on enabling it solely for the cgroup1 non-attach case at present. If attempting to attach to a cgroup1 fd, the operation will simply fail with the error code -EBADF.
Signed-off-by: Yafang Shao <laoar.shao@gmail.com> Acked-by: Tejun Heo <tj@kernel.org> Acked-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/r/20231206115326.4295-2-laoar.shao@gmail.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
show more ...
|
Revision tags: v6.7-rc4, v6.7-rc3, v6.7-rc2, v6.7-rc1, v6.6, v6.6-rc7, v6.6-rc6, v6.6-rc5, v6.6-rc4, v6.6-rc3, v6.6-rc2, v6.6-rc1 |
|
#
1ac731c5 |
| 30-Aug-2023 |
Dmitry Torokhov <dmitry.torokhov@gmail.com> |
Merge branch 'next' into for-linus
Prepare input updates for 6.6 merge window.
|
Revision tags: v6.5, v6.5-rc7, v6.5-rc6, v6.5-rc5, v6.5-rc4, v6.5-rc3 |
|
#
50501936 |
| 17-Jul-2023 |
Dmitry Torokhov <dmitry.torokhov@gmail.com> |
Merge tag 'v6.4' into next
Sync up with mainline to bring in updates to shared infrastructure.
|
Revision tags: v6.5-rc2, v6.5-rc1, v6.4, v6.4-rc7 |
|
#
db6da59c |
| 15-Jun-2023 |
Thomas Zimmermann <tzimmermann@suse.de> |
Merge drm/drm-next into drm-misc-next-fixes
Backmerging to sync drm-misc-next-fixes with drm-misc-next.
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
|
#
03c60192 |
| 12-Jun-2023 |
Dmitry Baryshkov <dmitry.baryshkov@linaro.org> |
Merge branch 'drm-next' of git://anongit.freedesktop.org/drm/drm into msm-next-lumag-base
Merge the drm-next tree to pick up the DRM DSC helpers (merged via drm-intel-next tree). MSM DSC v1.2 patche
Merge branch 'drm-next' of git://anongit.freedesktop.org/drm/drm into msm-next-lumag-base
Merge the drm-next tree to pick up the DRM DSC helpers (merged via drm-intel-next tree). MSM DSC v1.2 patches depend on these helpers.
Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
show more ...
|
Revision tags: v6.4-rc6 |
|
#
5c680050 |
| 06-Jun-2023 |
Miquel Raynal <miquel.raynal@bootlin.com> |
Merge tag 'v6.4-rc4' into wpan-next/staging
Linux 6.4-rc4
|
#
9ff17e6b |
| 05-Jun-2023 |
Tvrtko Ursulin <tvrtko.ursulin@intel.com> |
Merge drm/drm-next into drm-intel-gt-next
For conflict avoidance we need the following commit:
c9a9f18d3ad8 drm/i915/huc: use const struct bus_type pointers
Signed-off-by: Tvrtko Ursulin <tvrtko
Merge drm/drm-next into drm-intel-gt-next
For conflict avoidance we need the following commit:
c9a9f18d3ad8 drm/i915/huc: use const struct bus_type pointers
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
show more ...
|
Revision tags: v6.4-rc5, v6.4-rc4, v6.4-rc3 |
|
#
9c3a985f |
| 17-May-2023 |
Rodrigo Vivi <rodrigo.vivi@intel.com> |
Merge drm/drm-next into drm-intel-next
Backmerge to get some hwmon dependencies.
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
Revision tags: v6.4-rc2 |
|
#
50282fd5 |
| 12-May-2023 |
Maxime Ripard <maxime@cerno.tech> |
Merge drm/drm-fixes into drm-misc-fixes
Let's bring 6.4-rc1 in drm-misc-fixes to start the new fix cycle.
Signed-off-by: Maxime Ripard <maxime@cerno.tech>
|
#
ff32fcca |
| 09-May-2023 |
Maxime Ripard <maxime@cerno.tech> |
Merge drm/drm-next into drm-misc-next
Start the 6.5 release cycle.
Signed-off-by: Maxime Ripard <maxime@cerno.tech>
|