| #
28599a1e
|
| 26-Feb-2026 |
Konstantin Belousov <kib@FreeBSD.org> |
sys: add renameat2(2) syscall
Reviewed by: markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D55539
|
| #
4d707825
|
| 08-Jan-2026 |
Konstantin Belousov <kib@FreeBSD.org> |
Add pdwait(2)
Reviewed by: asomers, markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D54592
|
| #
5c2ee618
|
| 08-Jan-2026 |
Konstantin Belousov <kib@FreeBSD.org> |
sys: add pdrfork(2)
Reviewed by: asomers, markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D54592
|
| #
e02c57ff
|
| 26-Oct-2025 |
Justin Hibbits <jhibbits@FreeBSD.org> |
kern: Introduce kexec system feature (MI)
Introduce a new system call and reboot method to support booting a new kernel directly from FreeBSD.
Linux has included a system call, kexec_load(), since 2005, which permits booting a new kernel at reboot instead of requiring a full reboot cycle through the BIOS/firmware. This change brings that same system call to FreeBSD. Other changesets will add the MD components for some of our architectures, with stubs for the rest until the MD components have been written.
kexec_load() supports loading up to an arbitrary limit of 16 memory segments. These segments must lie within the memory described by vm_phys_segs (the vm.phys_segs sysctl); each segment must be contained within a single vm.phys_segs segment and cannot cross into an adjacent one.
Reviewed by: imp, kib Sponsored by: Juniper Networks, Inc. Differential Revision: https://reviews.freebsd.org/D51619
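The segment rule above can be sketched in C. This is a minimal illustration, not the kernel's implementation: the structure names (`struct phys_range`, `struct kseg`), the helper `kexec_segments_valid()`, and the half-open `[start, end)` interpretation of each vm.phys_segs range are assumptions. Only the limit of 16 segments and the "must fit inside a single range" requirement come from the commit message.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define KEXEC_SEGMENT_MAX 16    /* limit stated in the commit message */

/* Hypothetical stand-ins, not the kernel's structures. */
struct phys_range {
    uint64_t start, end;        /* assumed half-open [start, end), like one vm.phys_segs entry */
};
struct kseg {
    uint64_t mem;               /* destination physical address */
    uint64_t memsz;             /* size in bytes */
};

/* True if [mem, mem + memsz) lies entirely inside ONE range. */
static bool seg_fits_one_range(const struct kseg *s,
    const struct phys_range *r, size_t nr)
{
    if (s->mem + s->memsz < s->mem)     /* address arithmetic wrapped around */
        return false;
    for (size_t i = 0; i < nr; i++)
        if (s->mem >= r[i].start && s->mem + s->memsz <= r[i].end)
            return true;
    return false;                       /* no single range holds the whole segment */
}

static bool kexec_segments_valid(const struct kseg *segs, size_t nsegs,
    const struct phys_range *ranges, size_t nranges)
{
    if (nsegs > KEXEC_SEGMENT_MAX)
        return false;
    for (size_t i = 0; i < nsegs; i++)
        if (!seg_fits_one_range(&segs[i], ranges, nranges))
            return false;
    return true;
}
```

Under this model, a segment spanning the boundary between two adjacent ranges is rejected even though every byte it covers is physical memory.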
|
| #
696cfb27
|
| 12-Sep-2025 |
Olivier Certner <olce@FreeBSD.org> |
syscalls: Old setgroups(2)/getgroups(2): Remove superfluous STD type
An examination of the scripts under 'sys/tools/syscalls' indicates that keeping STD as a type in the presence of COMPATxx does not make any difference, and regenerating system call files with STD removed does indeed not show any difference. Moreover, this practice is inconsistent with the rest of the file.
Thus, remove the superfluous STD type for the two above-mentioned system calls. While here, re-order the remaining types for getgroups() to be consistent with other such occurrences (COMPATxx before CAPENABLED).
Reviewed by: kevans, emaste MFC after: 5 days MFC to: stable/15 Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D52499
|
| #
851dc7f8
|
| 04-Sep-2025 |
Jamie Gritton <jamie@FreeBSD.org> |
jail: add jail descriptors
Similar to process descriptors, jail descriptors allow jail administration using the file descriptor interface instead of JIDs. They come from and can be used by jail_set(2) and jail_get(2), and there are two new system calls, jail_attach_jd(2) and jail_remove_jd(2).
Reviewed by: bz, brooks Relnotes: yes Differential Revision: https://reviews.freebsd.org/D43696
|
| #
9da2fe96
|
| 15-Aug-2025 |
Kyle Evans <kevans@FreeBSD.org> |
kern: fix setgroups(2) and getgroups(2) to match other platforms
On most other platforms observed, including OpenBSD, NetBSD, and Linux, these system calls have long since been converted to only touching the supplementary groups of the process. This poses both portability and security concerns in porting software to and from FreeBSD, as this subtle difference is a landmine waiting to happen. Bugs have been discovered even in FreeBSD-local sources, since this behavior is somewhat unintuitive (see, e.g., fix 48fd05999b0f for chroot(8)).
Now that the egid is tracked outside of cr_groups in our ucred, convert the syscalls to deal with only supplementary groups. Some remaining stragglers in base that had baked in assumptions about these syscalls are fixed in the process to avoid heartburn in conversion.
For relnotes: application developers should audit their use of both setgroups(2) and getgroups(2) for signs that they had assumed the previous FreeBSD behavior of using the first element for the egid. Any calls to setgroups() that cleared groups by passing a single-element array holding the current or soon-to-be egid can be converted to setgroups(0, NULL) calls, which clear the supplementary groups entirely on all FreeBSD versions.
Co-authored-by: olce (but bugs are likely mine) Relnotes: yes (see last paragraph) Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D51648
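A sketch of code written for the new semantics, assuming only the standard getgroups(2), getegid(2), and setgroups(2) interfaces; the helper names are hypothetical. It asks for the effective GID explicitly instead of reading `groups[0]`, and clears supplementary groups with `setgroups(0, NULL)` as the release note suggests.

```c
#include <grp.h>        /* setgroups() on glibc */
#include <stdbool.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>     /* getgroups(), getegid(); setgroups() on FreeBSD */

/* getgroups(0, NULL) only returns the number of supplementary groups. */
static int supplementary_group_count(void)
{
    return getgroups(0, NULL);
}

/* Portable check: fetch the egid with getegid() rather than assuming groups[0]. */
static bool egid_is_also_supplementary(void)
{
    int n = getgroups(0, NULL);
    if (n <= 0)
        return false;
    gid_t *groups = malloc((size_t)n * sizeof *groups);
    if (groups == NULL)
        return false;
    n = getgroups(n, groups);           /* -1 on error skips the loop */
    gid_t egid = getegid();
    bool found = false;
    for (int i = 0; i < n; i++)
        if (groups[i] == egid)
            found = true;
    free(groups);
    return found;
}

/* Replaces the old idiom setgroups(1, &egid); requires privilege. */
static int clear_supplementary_groups(void)
{
    return setgroups(0, NULL);
}
```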
|
| #
2a255687
|
| 08-Aug-2025 |
Brooks Davis <brooks@FreeBSD.org> |
syscalls.master: mark _exit as not returning
Reviewed by: imp, kib Differential Revision: https://reviews.freebsd.org/D51674
|
| #
202ac097
|
| 08-Aug-2025 |
Brooks Davis <brooks@FreeBSD.org> |
sysent: add a new NORETURN type flag
System calls of type NORETURN don't return, and their stubs are declared not to.
Reviewed by: kevans, kib Differential Revision: https://reviews.freebsd.org/D51673
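What a "does not return" declaration buys a caller can be shown with C11 `_Noreturn`. In this sketch the helpers `fail()`, `parse_or_fail()`, and `child_exit_status()` are hypothetical: `fail()` wraps `_exit(2)` and is itself declared not to return, so `parse_or_fail()` needs no `return` after calling it and the compiler does not warn about control reaching the end of a non-void function. The fork in `child_exit_status()` exists only so the terminating path can be exercised.

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Declared like a NORETURN stub: control never comes back to the caller. */
static _Noreturn void fail(int code)
{
    _exit(code);
}

/* Parse a decimal int or terminate; nothing is needed after fail(). */
static int parse_or_fail(const char *s)
{
    char *end;
    long v = strtol(s, &end, 10);
    if (*s != '\0' && *end == '\0')
        return (int)v;
    fail(2);
}

/* Run parse_or_fail() in a child and report the child's exit status. */
static int child_exit_status(const char *s)
{
    fflush(NULL);                       /* avoid duplicating buffered output */
    pid_t pid = fork();
    if (pid == -1)
        return -1;
    if (pid == 0) {
        (void)parse_or_fail(s);
        _exit(0);
    }
    int status;
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```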
|
| #
e7e964cb
|
| 08-Aug-2025 |
Brooks Davis <brooks@FreeBSD.org> |
syscalls: normalize _exit(2) declarations
exit(3) is implemented by the runtime and performs a number of shutdown actions before ultimately calling _exit(2) to terminate the program. We historically named the syscall table entry `exit` rather than `_exit`, but this requires special handling in libc/libsys to cause the `_exit` symbol to exist while implementing `exit` in libc.
Declare the syscall as `_exit` and flow that through the system.
Because syscall(SYS_exit, code) is fairly widely used, allow a configured extra line in syscall.h to define SYS_exit to SYS__exit.
I've found no external uses of __sys_exit(), so I've not bothered to create a compatibility version of this private symbol.
Reviewed by: imp, kib, emaste Differential Revision: https://reviews.freebsd.org/D51672
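The split between exit(3) and _exit(2) described above can be observed directly: handlers registered with atexit(3) run on the exit(3) path but not when the program calls _exit(2). This is a minimal sketch using only standard POSIX calls; `atexit_handler_ran()` is a hypothetical helper that forks a child, has it terminate one way or the other, and reports through a pipe whether the handler fired.

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static int handler_fd = -1;

/* atexit(3) handler: part of the runtime's shutdown actions in exit(3). */
static void note_handler_ran(void)
{
    if (write(handler_fd, "x", 1) != 1) {
        /* best effort; nothing useful to do while exiting */
    }
}

/*
 * Returns 1 if the child's atexit handler ran, 0 if it did not, -1 on error.
 * use_exit != 0 selects exit(3); otherwise the child calls _exit(2).
 */
static int atexit_handler_ran(int use_exit)
{
    int p[2];
    if (pipe(p) == -1)
        return -1;
    fflush(NULL);                       /* don't duplicate buffered stdio in the child */
    pid_t pid = fork();
    if (pid == -1) {
        close(p[0]);
        close(p[1]);
        return -1;
    }
    if (pid == 0) {
        close(p[0]);
        handler_fd = p[1];
        atexit(note_handler_ran);
        if (use_exit)
            exit(0);                    /* runs handlers, flushes stdio, then _exit */
        _exit(0);                       /* terminates immediately */
    }
    close(p[1]);
    char c;
    ssize_t n = read(p[0], &c, 1);      /* 1 byte if the handler ran, 0 (EOF) if not */
    close(p[0]);
    waitpid(pid, NULL, 0);
    return n == 1 ? 1 : (n == 0 ? 0 : -1);
}
```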
|
| #
406fffde
|
| 01-Aug-2025 |
Brooks Davis <brooks@FreeBSD.org> |
syscalls: make __sysctl's first argument const
This matches the sysctl(3) prototype.
Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D51669
|
| #
21a0a2c0
|
| 06-Jul-2025 |
Konstantin Belousov <kib@FreeBSD.org> |
exterrctl(2): mark as CAPENABLED
Sponsored by: The FreeBSD Foundation
|
| #
f1f23043
|
| 03-Jul-2025 |
Mark Johnston <markj@FreeBSD.org> |
vfs: Initial revision of inotify
Add an implementation of inotify_init(), inotify_add_watch(), inotify_rm_watch(), source-compatible with Linux. This provides functionality similar to kevent(2)'s EVFILT_VNODE, i.e., it lets applications monitor filesystem files for accesses. Compared to inotify, however, EVFILT_VNODE has the limitation of requiring the application to open the file to be monitored. This means that activity on a newly created file cannot be monitored reliably, and that a file descriptor per file in the hierarchy is required.
inotify on the other hand allows a directory and its entries to be monitored at once. It introduces a new file descriptor type to which "watches" can be attached; a watch is a pseudo-file descriptor associated with a file or directory and a set of events to watch for. When a watched vnode is accessed, a description of the event is queued to the inotify descriptor, readable with read(2). Events for files in a watched directory include the file name.
A watched vnode has its usecount bumped, so name cache entries originating from a watched directory are not evicted. Name cache entries are used to populate inotify events for files with a link in a watched directory. In particular, if a file is accessed with, say, read(2), an IN_ACCESS event will be generated for any watched hard link of the file.
The inotify_add_watch_at() variant is included so that this functionality is available in capability mode; plain inotify_add_watch() is disallowed in capability mode.
When a file in a nullfs mount is watched, the watch is attached to the lower vnode, such that accesses via either layer generate inotify events.
Many thanks to Gleb Popov for testing this patch and finding lots of bugs.
PR: 258010, 215011 Reviewed by: kib Tested by: arrowd MFC after: 3 months Sponsored by: Klara, Inc. Differential Revision: https://reviews.freebsd.org/D50315
|
| #
09dfe066
|
| 23-May-2025 |
Konstantin Belousov <kib@FreeBSD.org> |
kernel: copyout extended errors to userspace and add exterrctl(2) to control it
Reviewed by: brooks Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Differential revision: https://reviews.freebsd.org/D50483
|
| #
f29905ca
|
| 18-Feb-2025 |
Brooks Davis <brooks@FreeBSD.org> |
makesyscalls: deprecate cpp other than includes
Warn that C preprocessor directives in the config file are deprecated. They are unsound and support has a number of potential pitfalls. They should be replaced by compile-time generation of files plus an overlay framework to allow things like per-arch variation.
Reviewed by: kevans Sponsored by: DARPA, AFRL Pull Request: https://github.com/freebsd/freebsd-src/pull/1575
|
| #
42d075f4
|
| 18-Feb-2025 |
Brooks Davis <brooks@FreeBSD.org> |
makesyscalls: Restore support for cpp in input
Allow patterns like this in syscalls.master:
    #if 0
    91	AUE_NULL	RESERVED
    #else
    91	AUE_NULL	STD|CAPENABLED {
    		int newsyscall(void);
    	}
    #endif
makesyscalls.lua and its predecessor makesyscalls.sh (really an awk script with a tiny shell prolog) used a single-pass parsing model where lines beginning with `#` were emitted into most generated files as they were read. I believe this was initially there to allow includes to be listed in syscalls.master, but Hyrum's Law[0] applies and people are using it for things like architecture-specific syscall definitions.
This use of CPP macros is unsound and there are a number of sharp edges in both the new and old implementations. The macros are unsound because not all the files we generate are run through CPP (or, if they are, not in the same context), and this will increasingly be true as we generate more things. Sharp edges include the fact that anything before the first syscall would be printed at a different scope (e.g., before an array is declared).
In this patch I collect each non-#include CPP directive and attach them to the syscall table or individual entries. All entries before the first syscall and after the last are attached to the prolog and epilog members. Within the syscall table, all entries are attached to the next system call's prolog member. In generators, each prolog entry is printed regardless of the system call's visibility, which replicates the naive single-pass model's behavior (including lots of empty blocks of #if/#else/#endif in the output). Unlike makesyscalls.lua, I discard non-#define entries at the top of the file and print a warning, as their usefulness appears limited.
[0] https://www.hyrumslaw.com
Reported by: kevans Reviewed by: kevans Sponsored by: DARPA, AFRL Pull Request: https://github.com/freebsd/freebsd-src/pull/1575
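The attachment rule above can be sketched as a single pass over the input lines. This is an illustration in C, not the makesyscalls code (which is written in Lua): it only counts how many directives land in each bucket, assumes every syscall entry fits on one line, treats `;` lines as comments, and skips `#include` lines because those are handled separately. The struct and function names are hypothetical, and the warning/discard behavior for directives at the very top of the file is not modeled.

```c
#include <stddef.h>
#include <string.h>

#define MAX_SYSCALLS 64

/* Hypothetical summary of where collected directives were attached. */
struct cpp_attach {
    int table_prolog;                   /* directives before the first syscall */
    int table_epilog;                   /* directives after the last syscall */
    int entry_prolog[MAX_SYSCALLS];     /* directives preceding syscall i */
    int nentries;
};

static int is_include(const char *l) { return strncmp(l, "#include", 8) == 0; }
static int is_directive(const char *l) { return l[0] == '#'; }
/* Simplification: any other non-empty, non-comment line is a one-line entry. */
static int is_syscall(const char *l) { return l[0] != '\0' && l[0] != ';'; }

static void attach_directives(const char *const *lines, size_t n,
    struct cpp_attach *out)
{
    int pending = 0;
    memset(out, 0, sizeof *out);
    for (size_t i = 0; i < n; i++) {
        const char *l = lines[i];
        if (is_include(l))
            continue;                   /* includes are emitted separately */
        if (is_directive(l)) {
            pending++;                  /* hold until the next syscall or EOF */
        } else if (is_syscall(l) && out->nentries < MAX_SYSCALLS) {
            if (out->nentries == 0)
                out->table_prolog = pending;
            else
                out->entry_prolog[out->nentries] = pending;
            out->nentries++;
            pending = 0;
        }
    }
    out->table_epilog = pending;        /* whatever follows the last syscall */
}
```

Fed the `#if 0`/`#else`/`#endif` pattern above, `#if 0` and `#else` each land in the prolog of the entry that follows them, and `#endif` goes to the next entry (or to the table epilog if nothing follows). A generator that prints every prolog regardless of visibility therefore reproduces the #if/#else/#endif structure, empty blocks included.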
|
| #
765ad4f0
|
| 01-Feb-2025 |
Gleb Smirnoff <glebius@FreeBSD.org> |
rpcsec_tls: cleanup the rpctls_syscall()
With all the recent changes we don't need an extra argument that specifies what exactly the syscall does, nor do we need a copyout-able pointer; a pointer-sized integer is enough.
Reviewed by: rmacklem Differential Revision: https://reviews.freebsd.org/D48649
|
| #
030c0282
|
| 01-Feb-2025 |
Gleb Smirnoff <glebius@FreeBSD.org> |
kgssapi: remove the gssd_syscall
Reviewed by: brooks Differential Revision: https://reviews.freebsd.org/D48554
|
| #
ddb3eb4e
|
| 18-Jul-2024 |
Olivier Certner <olce@FreeBSD.org> |
New setcred() system call and associated MAC hooks
This new system call allows setting all necessary credentials of a process in one go: effective, real and saved UIDs, effective, real and saved GIDs, supplementary groups and the MAC label. Its advantage over standard credential-setting system calls (such as setuid(), seteuid(), etc.) is that it enables MAC modules, such as MAC/do, to restrict the set of credentials some process may gain in a fine-grained manner.
Traditionally, credential changes rely on setuid binaries that call multiple credential system calls and in a specific order (setuid() must be last, so as to remain root for all other credential-setting calls, which would otherwise fail with insufficient privileges). This piecewise approach causes the process to transiently hold credentials that are neither the original nor the final ones. For the kernel to enforce that only certain transitions of credentials are allowed, either these possibly non-compliant transient states have to disappear (by setting all relevant attributes in one go), or the kernel must delay setting or checking the new credentials. Delaying setting credentials could be done, e.g., by having some mode where the standard system calls contribute to building new credentials but without committing them. It could be started and ended by a special system call. Delaying checking could mean that, e.g., the kernel only verifies the credentials transition at the next non-credential-setting system call (we just mention this possibility for completeness, but are certainly not endorsing it).
We chose the simpler approach of a new system call, as we don't expect the set of credentials one can set to change often. It has the advantages that the traditional system calls' code doesn't have to be changed and that we can establish a special MAC protocol for it, by having some cleanup function called just before returning (this is a requirement for MAC/do), without disturbing the existing ones.
The mac_cred_check_setcred() hook is passed the flags received by setcred() (including the version) and both the old and new kernel's 'struct ucred' instead of 'struct setcred' as this should simplify evolving existing hooks as the 'struct setcred' structure evolves. The mac_cred_setcred_enter() and mac_cred_setcred_exit() hooks are always called by pairs around potential calls to mac_cred_check_setcred(). They allow MAC modules to allocate/free data they may need in their mac_cred_check_setcred() hook, as the latter is called under the current process' lock, rendering sleepable allocations impossible. MAC/do is going to leverage these in a subsequent commit. A scheme where mac_cred_check_setcred() could return ERESTART was considered but is incompatible with proper composition of MAC modules.
While here, add missing includes and declarations for standalone inclusion of <sys/ucred.h> both from kernel and userspace (for the latter, it has been working thanks to <bsm/audit.h> already including <sys/types.h>).
Reviewed by: brooks Approved by: markj (mentor) Relnotes: yes Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D47618
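The pairing protocol for the enter/exit hooks can be sketched in userspace C. Everything here is hypothetical and heavily simplified (a credential with only an effective UID, a single policy, comments marking where the process lock would be held); it is meant only to show why allocation happens in the enter hook, why the check hook must not allocate, and that the exit hook runs even when the check denies the change.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical, simplified credential: only the effective UID. */
struct cred {
    unsigned uid;
};

/* Scratch data a policy prepares before the (simulated) process lock is taken. */
struct policy_scratch {
    unsigned allowed_uid;
};

static int enter_calls, exit_calls;

/* enter hook: may allocate (and, in a kernel, sleep) because no lock is held yet. */
static struct policy_scratch *policy_enter(void)
{
    enter_calls++;
    struct policy_scratch *s = malloc(sizeof *s);
    if (s != NULL)
        s->allowed_uid = 1001;          /* hypothetical rule: only UID 1001 may be assumed */
    return s;
}

/* check hook: runs with the lock held, so it only reads preallocated data. */
static int policy_check(const struct policy_scratch *s,
    const struct cred *oldcred, const struct cred *newcred)
{
    (void)oldcred;
    if (s == NULL)
        return ENOMEM;
    return newcred->uid == s->allowed_uid ? 0 : EPERM;
}

/* exit hook: always paired with enter, whatever the check decided. */
static void policy_exit(struct policy_scratch *s)
{
    exit_calls++;
    free(s);
}

/* Hypothetical setcred-like driver: the new credential is applied whole or not at all. */
static int set_cred(struct cred *cur, const struct cred *newcred)
{
    struct policy_scratch *s = policy_enter();
    /* lock process */
    int error = policy_check(s, cur, newcred);
    if (error == 0)
        *cur = *newcred;
    /* unlock process */
    policy_exit(s);
    return error;
}
```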
|
| #
b165e9e3
|
| 29-Nov-2024 |
Edward Tomasz Napierala <trasz@FreeBSD.org> |
Add fchroot(2)
This is similar to chroot(2), but takes a file descriptor instead of a path. The same syscall exists in NetBSD and Solaris. It is part of a larger patch to make absolute pathnames usable in Capsicum mode, but should be useful in other contexts too.
Reviewed By: brooks Sponsored by: Innovate UK Differential Revision: https://reviews.freebsd.org/D41564
|
| #
bbc0f33b
|
| 30-Oct-2024 |
Brooks Davis <brooks@FreeBSD.org> |
sysent: add a NOLIB modifier to prevent stub generation
The yield system call has long existed, but never had a stub. Replace the hardcoded checks for it in libsys_h.lua and syscalls_map.lua and stop inserting it into MIASM (requiring libsys/Makefile.sys to disable the stub).
(This seems like overkill, but I've got another case in CheriBSD so this reduces my diff appreciably.)
Reviewed by: emaste Pull Request: https://github.com/freebsd/freebsd-src/pull/1503
|
| #
913bfd86
|
| 22-Oct-2024 |
Brooks Davis <brooks@FreeBSD.org> |
Update mentions of makesyscalls.lua
It is obsolete and will be removed in a followup commit.
|
| #
f028f44e
|
| 20-Sep-2024 |
Konstantin Belousov <kib@FreeBSD.org> |
Add getrlimitusage(2)
Reviewed by: markj, olce Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D46747
|
| #
d0675399
|
| 27-Aug-2024 |
Edward Tomasz Napierala <trasz@FreeBSD.org> |
capsicum: allow subset of wait4(2) functionality
The usual way of handling process exit in capsicum(4) mode is by using process descriptors (pdfork(2)) instead of the traditional fork(2)/wait4(2) API. But most apps hadn't been converted this way, and many cannot be, because the wait is hidden behind library APIs that revolve around PID numbers rather than descriptors; GLib's g_spawn_check_wait_status(3) is one example.
Thus, provide backwards compatibility by allowing the wait(2) family of functions in Capsicum mode, except for child processes created by pdfork(2).
Reviewed by: brooks, oshogbo Sponsored by: Innovate UK Differential Revision: https://reviews.freebsd.org/D44372
|
| #
6b7e4254
|
| 21-May-2024 |
Edward Tomasz Napierala <trasz@FreeBSD.org> |
capsicum: allow rfork(2) in capability mode
Reviewed by: brooks, rwatson MFC after: 4 days Differential Revision: https://reviews.freebsd.org/D45040
|