| #
f1e8b1ac
|
| 06-Mar-2026 |
Andrew Gallatin <gallatin@FreeBSD.org> |
splice: optionally limit worker queues
Add a new tunable/sysctl (kern.ipc.splice.num_wq) which can be used to limit the number of splice worker queues as a way to limit splice cpu use.
The default (-1) keeps the current behavior of running one worker for each core in the system. An administrator can set it to 0 (either via tunable, or before the first splice call via sysctl) to effectively disable splice, or some number smaller than the number of cores to limit splice thread use.
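The mapping from the setting to a worker count described above can be sketched as a small user-space model. This is only an illustration: the function name `splice_effective_workers` is hypothetical, and clamping values larger than the core count is an assumption, since the commit message does not say what happens in that case.

```c
#include <assert.h>

/*
 * Toy model of how a num_wq-style setting could map to a worker count.
 * A return value of 0 means splice is effectively disabled.
 * Not the kernel implementation.
 */
static int
splice_effective_workers(int num_wq, int ncpus)
{
	assert(ncpus > 0);
	if (num_wq < 0)		/* -1 (default): one worker per core */
		return (ncpus);
	if (num_wq > ncpus)	/* assumption: clamp values above the core count */
		return (ncpus);
	return (num_wq);	/* 0 disables; 1..ncpus limits the workers */
}
```

With 8 cores, the model yields 8 workers for -1, none for 0, and 2 for a setting of 2.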
Reviewed by: markj
Differential Revision: https://reviews.freebsd.org/D55579
Sponsored by: Netflix
|
| #
454212b9
|
| 25-Feb-2026 |
Michael Tuexen <tuexen@FreeBSD.org> |
sctp: fix so_proto when peeling off a socket
Reported by: glebius
Reviewed by: rrs
Fixes: d195b3783fa4 ("sctp: fix socket type created by sctp_peeloff()")
Differential Revision: https://reviews.freebsd.org/D55454
|
| #
f5923578
|
| 06-Feb-2026 |
Gleb Smirnoff <glebius@FreeBSD.org> |
sockets: repair sctp_peeloff(2)
The shim function soattach() may be passed a non-listening socket by SCTP.
NB: the change makes soattach() more hairy, but the long-term plan is for this function to go away.
PR: 293010
Fixes: 64f7e3c9c178ab35cb1f8fdf791aec74ede6f6b2
|
| #
64f7e3c9
|
| 03-Feb-2026 |
Gleb Smirnoff <glebius@FreeBSD.org> |
sockets: let protocols be responsible for socket buffer mutexes
Sockets that implement their own socket buffers (marked with PR_SOCKBUF) are now also responsible for initialization of socket buffer mutexes in pr_attach and for destruction in pr_detach (or pr_close).
This removes a big bunch of reported LORs, as now WITNESS is able to see that tcp(4) socket buffer mutex and netlink(4) socket buffer mutex are two different things. Distinct names also improve diagnostics for blocked threads.
This also removes a hack from unix(4), where we used to mtx_destroy(). It also removes an innocent bug from unix(4), where soreserve() was called twice for an accept(2)-ed socket. This one was innocent since the first call to soreserve() asked for 0 bytes of space.
This slightly increases the amount of pasted code in TCP's syncache_socket(). The problem is that for sockets created with socket(2), pr_attach is responsible for calling soreserve() (including for !PR_SOCKBUF protocols), whereas for sockets created with accept(2) it was solisten_clone() that called soreserve(), and for accept(2) TCP bypasses pr_attach completely. This should all improve once TCP has its own socket buffers.
Reviewed by: markj
Differential Revision: https://reviews.freebsd.org/D54984
|
| #
d195b378
|
| 31-Jan-2026 |
Michael Tuexen <tuexen@FreeBSD.org> |
sctp: fix socket type created by sctp_peeloff()
When calling sctp_peeloff() on a SOCK_SEQPACKET socket, the created and returned socket has the type SOCK_STREAM. This is specified in section 9.2 of RFC 6458.
Reported by: Xin Long
MFC after: 3 days
|
| #
a0d60795
|
| 14-Dec-2025 |
Mark Johnston <markj@FreeBSD.org> |
socket: Fix the name of a parameter in a comment
Reported by: des
Fixes: 0a68f644dca1 ("socket: Split up soreceive_generic()")
MFC after: 1 week
|
| #
e967a2a0
|
| 11-Dec-2025 |
Gleb Smirnoff <glebius@FreeBSD.org> |
sockets: remove compat shim for divert(4)
All known software in ports was addressed three years ago, and the shim stays in stable/14 and stable/15 for another couple of years with its printf(), so all outliers are expected to conform before 16.0-RELEASE. See 8624f4347e8133911b0554e816f6bedb56dc5fb3 for details.
|
| #
a837d1fe
|
| 09-Dec-2025 |
Andrew Gallatin <gallatin@FreeBSD.org> |
splice: Fix leaks that can happen when initiating a splice
- Change the state to SPLICE_EXCEPTION to allow so_unsplice() to clean up failed splices (fixes a socket reference leak).
- NULL out sp->dst when unsplicing from so_splice() before so2 has been referenced.
- Deal with a NULL sp->dst / so2 in so_unsplice().
- Fix asserts that talked about sp->state == SPLICE_INIT; that state is not possible here.
Differential Revision: https://reviews.freebsd.org/D54157
Reviewed by: markj
Sponsored by: Netflix
Fixes: c0c5d01e5374 ("so_splice: Synchronize so_unsplice() with so_splice()")
MFC after: 3 days
|
| #
1390bba4
|
| 16-Nov-2025 |
Mark Johnston <markj@FreeBSD.org> |
file: Add a fdclose method
Consider a program that creates a unix socket pair, transmits both sockets from one to the other using an SCM_RIGHTS message, and then closes both sockets without externalizing the message. unp_gc() is supposed to handle cleanup, but it is only triggered by uipc_detach(), which runs when a unix socket is destroyed. Because the two sockets are internalized, their refcounts are positive, so uipc_detach() isn't called.
As a result, a userspace program can create an unbounded amount of garbage without triggering reclaim. Let's trigger garbage collection whenever a unix socket is close()d. To implement this, add a new fdclose file op and protocol op, and implement them accordingly. Since mqueuefs has a hack to hook into the file close path, convert it to use the new op as well.
Now, userspace can't create garbage without triggering reclamation.
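The scenario from the commit message can be reproduced with a short user-space program. The following is a minimal sketch using only standard socket calls; the function name `make_unix_garbage` is made up for illustration, and the sketch demonstrates only the sequence of calls, not the kernel-side reclamation.

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Create a unix socket pair, send both descriptors across the pair in a
 * single SCM_RIGHTS message, then close both ends without ever receiving
 * (externalizing) the message. Returns 0 on success, -1 on error.
 */
static int
make_unix_garbage(void)
{
	int sv[2];
	char dummy = 0;
	struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };
	union {
		struct cmsghdr hdr;	/* forces correct alignment */
		char buf[CMSG_SPACE(2 * sizeof(int))];
	} ctl;
	struct msghdr msg;
	struct cmsghdr *cm;
	ssize_t n;

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
		return (-1);

	memset(&ctl, 0, sizeof(ctl));
	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = ctl.buf;
	msg.msg_controllen = sizeof(ctl.buf);

	cm = CMSG_FIRSTHDR(&msg);
	assert(cm != NULL);
	cm->cmsg_level = SOL_SOCKET;
	cm->cmsg_type = SCM_RIGHTS;
	cm->cmsg_len = CMSG_LEN(2 * sizeof(int));
	memcpy(CMSG_DATA(cm), sv, 2 * sizeof(int));

	/* The in-flight message now holds references to both sockets. */
	n = sendmsg(sv[0], &msg, 0);

	/* Drop the only descriptors; the message is never received. */
	close(sv[0]);
	close(sv[1]);
	return (n == 1 ? 0 : -1);
}
```

Calling this function in a loop is the kind of workload that, per the message above, could previously accumulate garbage without triggering reclaim.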
Reviewed by: glebius, kib
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D53744
|
| #
36138969
|
| 16-Oct-2025 |
Konstantin Belousov <kib@FreeBSD.org> |
knotes: kqueue: handle copy for trivial filters
Reviewed by: markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D52045
|
| #
26188470
|
| 25-Jul-2025 |
Mark Johnston <markj@FreeBSD.org> |
socket: Chase a lingering reference to M_NOTAVAIL
Fixes: b93e930ca233 ("sendfile: retire M_BLOCKED")
|
| #
b93e930c
|
| 25-Jul-2025 |
Gleb Smirnoff <glebius@FreeBSD.org> |
sendfile: retire M_BLOCKED
Follow unix(4) commit 51ac5ee0d57f and retire M_BLOCKED for TCP sockets as well. The M_BLOCKED flag was introduced back in 2016 together with non-blocking sendfile(2). It marked mbufs in a sending socket buffer that could be ready to be sent, but are sitting behind one or more M_NOTREADY mbufs that block them.
You may consider this flag as an INVARIANT flag that helped to ensure socket buffer consistency. Or maybe the socket code was so convoluted back then that it was unclear whether sbfree() may be called on an mbuf that is in the middle of the buffer, and I decided to introduce the flag to protect against that. With today's state of the socket buffer code it became clear that the latter cannot happen. And this commit adds an assertion proving that.
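To make the "blocked behind a not-ready mbuf" idea concrete, here is a toy user-space model. The `toy_mbuf` structure and both functions are hypothetical stand-ins, not the kernel's mbuf or socket buffer code. The sketch shows that whether an entry is blocked can be derived from its position in the buffer, which gives one intuition for why a separate flag is not strictly needed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for an mbuf in a send buffer; not the kernel structure. */
struct toy_mbuf {
	size_t len;
	bool   notready;	/* models M_NOTREADY */
};

/*
 * Bytes that can be sent right now: everything ahead of the first
 * not-ready entry. Entries behind it cannot be sent yet.
 */
static size_t
toy_sendable(const struct toy_mbuf *sb, size_t n)
{
	size_t i, bytes = 0;

	for (i = 0; i < n && !sb[i].notready; i++)
		bytes += sb[i].len;
	return (bytes);
}

/* An entry is "blocked" if it is ready but sits behind a not-ready one. */
static bool
toy_is_blocked(const struct toy_mbuf *sb, size_t n, size_t idx)
{
	size_t i;

	assert(idx < n);
	if (sb[idx].notready)
		return (false);
	for (i = 0; i < idx; i++)
		if (sb[i].notready)
			return (true);
	return (false);
}
```

For a buffer of three entries where only the middle one is not ready, the first entry is sendable and the third is blocked until the middle one becomes ready.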
Reviewed by: markj
Differential Revision: https://reviews.freebsd.org/D50728
|
| #
f2c2ed7d
|
| 25-Jul-2025 |
Gleb Smirnoff <glebius@FreeBSD.org> |
sendfile: don't hack sb_lowat for sockets that manage the watermark
In sendfile(2) we carry an old hack (originating from d99b0dd2c5297) to help dumb benchmarks and applications achieve higher performance. We would modify the low watermark on the socket send buffer to avoid the socket being reported as writable too early, which would result in lots of small writes.
Skip that hack for applications that call setsockopt(SO_SNDLOWAT) or that register the socket in kevent(2) with the NOTE_LOWAT feature. First, we don't want the hack to rewrite a watermark value explicitly specified by the user. Second, in certain cases that can lead to real performance regressions: a kevent(2) with NOTE_LOWAT would report the socket as writable, but then sendfile(2) would write 0 bytes and return EAGAIN.
The change also disables the hack for unix(4) sockets, leaving only TCP.
Reviewed by: rrs
Differential Revision: https://reviews.freebsd.org/D50581
|
| #
f20e8cd5
|
| 28-May-2025 |
Gleb Smirnoff <glebius@FreeBSD.org> |
sockets: remove dom_externalize
It was used only by unix(4) and now is completely isolated.
|
| #
a0da2f73
|
| 08-May-2025 |
Dag-Erling Smørgrav <des@FreeBSD.org> |
Remove remaining mentions of pr_usrreq.
When struct pr_usrreq was folded into struct protosw and the function pointers it contained were renamed from pru_* to pr_* in 2022, a number of references to the old names in comments and error messages were missed. Chase them down and fix them.
Sponsored by: Klara, Inc.
Sponsored by: NetApp, Inc.
Reviewed by: kevans, glebius
Differential Revision: https://reviews.freebsd.org/D50190
|
| #
aba6f332
|
| 30-Apr-2025 |
Gleb Smirnoff <glebius@FreeBSD.org> |
sockets: provide protocol method pr_kqfilter
Reviewed by: markj
Differential Revision: https://reviews.freebsd.org/D48919
|
| #
1000cc4a
|
| 22-Apr-2025 |
Mark Johnston <markj@FreeBSD.org> |
so_splice: Disallow splicing with KTLS-enabled sockets
Suppose the sink socket in a splice has KTLS enabled. When data is transmitted from the source socket, sosend_generic_locked() receives an mbuf rather than a UIO as it would if userspace were transferring data. In this case, ktls_frame() expects the mbuf to be unmapped, but in general this won't be the case.
Simply disallow the combination for now. Modify so_unsplice() to handle dismantling a partially initialized splice, in order to simplify error handling in so_splice(). Make sure that one can't enable KTLS on a spliced socket, or more specifically, that one can't enable RXTLS on the source side of a splice, or TXTLS on the sink side of a splice.
Reported by: syzbot+9cc248c4b0ca9b931ab4@syzkaller.appspotmail.com
Reviewed by: gallatin
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D49920
|
| #
992b18a9
|
| 15-Apr-2025 |
Mark Johnston <markj@FreeBSD.org> |
so_splice: Synchronize so_unsplice() with so_splice()
so_unsplice() assumed that if SB_SPLICED is set in the receive buffer of the first socket, then the splice is fully initialized. However, that's not true, and it's possible for so_unsplice() to race ahead of so_splice().
Modify so_unsplice() to simply bail if the splice state is embryonic.
Reported by: syzkaller
Reviewed by: gallatin
Fixes: a1da7dc1cdad ("socket: Implement SO_SPLICE")
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D49814
|
| #
590b4503
|
| 29-Mar-2025 |
Mark Johnston <markj@FreeBSD.org> |
socket: Handle the possibility of a protocol with no ctloutput
Add a default ctloutput handler and remove various NULL checks. This fixes a problem wherein the generic SO_SETFIB handler did not check whether the protocol has a ctloutput implementation before calling the function pointer.
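The pattern described above, installing a default handler so callers can invoke the function pointer unconditionally, can be sketched in user space as follows. All `toy_` names are hypothetical, and the `ENOPROTOOPT` return value of the default handler is an assumption made for this illustration, not something stated in the commit message.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct toy_protosw;
typedef int toy_ctloutput_t(struct toy_protosw *, int optname);

/* Toy protocol switch entry; not the kernel's struct protosw. */
struct toy_protosw {
	toy_ctloutput_t *pr_ctloutput;	/* a protocol may leave this NULL */
};

/* Default handler; returning ENOPROTOOPT is an assumption. */
static int
toy_ctloutput_default(struct toy_protosw *pr, int optname)
{
	(void)pr;
	(void)optname;
	return (ENOPROTOOPT);
}

/* Registration fills in the default, so callers never see NULL. */
static void
toy_protosw_register(struct toy_protosw *pr)
{
	if (pr->pr_ctloutput == NULL)
		pr->pr_ctloutput = toy_ctloutput_default;
}

/* Generic option path: calls the pointer without a NULL check. */
static int
toy_setopt(struct toy_protosw *pr, int optname)
{
	assert(pr->pr_ctloutput != NULL);
	return (pr->pr_ctloutput(pr, optname));
}
```

Filling the pointer once at registration time moves the NULL check out of every call site, which is the simplification the commit message describes.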
Reported by: syzkaller
Reviewed by: glebius
Fixes: caccbaef8e26 ("socket: Move SO_SETFIB handling to protocol layers")
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D49436
|
| #
371392bc
|
| 24-Mar-2025 |
Gleb Smirnoff <glebius@FreeBSD.org> |
sockbuf: remove sbflush_internal() and sbrelease_internal() shims
These functions serve just one purpose: to allow calling sbdestroy() from sofree() without triggering unlocked mutex assertions. Let's just not try to save on locking with an INVARIANTS kernel, which will allow us to clean up all these shims. There should be no functional changes.
Reviewed by: markj
Differential Revision: https://reviews.freebsd.org/D49363
|
| #
57481635
|
| 23-Mar-2025 |
Mark Johnston <markj@FreeBSD.org> |
socket: Fix a race in the SO_SPLICE state machine
When so_splice() links two sockets together, it first attaches the splice control structure to the source socket; at that point, the splice is in the idle state. After that point, a socket wakeup will queue up work for a splice worker thread: in particular, so_splice_dispatch() only queues work if the splice is idle.
Meanwhile, so_splice() continues initializing the splice, and finally calls so_splice_xfer() to transfer any already buffered data. This assumes that the splice is still idle, but that's not true if some async work was already dispatched.
Solve the problem by introducing an initial "under construction" state for the splice control structure, such that wakeups won't queue any work until so_splice() has finished.
While here, remove an outdated comment from the beginning of so_splice_xfer().
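The effect of the "under construction" state can be illustrated with a small user-space state machine. The state and function names below are hypothetical and chosen for this sketch; the model covers only the rule that wakeups queue work when the splice is idle, and that setup must finish before the splice becomes idle.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical state names for illustration; not the kernel's definitions. */
enum toy_splice_state {
	TOY_SPLICE_INIT,	/* under construction */
	TOY_SPLICE_IDLE,
	TOY_SPLICE_QUEUED
};

struct toy_splice {
	enum toy_splice_state state;
};

/* Wakeup path: queue work only when the splice is idle. */
static bool
toy_splice_dispatch(struct toy_splice *sp)
{
	if (sp->state != TOY_SPLICE_IDLE)
		return (false);
	sp->state = TOY_SPLICE_QUEUED;
	return (true);
}

/* End of setup: only from here on may wakeups start queueing work. */
static void
toy_splice_finish_setup(struct toy_splice *sp)
{
	assert(sp->state == TOY_SPLICE_INIT);
	sp->state = TOY_SPLICE_IDLE;
}
```

In this model a wakeup that arrives while the splice is still in the initial state is simply ignored, so the setup code can rely on no asynchronous work having been dispatched before it finishes.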
Reported by: syzkaller
Reviewed by: gallatin
Fixes: a1da7dc1cdad ("socket: Implement SO_SPLICE")
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D49437
|
| #
ee951eb5
|
| 06-Feb-2025 |
Mark Johnston <markj@FreeBSD.org> |
socket: Add an option to retrieve a socket's FIB number
The SO_SETFIB option can be used to set a socket's FIB number, but there is no way to retrieve it. Rename SO_SETFIB to SO_FIB and implement a handler for it for getsockopt(2).
Reviewed by: glebius
MFC after: 2 weeks
Sponsored by: Klara, Inc.
Sponsored by: Stormshield
Differential Revision: https://reviews.freebsd.org/D48834
|
| #
caccbaef
|
| 06-Feb-2025 |
Mark Johnston <markj@FreeBSD.org> |
socket: Move SO_SETFIB handling to protocol layers
In particular, we store a FIB number in both struct socket and in struct inpcb. When updating the FIB number with setsockopt(SO_SETFIB), make the update atomic. This is required to support the new bind_all_fibs mode, since in that mode changing the FIB of a bound socket is not permitted.
This requires a bit more code, but avoids a layering violation in sosetopt(), where we hard-code the list of protocol families that implement SO_SETFIB.
Reviewed by: glebius
MFC after: 2 weeks
Sponsored by: Klara, Inc.
Sponsored by: Stormshield
Differential Revision: https://reviews.freebsd.org/D48666
|
| #
b0580c7a
|
| 03-Feb-2025 |
Gleb Smirnoff <glebius@FreeBSD.org> |
sockets: remove empty shim function sopoll()
|
| #
815f2a61
|
| 03-Feb-2025 |
Gleb Smirnoff <glebius@FreeBSD.org> |
sockets: removed unused argument from sopoll()
|