| #
7b71f57f
|
| 03-Dec-2025 |
Warner Losh <imp@FreeBSD.org> |
netinet: Remove left-over sys/cdefs.h
These were for $FreeBSD$ that was removed a while ago, but these includes didn't get swept up in that. Remove them all now.
Sponsored by: Netflix
MFC After: 2 weeks
|
| #
dd0e6bb9
|
| 22-Nov-2025 |
Andrew Gallatin <gallatin@FreeBSD.org> |
tcp: Enable symmetric hashing by setting hash on outgoing conns
Now that we can trust NICs to supply an identical hash result to software, we can set up the inpcb hash on outgoing connections. This gives us symmetric hashing, meaning packets should enter and leave on the same NIC queue.
Differential Revision: https://reviews.freebsd.org/D53104
Reviewed by: adrian, cc, kbowling, tuexen, zlei
Sponsored by: Netflix
|
| #
8e8956f7
|
| 02-Nov-2025 |
Michael Tuexen <tuexen@FreeBSD.org> |
ddb: use %b when showing flags for a tcpcb
This is much more compact. Thanks to markj@ for suggesting the change.
Reviewed by: markj, Peter Lei, imp, Nick Banks
MFC after: 3 days
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D53510
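The compactness comes from the kernel printf(9) `%b` conversion, which prints a value followed by the names of its set bits. The following is a minimal userland sketch of that decoding, not the kernel implementation; the helper names `append` and `format_bits` are hypothetical, and this sketch only handles octal, hexadecimal, and decimal output bases.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Append at most n bytes of s to the NUL-terminated buffer, truncating. */
static void
append(char *buf, size_t len, const char *s, size_t n)
{
	size_t used = strlen(buf);

	if (used + 1 >= len)
		return;
	if (n > len - used - 1)
		n = len - used - 1;
	memcpy(buf + used, s, n);
	buf[used + n] = '\0';
}

/*
 * Decode a %b-style description string: the first byte is the output
 * base, followed by groups of (1-based bit number, bit name).
 */
static void
format_bits(char *buf, size_t len, unsigned int value, const char *spec)
{
	int base, bit, any = 0;
	size_t namelen;

	assert(buf != NULL && len > 0 && spec != NULL && spec[0] != '\0');
	base = (unsigned char)*spec++;
	snprintf(buf, len, base == 8 ? "%o" : base == 16 ? "%x" : "%u", value);
	while (*spec != '\0') {
		bit = (unsigned char)*spec++;
		/* A name runs until the next byte that is <= ' '. */
		for (namelen = 0; (unsigned char)spec[namelen] > ' '; namelen++)
			continue;
		if (bit >= 1 && bit <= 32 && (value & (1u << (bit - 1))) != 0) {
			append(buf, len, any ? "," : "<", 1);
			append(buf, len, spec, namelen);
			any = 1;
		}
		spec += namelen;
	}
	if (any)
		append(buf, len, ">", 1);
}
```

With the description string from the printf(9) manual page, `format_bits(buf, sizeof(buf), 3, "\10\2BITTWO\1BITONE")` leaves `3<BITTWO,BITONE>` in `buf`.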
|
| #
9aa5a79e
|
| 31-Oct-2025 |
Michael Tuexen <tuexen@FreeBSD.org> |
ddb: optionally print inp when printing tcpcb
Add /i option to the ddb commands show tcpcb and show all tcpcbs, which enables the printing of the t_inpcb.
Reviewed by: markj
MFC after: 3 days
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D53497
|
| #
f2c2ed7d
|
| 25-Jul-2025 |
Gleb Smirnoff <glebius@FreeBSD.org> |
sendfile: don't hack sb_lowat for sockets that manage the watermark
In sendfile(2) we carry an old hack (originating from d99b0dd2c5297) to help dumb benchmarks and applications achieve higher performance. We would modify the low watermark on the socket send buffer to avoid the socket being reported as writable too early, which would result in lots of small writes.
Skip that hack for applications that do setsockopt(SO_SNDLOWAT) or that register the socket in kevent(2) with the NOTE_LOWAT feature. First, we don't want the hack to rewrite the watermark value explicitly specified by the user. Second, in certain cases that can lead to real performance regressions. A kevent(2) with NOTE_LOWAT would report the socket as writable, but then sendfile(2) would write 0 bytes and return EAGAIN.
The change also disables the hack for unix(4) sockets, leaving only TCP.
Reviewed by: rrs
Differential Revision: https://reviews.freebsd.org/D50581
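The policy described in this commit can be summarized as a small predicate. This is a userland model only, assuming a made-up `struct send_sock` and function name; it is not the kernel code and does not reflect how the kernel actually tracks these conditions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of the socket properties the commit mentions. */
struct send_sock {
	bool is_tcp;            /* TCP socket (as opposed to unix(4)) */
	bool user_set_lowat;    /* application called setsockopt(SO_SNDLOWAT) */
	bool kevent_note_lowat; /* registered in kevent(2) with NOTE_LOWAT */
};

/*
 * Model of the decision: the low watermark is adjusted only for TCP
 * sockets whose owner has not taken control of the watermark.
 */
static bool
sendfile_may_adjust_lowat(const struct send_sock *s)
{
	assert(s != NULL);
	return (s->is_tcp && !s->user_set_lowat && !s->kevent_note_lowat);
}
```

In this model a plain TCP socket still gets the adjustment, while a TCP socket registered with NOTE_LOWAT, a socket with an explicit SO_SNDLOWAT, or any unix(4) socket does not.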
|
| #
15c991fd
|
| 24-Jul-2025 |
Nick Banks <nickbanks@netflix.com> |
tcp: remove trailing whitespaces
Reviewed by: cc, tuexen, Peter Lei
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D51437
|
| #
96f544bc
|
| 07-Jul-2025 |
Michael Tuexen <tuexen@FreeBSD.org> |
tcp: don't allow to connect a TCP/IPv6 endpoint in TIME WAIT state
This ensures that TCP/IPv4 and TCP/IPv6 behave the same.
Reported by: syzbot+4de353ba85dac4dcb1ab@syzkaller.appspotmail.com
Reviewed by: Peter Lei
MFC after: 1 week
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D51125
|
| #
ba3d5479
|
| 17-Jun-2025 |
Mark Johnston <markj@FreeBSD.org> |
tcp: Fix the SO_REUSEPORT_LB check
This needs to happen in tcp_connect() rather than tcp_usr_connect(), as the latter is reachable by implied connect() via sendto().
Reviewed by: glebius
Reported by: syzbot+eecc86e6952fd9ba9f11@syzkaller.appspotmail.com
Fixes: c7f803c71dae ("inpcb: fix a panic with SO_REUSEPORT_LB + connect(2) misuse")
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D50893
|
| #
0dc78204
|
| 10-Jun-2025 |
Michael Tuexen <tuexen@FreeBSD.org> |
ddb: fix handling of BBLog entries when BBLog is disabled
Fixes: a62c6b0de48a ("ddb: add optional printing of BBLog entries")
MFC after: 1 week
Sponsored by: Netflix, Inc.
|
| #
a62c6b0d
|
| 10-Jun-2025 |
Michael Tuexen <tuexen@FreeBSD.org> |
ddb: add optional printing of BBLog entries
Add a /b option to show tcpcb and show all tcpcbs to print BBLog entries. Right now this supports the entries generated by the FreeBSD default TCP stack. It should help in debugging issues reported by syzkaller. The syntax for printing sent and received packets is similar to the one used by packetdrill, since the output of ddb will be used to create packetdrill scripts for debugging.
Reviewed by: thj
Tested by: thj
MFC after: 1 week
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D50629
| #
f1430567
|
| 28-May-2025 |
Michael Tuexen <tuexen@FreeBSD.org> |
ddb: add show all tcpcbs
Add a command to show all TCP control blocks. Also provide an option to limit the output to TCP control blocks that are locked. The plan is to run show all tcpcbs/l when syzkaller triggers a panic. If a TCP control block is affected, it is most likely locked and therefore the command shows the information of the affected TCP control block.
Reviewed by: markj, thj
Tested by: thj
MFC after: 1 week
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D50516
| #
8d4f495d
|
| 28-May-2025 |
Michael Tuexen <tuexen@FreeBSD.org> |
ddb: improve show tcpcb
Print the name of the TCP function block and the name of the congestion control algorithm. Furthermore, print some information related to Black Box Logging.
Reviewed by: thj
MFC after: 1 week
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D50535
|
| #
a0da2f73
|
| 08-May-2025 |
Dag-Erling Smørgrav <des@FreeBSD.org> |
Remove remaining mentions of pr_usrreq.
When struct pr_usrreq was folded into struct protosw and the function pointers it contained were renamed from pru_* to pr_* in 2022, a number of references to the old names in comments and error messages were missed. Chase them down and fix them.
Sponsored by: Klara, Inc.
Sponsored by: NetApp, Inc.
Reviewed by: kevans, glebius
Differential Revision: https://reviews.freebsd.org/D50190
| #
a35f24c9
|
| 30-Apr-2025 |
Gleb Smirnoff <glebius@FreeBSD.org> |
sendfile: factor out socket send buffer space sensing into a method
Move a block of code that works with the socket send buffer from the main sendfile loop into a separate function. Make it a protocol method, so that protocols may provide a different one.
While here, provide a long comment explaining why we modify sb_lowat and why we can't just remove that hack.
No functional change intended.
Reviewed by: markj
Differential Revision: https://reviews.freebsd.org/D48918
|
| #
6e764890
|
| 31-Mar-2025 |
Michael Tuexen <tuexen@FreeBSD.org> |
tcp: remove support for TCPPCAP
This feature could be used to store the last sent and received TCP packets for a TCP endpoint. There was no utility to get these packets from a live system or core. This functionality is now provided by TCP Black Box Logging, which also stores additional events. There are tools to get these traces from a live system or a core. Therefore remove TCPPCAP to avoid maintaining it, when it is not used anymore.
Reviewed by: rrs, rscheff, Peter Lei, glebiu
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D49589
|
| #
c7f803c7
|
| 07-Mar-2025 |
Gleb Smirnoff <glebius@FreeBSD.org> |
inpcb: fix a panic with SO_REUSEPORT_LB + connect(2) misuse
This combination doesn't make any sense. This socket option makes sense only on a socket that is going to be a listening one. There are two options here: refuse connect(2) on a socket that has the option set previously, or ignore (and clear) the option. After some discussion on phabricator, we have chosen the former, for safety and consistency reasons. Any programmer that runs this sequence is doing something wrong and should be informed of that with an appropriate error code.
Since connect(2) is a SUS API that has a defined set of error codes, none of which corresponds to "a socket has non-standard incompatible socket option set", we decided to return the same error that an already listening socket would return.
Reviewed by: markj
Differential Revision: https://reviews.freebsd.org/D49150
|
| #
e92a78ad
|
| 07-Mar-2025 |
Gleb Smirnoff <glebius@FreeBSD.org> |
tcp: return EOPNOTSUPP on attempt to connect(2) a listening socket
This is the error code specified by SUS. Only TCP over IPv6 required this fix.
Fixes: bd4a39cc93d9faf8b5c000855d5aa90df592dd49
Reviewed by: markj
Differential Revision: https://reviews.freebsd.org/D49275
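The behaviour can be probed from userland with a sketch like the one below, which calls connect(2) on a listening TCP/IPv6 socket and reports the resulting errno. The function name `connect_on_listener` is hypothetical. Per the commit, a FreeBSD kernel with this change should report EOPNOTSUPP; other systems may report a different error, so the sketch returns whatever errno it observes.

```c
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Returns the errno produced by connect(2) on a listening socket,
 * 0 if connect(2) unexpectedly succeeded, or -1 if the setup failed
 * (for example when IPv6 is unavailable).
 */
static int
connect_on_listener(void)
{
	struct sockaddr_in6 sin6;
	socklen_t len = sizeof(sin6);
	int fd, error = -1;

	fd = socket(AF_INET6, SOCK_STREAM, 0);
	if (fd < 0)
		return (-1);
	memset(&sin6, 0, sizeof(sin6));
	sin6.sin6_family = AF_INET6;
	sin6.sin6_addr = in6addr_loopback;
	sin6.sin6_port = 0;	/* let the kernel pick a free port */
	if (bind(fd, (struct sockaddr *)&sin6, sizeof(sin6)) == 0 &&
	    listen(fd, 1) == 0 &&
	    getsockname(fd, (struct sockaddr *)&sin6, &len) == 0) {
		assert(len <= sizeof(sin6));
		/* Try to connect the listening socket to its own address. */
		if (connect(fd, (struct sockaddr *)&sin6, len) == 0)
			error = 0;
		else
			error = errno;
	}
	close(fd);
	return (error);
}
```

A caller can compare the result against EOPNOTSUPP to check whether the running kernel behaves as the commit describes.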
|
| #
5dc99e9b
|
| 06-Feb-2025 |
Mark Johnston <markj@FreeBSD.org> |
tcp: Add a sysctl to modify listening socket FIB inheritance
Introduce the net.inet.tcp.bind_all_fibs tunable, set to 1 by default for compatibility with current behaviour. When set to 0, all TCP listening sockets are private to their FIB. Inbound connection requests will only succeed if a matching inpcb is bound to the same FIB as the request.
No functional change intended, as the new behaviour is not enabled by default.
Reviewed by: glebius
MFC after: 2 weeks
Sponsored by: Klara, Inc.
Sponsored by: Stormshield
Differential Revision: https://reviews.freebsd.org/D48663
|
| #
bbd0084b
|
| 06-Feb-2025 |
Mark Johnston <markj@FreeBSD.org> |
inpcb: Add a flags parameter to in_pcbbind()
Add a flag, INPBIND_FIB, which means that the inpcb is local to its FIB number. When this flag is specified, duplicate bindings are permitted, so long as each FIB contains at most one inpcb bound to the same address/port. If an inpcb is bound with this flag, it'll have the INP_BOUNDFIB flag set.
No functional change intended.
Reviewed by: glebius
MFC after: 2 weeks
Sponsored by: Klara, Inc.
Sponsored by: Stormshield
Differential Revision: https://reviews.freebsd.org/D48661
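The duplicate-binding rule can be illustrated with a toy bind table. This is a userland model with hypothetical names (`struct bind_table`, `model_bind`), not the in_pcbbind() implementation. It tracks ports only, and it assumes that a FIB-local binding conflicts with a non-FIB-local one on the same port, which the commit message does not state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define	MAX_BINDINGS	16

struct binding {
	uint16_t port;
	int fib;
	bool fib_local;	/* models a bind done with INPBIND_FIB */
};

struct bind_table {
	struct binding b[MAX_BINDINGS];
	size_t n;
};

/* Returns true if the binding was accepted and recorded. */
static bool
model_bind(struct bind_table *t, uint16_t port, int fib, bool fib_local)
{
	size_t i;

	assert(t != NULL);
	for (i = 0; i < t->n; i++) {
		const struct binding *b = &t->b[i];

		if (b->port != port)
			continue;
		/*
		 * Assumption: a duplicate is tolerated only when both
		 * bindings are FIB-local and live in different FIBs.
		 */
		if (!(fib_local && b->fib_local && b->fib != fib))
			return (false);
	}
	if (t->n == MAX_BINDINGS)
		return (false);
	t->b[t->n].port = port;
	t->b[t->n].fib = fib;
	t->b[t->n].fib_local = fib_local;
	t->n++;
	return (true);
}
```

In this model two FIB-local bindings of port 80 in FIB 0 and FIB 1 both succeed, while a second FIB-local binding of port 80 in FIB 1 is rejected.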
|
| #
06bf119f
|
| 28-Jan-2025 |
Gleb Smirnoff <glebius@FreeBSD.org> |
sockets/tcp: quick fix for regression with SO_REUSEPORT_LB
There was a long-standing problem that pr_listen is called every time on consecutive listen(2) syscalls. Up until today it produced spurious TCP state change events in tracing software and other harmless problems. But with 7cbb6b6e28db we started to call LIST_REMOVE() twice on the same entry.
This is a quite ugly, but quick and robust, fix for the regression that we decided to put in the scope of the January stabilization week. A better refactoring will happen later.
Reviewed by: markj
Differential Revision: https://reviews.freebsd.org/D48703
Fixes: 7cbb6b6e28db33095a1cf7a8887921a5ec969824
|
| #
7cbb6b6e
|
| 23-Jan-2025 |
Mark Johnston <markj@FreeBSD.org> |
inpcb: Close some SO_REUSEPORT_LB races, part 2
Suppose a thread adds a socket to an existing TCP lbgroup that is actively accepting connections. It has to do the following operations:
1. set SO_REUSEPORT_LB on the socket
2. bind() the socket to the shared address/port
3. call listen()
Step 2 makes the inpcb visible to incoming connection requests. However, at this point the inpcb cannot accept new connections. If in_pcblookup() matches it, the remote end will see ECONNREFUSED even when other listening sockets are present in the lbgroup. This means that dynamically adding inpcbs to an lbgroup (e.g., by starting up new workers) can trigger spurious connection failures for no good reason. (A similar problem exists when removing inpcbs from an lbgroup, but that is harder to fix and is not addressed by this patch; see the review for a bit more commentary.)
Fix this by augmenting each lbgroup with a linked list of inpcbs that are pending a listen() call. When adding an inpcb to an lbgroup, keep the inpcb on this list if listen() hasn't been called, so it is not yet visible to the lookup path. Then, add a new in_pcblisten() routine which makes the inpcb visible within the lbgroup now that it's safe to let it handle new connections.
Add a regression test which verifies that we don't get spurious connection errors while adding sockets to an LB group.
Reviewed by: glebius
MFC after: 1 month
Sponsored by: Klara, Inc.
Sponsored by: Stormshield
Differential Revision: https://reviews.freebsd.org/D48544
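The pending-list idea can be sketched as a small userland model. Everything here is hypothetical (`struct lbgroup` as defined below, `lb_bind`, `lb_listen`, `lb_lookup`, integer ids standing in for inpcbs, fixed-size arrays instead of linked lists); it only illustrates that a member bound to the group stays invisible to lookups until listen() has been called on it.

```c
#include <assert.h>
#include <stddef.h>

#define	LB_MAX	8

struct lbgroup {
	int active[LB_MAX];	/* members visible to the lookup path */
	size_t nactive;
	int pending[LB_MAX];	/* bound, but listen() not yet called */
	size_t npending;
};

/* bind(): the member joins the group but is parked as pending. */
static int
lb_bind(struct lbgroup *g, int id)
{
	assert(g != NULL);
	if (g->npending == LB_MAX)
		return (-1);
	g->pending[g->npending++] = id;
	return (0);
}

/* listen(): move the member from pending to active; repeats are no-ops. */
static int
lb_listen(struct lbgroup *g, int id)
{
	size_t i;

	assert(g != NULL);
	for (i = 0; i < g->nactive; i++)
		if (g->active[i] == id)
			return (0);
	for (i = 0; i < g->npending; i++) {
		if (g->pending[i] != id)
			continue;
		if (g->nactive == LB_MAX)
			return (-1);
		g->pending[i] = g->pending[--g->npending];
		g->active[g->nactive++] = id;
		return (0);
	}
	return (-1);	/* never bound to this group */
}

/* Lookup: only active members are candidates; -1 means no listener. */
static int
lb_lookup(const struct lbgroup *g, unsigned int hash)
{
	assert(g != NULL);
	if (g->nactive == 0)
		return (-1);
	return (g->active[hash % g->nactive]);
}
```

In this model, a connection arriving between `lb_bind()` and `lb_listen()` of a new worker is still steered to an existing active member rather than to the socket that cannot accept yet.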
|
| #
053a9884
|
| 23-Dec-2024 |
Gleb Smirnoff <glebius@FreeBSD.org> |
tcp: don't ever return ECONNRESET on close(2)
The SUS doesn't mention this error code as a possible one [1]. The FreeBSD manual page specifies a possible ECONNRESET for close(2):
[ECONNRESET] The underlying object was a stream socket that was shut down by the peer before all pending data was delivered.
In the past it had been EINVAL (see 21367f630d72), and this EINVAL was added as a safety measure in 623dce13c64ef. After conversion to ECONNRESET it had been documented in the manual page in 78e3a7fdd51e6, but I bet it was never tested to actually be returned, because the tcp-testsuite [2] didn't exist back then. So the documentation has been incorrect since 2006, if my bet wins. Anyway, in modern FreeBSD the condition described above doesn't end up with an ECONNRESET error code from close(2). The error condition is reported via the SO_ERROR socket option, though. This can be checked using the tcp-testsuite, temporarily disabling the getsockopt(SO_ERROR) lines using a sed command [3]. Most of these getsockopt(2)s are followed by '+0.00 close(3) = 0', which confirms that close(2) doesn't return ECONNRESET even on a socket that has the error stored, nor is it returned in the case described in the manual page. The latter case is covered by multiple tests residing in tcp-testsuite/state-event-engine/rcv-rst-*.
However, the deleted block of code could be entered in a race condition between close(2) and processing of an incoming packet, when the connection had already been half-closed with shutdown(SHUT_WR) and sat in TCPS_LAST_ACK. This was reported in bug 146845. With the block deleted, we will continue into tcp_disconnect() which has proper handling of INP_DROPPED.
The race explanation follows. The connection is in TCPS_LAST_ACK. The network input thread acquires the tcpcb lock first, sets INP_DROPPED, acquires the socket lock in soisdisconnected() and clears SS_ISCONNECTED. Meanwhile, the syscall thread goes through sodisconnect() which checks for SS_ISCONNECTED locklessly(!). The check passes and the thread blocks on the tcpcb lock in tcp_usr_disconnect(). Once the input thread releases the lock, the syscall thread observes INP_DROPPED and returns ECONNRESET.
- Thread 1: tcp_do_segment()->tcp_close()->in_pcbdrop(),soisdisconnected()
- Thread 2: sys_close()...->soclose()->sodisconnect()->tcp_usr_disconnect()
Note that the lockless operation in sodisconnect() isn't correct, but enforcing the socket lock there will not fix the problem.
[1] https://pubs.opengroup.org/onlinepubs/9799919799/
[2] https://github.com/freebsd-net/tcp-testsuite
[3] sed -i "" -Ee '/\+0\.00 getsockopt\(3, SOL_SOCKET, SO_ERROR, \[ECONNRESET\]/d' $(grep -lr ECONNRESET tcp-testsuite)
PR: 146845
Reviewed by: tuexen, rrs, imp
Differential Revision: https://reviews.freebsd.org/D48148
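Since the error condition is reported through SO_ERROR rather than through the return value of close(2), an application that cares about a reset can query the socket before closing it. A minimal sketch, assuming a hypothetical helper name `pending_socket_error`:

```c
#include <assert.h>
#include <errno.h>
#include <sys/socket.h>

/*
 * Fetch the pending error stored on a socket. Reading SO_ERROR also
 * clears it. Returns the pending error (0 if none), or -1 if
 * getsockopt(2) itself fails, in which case errno describes why.
 */
static int
pending_socket_error(int fd)
{
	int soerr = 0;
	socklen_t len = sizeof(soerr);

	if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &soerr, &len) != 0)
		return (-1);
	assert(len == sizeof(soerr));
	return (soerr);
}
```

A caller would typically check `pending_socket_error(fd) == ECONNRESET` before calling close(2) when it needs to know whether the peer reset the connection.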
|
| #
c91dd7a0
|
| 19-Dec-2024 |
Gleb Smirnoff <glebius@FreeBSD.org> |
tcp: remove unused variable from tcp_usr_disconnect()
|
| #
0b4539ee
|
| 14-Nov-2024 |
Gleb Smirnoff <glebius@FreeBSD.org> |
inpcb: gc unused argument of in_pcbconnect()
|
| #
dded4e9e
|
| 13-Nov-2024 |
Richard Scheffenegger <rscheff@FreeBSD.org> |
tcp: change SOCKBUF_* macros to SOCK_[RECV|SEND]BUF_* macros
Change the older LOCK related macros over to the dedicated send/recv buffer macros in the base tcp stack.
No functional change intended.
Reviewed By: tuexen, #transport
Sponsored by: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D47567
|