| #
7b71f57f
|
| 03-Dec-2025 |
Warner Losh <imp@FreeBSD.org> |
netinet: Remove left-over sys/cdefs.h
These were for $FreeBSD$ that was removed a while ago, but these includes didn't get swept up in that. Remove them all now.
Sponsored by: Netflix MFC After: 2 weeks
|
| #
1a61a673
|
| 24-Oct-2025 |
Peter Lei <peterlei@netflix.com> |
tcp: save progress timeout cause in connection end status
TCP stats are currently incremented for the persist and progress timeout conditions, but only the persist cause was saved in the connection end info status, which in turn is logged in the blackbox "connection end" event.
Reviewed by: tuexen MFC after: 3 days Sponsored by: Netflix, Inc.
|
| #
15c991fd
|
| 24-Jul-2025 |
Nick Banks <nickbanks@netflix.com> |
tcp: remove trailing whitespaces
Reviewed by: cc, tuexen, Peter Lei Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D51437
|
| #
5fb4b091
|
| 26-Jun-2025 |
Michael Tuexen <tuexen@FreeBSD.org> |
tcp: allow specifying a MSL for local communications
When the sysctl-variable net.inet.tcp.nolocaltimewait is set to 1, which is the default, a TCP endpoint does not enter the TIME-WAIT state when the communication is local. This can result in sending RST-segments without any error situation. By setting the sysctl-variable net.inet.tcp.nolocaltimewait to 0, this does not occur, and the behavior is compliant with the TCP specification. But there is no reason to stay in the TIME-WAIT state for two times the value of the sysctl-variable net.inet.tcp.msl if the communication is local. Therefore provide a separate sysctl-variable net.inet.tcp.msl_local, which controls how long a TCP endpoint stays in the TIME-WAIT state if the communication is local. The default value is 10 ms.
Reviewed by: glebius, Peter Lei Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D50637
|
| #
552d1780
|
| 17-Jun-2025 |
Gleb Smirnoff <glebius@FreeBSD.org> |
tcp: provide sysctl for the maximum retransmission timeout
Reviewed by: tuexen Differential Revision: https://reviews.freebsd.org/D50891
|
| #
faa6aa77
|
| 17-Jun-2025 |
Gleb Smirnoff <glebius@FreeBSD.org> |
tcp: remove CTLFLAG_NEEDGIANT from sysctl(9) handlers related to timers
They all just modify a global word-sized variable via sysctl_msec_to_ticks(), which is a wrapper around sysctl_handle_int(). Note, they all were marked with CTLFLAG_NEEDGIANT in 7029da5c36f2 merely because it was not obvious whether they are mpsafe or not.
Reviewed by: tuexen Differential Revision: https://reviews.freebsd.org/D50890
|
| #
625835c8
|
| 05-Nov-2024 |
Michael Tuexen <tuexen@FreeBSD.org> |
tcp: fix the initial CWND when a SYN retransmission happened
According to RFC 3390 the CWND should be set to one MSS if the SYN or SYN-ACK has been retransmitted. This is handled in the code by setting CWND to 1 and cc_conn_init() translates this to MSS. Unfortunately, cc_cong_signal() was overwriting the special value of 1 in case of a lost SYN, and therefore the initial CWND was not as it was supposed to be. Fix this by not overwriting the special value of 1.
Reviewed by: cc, rscheff MFC after: 3 days Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D47439
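The marker-value handling described in this commit can be illustrated with a small userland sketch. It is not the kernel code: `struct conn`, `CWND_ONE_SEGMENT_MARKER`, `on_loss_signal()` and `on_conn_init()` are hypothetical names, the loss response is a simple halving chosen only for illustration, and the non-marker branch uses the RFC 3390 initial window formula min(4*MSS, max(2*MSS, 4380 bytes)).

```c
#include <stdint.h>

/* Hypothetical stand-in for the few connection fields this sketch needs. */
struct conn {
	uint32_t cwnd;	/* congestion window in bytes, or the marker value */
	uint32_t mss;	/* maximum segment size in bytes */
};

/* Assumed marker: "SYN or SYN-ACK was retransmitted, start with one MSS". */
#define CWND_ONE_SEGMENT_MARKER 1u

/* Illustrative loss response that leaves the marker untouched. */
void
on_loss_signal(struct conn *c)
{
	uint32_t half;

	if (c->cwnd == CWND_ONE_SEGMENT_MARKER)
		return;		/* do not overwrite the special value */
	half = c->cwnd / 2;
	c->cwnd = (half < 2 * c->mss) ? 2 * c->mss : half;
}

/* Connection setup: translate the marker to one MSS, else use RFC 3390. */
void
on_conn_init(struct conn *c)
{
	uint32_t iw;

	if (c->cwnd == CWND_ONE_SEGMENT_MARKER) {
		c->cwnd = c->mss;
		return;
	}
	iw = (2 * c->mss > 4380) ? 2 * c->mss : 4380;	/* max(2*MSS, 4380) */
	if (iw > 4 * c->mss)
		iw = 4 * c->mss;			/* min(4*MSS, ...) */
	c->cwnd = iw;
}
```

With an MSS of 1460, a connection carrying the marker ends up with a window of 1460 bytes even if a loss signal arrives before initialization, while an unmarked connection starts at 4380 bytes.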
|
| #
d021d3b3
|
| 24-Oct-2024 |
Gleb Smirnoff <glebius@FreeBSD.org> |
tcp: get rid of TDP_INTCPCALLOUT
With CALLOUT_TRYLOCK we don't need this special flag.
Reviewed by: jtl Differential Revision: https://reviews.freebsd.org/D45748
|
| #
bffebc33
|
| 24-Oct-2024 |
Gleb Smirnoff <glebius@FreeBSD.org> |
tcp: use CALLOUT_TRYLOCK for the TCP callout
This allows removing the drop of the lock in tcp_timer_enter(), which closes a sophisticated but possible race that involves three threads. In case we have a callout executing and two threads trying to close the connection, e.g. an interrupt and a syscall, then lock yielding in tcp_timer_enter() may transfer the lock from one closing thread to the other closing thread, instead of the callout.
Reviewed by: jtl Differential Revision: https://reviews.freebsd.org/D45747
|
| #
fce03f85
|
| 05-May-2024 |
Randall Stewart <rrs@FreeBSD.org> |
TCP can be subject to Sack Attacks lets fix this issue.
There is a type of attack that a TCP peer can launch on a connection. This is for sure in Rack or BBR and probably even the default stack if it uses lists in sack processing. The idea of the attack is that the attacker is driving you to look at 100's of sack blocks that only update 1 byte. So for example if you have 1 - 10,000 bytes outstanding the attacker sends in something like:
ACK 0 SACK(1-512) SACK(1024 - 1536), SACK(2048-2536), SACK(4096 - 4608), SACK(8192-8704)
This first sack looks fine but then the attacker sends
ACK 0 SACK(1-512) SACK(1025 - 1537), SACK(2049-2537), SACK(4097 - 4609), SACK(8193-8705)
ACK 0 SACK(1-512) SACK(1027 - 1539), SACK(2051-2539), SACK(4099 - 4611), SACK(8195-8707)
...
These blocks are making you hunt across your linked list and split things up so that you have an entry for every other byte. As your list grows you spend more and more CPU running through the lists. The idea here is the attacker chooses entries as far apart as possible that make you run through the list. This example is small but in theory if the window is open to say 1Meg you could end up with hundreds of thousands of linked list entries.
To combat this we introduce three things.
1) When the peer requests a very small MSS we stop processing SACK's from them. This prevents a malicious peer from just using a small MSS to do the same thing.
2) Any time we get a sack block, we use the sack-filter to remove sacks that are smaller than the smallest v4 mss (minus 40 for max TCP options) unless it ties up to snd_max (since that is legal). All other sacks in theory should be at least an MSS. If we get such an attacker that means we basically start skipping all but MSS sized Sacked blocks.
3) The sack filter used to throw away data when its bounds were exceeded, instead now we increase its size to 15 and then throw away sack's if the filter gets over-run to prevent the malicious attacker from over-running the sack filter and thus we start to process things anyway.
The default stack will need to start using the sack-filter which we have talked about in past conference calls to take full advantage of the protections offered by it (and reduce cpu consumption when processing sacks).
After this set of changes is in, rack can drop its SAD detection completely.
Reviewed by: tuexen@, rscheff@ Differential Revision: <https://reviews.freebsd.org/D44903>
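The rule of dropping SACK blocks that are too small unless they end at snd_max can be sketched in userland C as follows. This is only an illustration of the idea from the commit message, not the sack-filter code: `struct sack_block`, `drop_small_sack_blocks()` and the `min_len` parameter are hypothetical, and the caller is assumed to pass the "smallest v4 MSS minus 40" threshold.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical SACK block covering sequence numbers [start, end). */
struct sack_block {
	uint32_t start;
	uint32_t end;
};

/*
 * Compact the array in place, keeping only blocks that are at least
 * min_len bytes long or that end exactly at snd_max. Returns the number
 * of blocks kept. Unsigned subtraction keeps the length correct across
 * sequence number wrap-around.
 */
size_t
drop_small_sack_blocks(struct sack_block *blk, size_t n, uint32_t min_len,
    uint32_t snd_max)
{
	size_t kept = 0;

	for (size_t i = 0; i < n; i++) {
		uint32_t len = blk[i].end - blk[i].start;

		if (len >= min_len || blk[i].end == snd_max)
			blk[kept++] = blk[i];
	}
	return (kept);
}
```

For example, with an assumed threshold of 176 bytes and snd_max of 10000, a 10-byte block in the middle of the window is discarded, while a 512-byte block and a 10-byte block ending at 10000 are both kept.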
|
| #
e34ea019
|
| 18-Mar-2024 |
Gleb Smirnoff <glebius@FreeBSD.org> |
tcp: clear all TCP timers in tcp_timer_stop() when in callout
When a TCP callout decides to disable self, e.g. tcp_timer_2msl() calling tcp_close(), we must also clear all other possible timers. Otherwise, upon return, the callout would be scheduled again in tcp_timer_enter().
Revert 57e27ff07aff, which was a temporary partial revert of otherwise correct 62d47d73b7eb, that exposed the problem being fixed now. Add an extra assertion in tcp_timer_enter() to check we aren't arming callout for a closed connection.
Reviewed by: rscheff
|
| #
62d47d73
|
| 10-Feb-2024 |
Richard Scheffenegger <rscheff@FreeBSD.org> |
tcp: stop timers and clean scoreboard in tcp_close()
Stop timers when in tcp_close() instead of doing that in tcp_discardcb(). A connection in CLOSED state shall not need any timers. Assert that no timer is rescheduled after that in tcp_timer_activate() and verify that this is also the expected state in tcp_discardcb().
PR: 276761 Reviewed By: glebius, tuexen, #transport Sponsored by: NetApp, Inc. Differential Revision: https://reviews.freebsd.org/D43792
|
| #
e21c6687
|
| 23-Jan-2024 |
Gleb Smirnoff <glebius@FreeBSD.org> |
tcp: pass positive errno to tcp_drop()
Fixes: 446ccdd08e2a9f704f6348cd7f679e59183b99b3
|
| #
30409ecd
|
| 06-Jan-2024 |
Richard Scheffenegger <rscheff@FreeBSD.org> |
tcp: do not purge SACK scoreboard on first RTO
Keeping the SACK scoreboard intact after the first RTO and retransmitting all data anew only on subsequent RTOs allows a more timely and efficient loss recovery under many adverse circumstances.
Reviewed By: tuexen, #transport MFC after: 10 weeks Sponsored by: NetApp, Inc. Differential Revision: https://reviews.freebsd.org/D42906
|
| #
29363fb4
|
| 23-Nov-2023 |
Warner Losh <imp@FreeBSD.org> |
sys: Remove ancient SCCS tags.
Remove ancient SCCS tags from the tree, automated scripting, with two minor fixups to keep things compiling. All the common forms in the tree were removed with a perl script.
Sponsored by: Netflix
|
| #
685dc743
|
| 16-Aug-2023 |
Warner Losh <imp@FreeBSD.org> |
sys: Remove $FreeBSD$: one-line .c pattern
Remove /^[\s*]*__FBSDID\("\$FreeBSD\$"\);?\s*\n/
|
| #
43b117f8
|
| 06-Jun-2023 |
Richard Scheffenegger <rscheff@FreeBSD.org> |
tcp: make the maximum number of retransmissions tunable per VNET
Both Windows (TcpMaxDataRetransmissions) and Linux (tcp_retries2) allow to restrict the maximum number of consecutive timer based retransmissions. Add that same capability on a per-VNet basis to FreeBSD.
Reviewed By: cc, tuexen, #transport Sponsored by: NetApp, Inc. Differential Revision: https://reviews.freebsd.org/D40424
|
| #
69c7c811
|
| 16-Mar-2023 |
Randall Stewart <rrs@FreeBSD.org> |
Move access to tcp's t_logstate into inline functions and provide new tracepoint and bbpoint capabilities.
The TCP stacks have long accessed t_logstate directly, but in order to do tracepoints and the new bbpoints we need to move to using the new inline functions. This adds them and moves rack to now use the tcp_tracepoints.
Reviewed by: tuexen, gallatin Sponsored by: Netflix Inc Differential Revision: https://reviews.freebsd.org/D38831
|
| #
76578d60
|
| 21-Feb-2023 |
Michael Tuexen <tuexen@FreeBSD.org> |
bblog: improve timeout event handling
Extend the BBLog RTO event to deal with all timers of the base stack. Also provide information about starting, stopping, and running off. The expiration of the retransmission timer is reported as it was done before.
Reviewed by: rscheff@ Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D38710
|
| #
eaabc937
|
| 14-Dec-2022 |
Gleb Smirnoff <glebius@FreeBSD.org> |
tcp: retire TCPDEBUG
This subsystem is superseded by modern debugging facilities, e.g. DTrace probes and TCP black box logging.
We intentionally leave SO_DEBUG in place, as many utilities may set it on a socket. Also the tcp::debug DTrace probes look at this flag on a socket.
Reviewed by: gnn, tuexen Discussed with: rscheff, rrs, jtl Differential revision: https://reviews.freebsd.org/D37694
|
| #
446ccdd0
|
| 07-Dec-2022 |
Gleb Smirnoff <glebius@FreeBSD.org> |
tcp: use single locked callout per tcpcb for the TCP timers
Use only one callout structure per tcpcb that is responsible for handling all five TCP timeouts. Use locked version of callout, of course. The callout function tcp_timer_enter() chooses soonest timer and executes it with lock held. Unless the timer reports that the tcpcb has been freed, the callout is rescheduled for next soonest timer, if there is any.
With single callout per tcpcb on connection teardown we should be able to fully stop the callout and immediately free it, avoiding use of callout_async_drain(). There is one gotcha here: callout_stop() can actually touch our memory when a rare race condition happens. See comment above tcp_timer_stop(). Synchronous stop of the callout makes tcp_discardcb() the single entry point for tcpcb destructor, merging the tcp_freecb() to the end of the function.
While here, also remove lots of lingering checks in the beginning of TCP timer functions. With a locked callout they are unnecessary.
While here, clean unused parts of timer KPI for the pluggable TCP stacks.
While here, remove TCPDEBUG from tcp_timer.c, as this allows for more simplification of TCP timers. The TCPDEBUG is scheduled for removal.
Move the DTrace probes in timers to the beginning of a function, where a tcpcb is always existing.
Discussed with: rrs, tuexen, rscheff (the TCP part of the diff) Reviewed by: hselasky, kib, mav (the callout part) Differential revision: https://reviews.freebsd.org/D37321
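The "choose the soonest timer" step of a single callout that serves all five timeouts might look roughly like the sketch below. It is a userland illustration under assumptions: the enum values, the `TIMER_IDLE` sentinel for a timer that is not armed, and `soonest_timer()` are hypothetical names, and deadlines are plain 64-bit integers rather than kernel time types.

```c
#include <stdint.h>

/* Hypothetical identifiers for the five TCP timeouts. */
enum timer_id { T_REXMT, T_PERSIST, T_KEEP, T_2MSL, T_DELACK, T_COUNT };

/* Assumed sentinel deadline meaning "this timer is not armed". */
#define TIMER_IDLE INT64_MAX

/*
 * Return the armed timer with the earliest deadline, or T_COUNT when no
 * timer is armed. A single callout would be scheduled for that deadline,
 * and after the handler runs the same scan yields the next one to arm.
 */
enum timer_id
soonest_timer(const int64_t deadline[T_COUNT])
{
	enum timer_id best = T_COUNT;
	int64_t best_deadline = TIMER_IDLE;

	for (int i = 0; i < T_COUNT; i++) {
		if (deadline[i] < best_deadline) {
			best_deadline = deadline[i];
			best = (enum timer_id)i;
		}
	}
	return (best);
}
```

Because the scan returns T_COUNT when nothing is armed, the caller can tell "no reschedule needed" apart from any real timer without a separate flag.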
|
| #
918fa422
|
| 07-Dec-2022 |
Gleb Smirnoff <glebius@FreeBSD.org> |
tcp: remove tcp_timer_suspend()
It was a temporary code added together with RACK to fight against TCP timer races.
|
| #
e68b3792
|
| 07-Dec-2022 |
Gleb Smirnoff <glebius@FreeBSD.org> |
tcp: embed inpcb into tcpcb
For the TCP protocol inpcb storage specify allocation size that would provide space to most of the data a TCP connection needs, embedding into struct tcpcb several structures, that previously were allocated separately.
The most important one is the inpcb itself. With embedding we can provide strong guarantee that with a valid TCP inpcb the tcpcb is always valid and vice versa. Also we reduce number of allocs/frees per connection. The embedded inpcb is placed in the beginning of the struct tcpcb, since in_pcballoc() requires that. However, later we may want to move it around for cache line efficiency, and this can be done with a little effort. The new intotcpcb() macro is ready for such move.
The congestion algorithm data, the TCP timers and osd(9) data are also embedded into tcpcb, and temporary struct tcpcb_mem goes away. There was no extra allocation here, but we went through extra pointer every time we accessed this data.
One interesting side effect is that now TCP data is allocated from SMR-protected zone. Potentially this allows the TCP stacks or other TCP related modules to utilize that for their own synchronization.
Large part of the change was done with sed script:
s/tp->ccv->/tp->t_ccv./g
s/tp->ccv/\&tp->t_ccv/g
s/tp->cc_algo/tp->t_cc/g
s/tp->t_timers->tt_/tp->tt_/g
s/CCV\(ccv, osd\)/\&CCV(ccv, t_osd)/g
Dependency side effect is that code that needs to know struct tcpcb should also know struct inpcb, that added several <netinet/in_pcb.h>.
Differential revision: https://reviews.freebsd.org/D37127
|
| #
b40ae8c9
|
| 08-Nov-2022 |
Gleb Smirnoff <glebius@FreeBSD.org> |
tcp: fix build without INVARIANTS and VIMAGE
Lines from upcoming changes crept in and broke certain builds.
Fixes: 9eb0e8326d0fe73ae947959c1df327238d3b2d53
|
| #
8840ae22
|
| 08-Nov-2022 |
Gleb Smirnoff <glebius@FreeBSD.org> |
tcp: don't store VNET in every tcpcb, take it from the inpcbinfo
Reviewed by: rscheff Differential revision: https://reviews.freebsd.org/D37125
|