| #
73bebcc5
|
| 08-Nov-2022 |
Gleb Smirnoff <glebius@FreeBSD.org> |
inpcb: remove TCP includes, all TCP specific code was moved
|
| #
ab0ef945
|
| 08-Nov-2022 |
Gleb Smirnoff <glebius@FreeBSD.org> |
hpts: move inp initialization from the generic inpcb code to TCP
Differential revision: https://reviews.freebsd.org/D37124
|
| #
f567d55f
|
| 08-Nov-2022 |
Gleb Smirnoff <glebius@FreeBSD.org> |
inpcb: don't return INP_DROPPED entries from pcb lookups
The in_pcbdrop() KPI, which is used solely by TCP, allows removing a pcb from the hash list and marking it as dropped. The comment suggests that such a pcb won't be returned by lookups. Indeed, every call to in_pcblookup*() is accompanied by a check for INP_DROPPED. Do what the comment suggests: never return such pcbs and remove the unnecessary checks.
Reviewed by: tuexen Differential revision: https://reviews.freebsd.org/D37061
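The lookup rule described in this commit can be modelled in user space. This is a minimal sketch, not the kernel code: `struct pcb`, `PCB_DROPPED`, `pcb_lookup()` and `pcb_port_in_use()` are hypothetical names standing in for the inpcb, `INP_DROPPED`, `in_pcblookup*()` and a typical caller.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PCB_DROPPED 0x1u        /* hypothetical stand-in for INP_DROPPED */

struct pcb {
    uint16_t lport;             /* local port, host byte order */
    unsigned flags;
    struct pcb *next;           /* hash chain link */
};

/* Return the first live pcb bound to lport; dropped entries are skipped. */
static struct pcb *
pcb_lookup(struct pcb *head, uint16_t lport)
{
    for (struct pcb *p = head; p != NULL; p = p->next) {
        if (p->flags & PCB_DROPPED)
            continue;
        if (p->lport == lport)
            return p;
    }
    return NULL;
}

/* A caller can assert, rather than check, that the result is not dropped. */
static int
pcb_port_in_use(struct pcb *head, uint16_t lport)
{
    struct pcb *p = pcb_lookup(head, lport);

    assert(p == NULL || (p->flags & PCB_DROPPED) == 0);
    return p != NULL;
}
```

Filtering inside the lookup means the guarantee is enforced in one place, so each caller's explicit flag check becomes redundant.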
|
| #
d93ec8cb
|
| 02-Nov-2022 |
Mark Johnston <markj@FreeBSD.org> |
inpcb: Allow SO_REUSEPORT_LB to be used in jails
Currently SO_REUSEPORT_LB silently does nothing when set by a jailed process. It is trivial to support this option in VNET jails, but it's also useful in traditional jails.
This patch enables LB groups in jails with the following semantics:
- all PCBs in a group must belong to the same jail,
- PCB lookup prefers jailed groups to non-jailed groups
This is a straightforward extension of the semantics used for individual listening sockets. One pre-existing quirk of the lbgroup implementation is that non-jailed lbgroups are searched before jailed listening sockets; that is preserved with this change.
Discussed with: glebius MFC after: 1 month Sponsored by: Modirum MDPay Sponsored by: Klara, Inc. Differential Revision: https://reviews.freebsd.org/D37029
|
| #
a152dd86
|
| 02-Nov-2022 |
Mark Johnston <markj@FreeBSD.org> |
inpcb: Remove a PCB from its LB group upon a subsequent error
If a memory allocation failure causes bind to fail, we should take the inpcb back out of its LB group since it's not prepared to handle connections.
Reviewed by: glebius MFC after: 2 weeks Sponsored by: Modirum MDPay Sponsored by: Klara, Inc. Differential Revision: https://reviews.freebsd.org/D37027
|
| #
ac1750dd
|
| 02-Nov-2022 |
Mark Johnston <markj@FreeBSD.org> |
inpcb: Remove NULL checks of credential references
Some auditing of the code shows that "cred" is never NULL in these functions, either because all callers pass a non-NULL reference or because they unconditionally dereference "cred". So, let's simplify the code a bit and remove NULL checks. No functional change intended.
Reviewed by: glebius MFC after: 1 week Sponsored by: Modirum MDPay Sponsored by: Klara, Inc. Differential Revision: https://reviews.freebsd.org/D37025
|
| #
19acc506
|
| 31-Oct-2022 |
Gleb Smirnoff <glebius@FreeBSD.org> |
inpcb: retire suppression of randomization of ephemeral ports
The suppression was added in 5f311da2ccb with no explanation in the commit message of the exact problem that was fixed. In the BSDCan 2006 talk [1], slides 12 to 14, we can find that it seems that there was some problem with the TIME_WAIT state not properly being handled on the remote side (also FreeBSD!), and this switching off the suppression had hidden the problem. The rationale of the change was that other stacks may also be buggy wrt the TIME_WAIT.
I did not find the actual problem in TIME_WAIT that the suppression has hidden, nor a commit that would fix it. However, since that time we started to handle SYNs with RFC5961 instead of RFC793, see 3220a2121cc. We also now have the tcp-testsuite [2], which has full coverage of all possible scenarios of receiving SYN in TIME_WAIT.
This effectively reverts 5f311da2ccb6c216b79049172be840af4778129a and 6ee79c59d2c081f837a11cc78908be54a6dbe09d.
[1] https://www.bsdcan.org/2006/papers/ImprovingTCPIP.pdf [2] https://github.com/freebsd-net/tcp-testsuite
Reviewed by: rscheff Discussed with: rscheff, rrs, tuexen Differential revision: https://reviews.freebsd.org/D37042
|
| #
24cf7a8d
|
| 19-Oct-2022 |
Gleb Smirnoff <glebius@FreeBSD.org> |
inpcb: provide pcbinfo pointer argument to inp_apply_all()
This allows clearing the inpcb layer of TCP knowledge.
|
| #
b6a816f1
|
| 19-Oct-2022 |
Gleb Smirnoff <glebius@FreeBSD.org> |
inpcb: garbage collect so_sototcpcb()
It had very little use and required the inpcb layer to know about tcpcb.
|
| #
3ba34b07
|
| 13-Oct-2022 |
Gleb Smirnoff <glebius@FreeBSD.org> |
inpcb: provide in_pcbremhash() to reduce copy-paste
|
| #
53af6903
|
| 07-Oct-2022 |
Gleb Smirnoff <glebius@FreeBSD.org> |
tcp: remove INP_TIMEWAIT flag
Mechanically cleanup INP_TIMEWAIT from the kernel sources. After 0d7445193ab, this commit shall not cause any functional changes.
Note: this flag was very often checked together with INP_DROPPED. If we modify in_pcblookup*() not to return INP_DROPPED pcbs, we will be able to remove most of these checks and turn them into assertions. Some of them can be turned into assertions right now, but that should be done carefully on a case-by-case basis.
Differential revision: https://reviews.freebsd.org/D36400
|
| #
0d744519
|
| 07-Oct-2022 |
Gleb Smirnoff <glebius@FreeBSD.org> |
tcp: remove tcptw, the compressed timewait state structure
The memory savings the tcptw brought back in 2003 (see 340c35de6a2) no longer justify the complexity required to maintain it. For a longer explanation please see the email [1].
Surprisingly, through almost 20 years the TCP stack functionality of handling the TIME_WAIT state with a normal tcpcb did not bitrot. The existing tcp_input() properly handles a tcpcb in TCPS_TIME_WAIT state, which is confirmed by the packetdrill tcp-testsuite [2].
This change just removes tcptw and leaves INP_TIMEWAIT. The flag will be removed in a separate commit. This makes it easier to review and possibly debug the changes.
[1] https://lists.freebsd.org/archives/freebsd-net/2022-January/001206.html [2] https://github.com/freebsd-net/tcp-testsuite
Differential revision: https://reviews.freebsd.org/D36398
|
| #
c7a62c92
|
| 10-Aug-2022 |
Gleb Smirnoff <glebius@FreeBSD.org> |
inpcb: gather v4/v6 handling code into in_pcballoc() from protocols
Reviewed by: rrs, tuexen Differential revision: https://reviews.freebsd.org/D36062
|
| #
637f317c
|
| 29-Jul-2022 |
Mike Karels <karels@FreeBSD.org> |
IPv6: fix problem with duplicate port assignment with v4-mapped addrs
In in_pcb_lport_dest(), if an IPv6 socket does not match any other IPv6 socket using in6_pcblookup_local(), and if the socket can also connect to IPv4 (the INP_IPV4 vflag is set), check for IPv4 matches as well. Otherwise, we can allocate a port that is used by an IPv4 socket (possibly one created from IPv6 via the same procedure), and then connect() can fail with EADDRINUSE, when it could have succeeded if the bound port was not in use.
PR: 265064 Submitted by: firk at cantconnect.ru (with modifications) Reviewed by: bz, melifaro Differential Revision: https://reviews.freebsd.org/D36012
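The check described in this commit can be illustrated with a small user-space model. This is a sketch under stated assumptions, not the logic of in_pcb_lport_dest() itself: `struct port_table`, `port_in_use()` and `port_free_for_v6()` are hypothetical names, and the two tables stand in for the IPv6 and IPv4 lookups.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the set of ports bound in one address family. */
struct port_table {
    const uint16_t *ports;
    size_t n;
};

static bool
port_in_use(const struct port_table *t, uint16_t port)
{
    for (size_t i = 0; i < t->n; i++)
        if (t->ports[i] == port)
            return true;
    return false;
}

/*
 * Decide whether an IPv6 socket may take the candidate port.  When the
 * socket can also talk IPv4 (can_v4 models the INP_IPV4 vflag), an IPv4
 * match makes the port unavailable as well.
 */
static bool
port_free_for_v6(const struct port_table *v6, const struct port_table *v4,
    bool can_v4, uint16_t port)
{
    if (port_in_use(v6, port))
        return false;
    if (can_v4 && port_in_use(v4, port))
        return false;
    return true;
}
```

Without the second check, a dual-stack socket could be handed a port that an IPv4 socket already holds, which is the case where the commit message says connect() can fail with EADDRINUSE.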
|
| #
fe5324ac
|
| 13-Apr-2022 |
John Baldwin <jhb@FreeBSD.org> |
in_pcballoc: error is only used for IPSEC or MAC.
|
| #
bab34d63
|
| 12-Apr-2022 |
John Baldwin <jhb@FreeBSD.org> |
in_pcboutput_txrtlmt: Remove unused variable.
|
| #
942e8cab
|
| 02-Apr-2022 |
Gordon Bergling <gbe@FreeBSD.org> |
netinet: Fix a typo in a source code comment
- s/exisitng/existing/
MFC after: 3 days
|
| #
a0aeb1ce
|
| 09-Feb-2022 |
Michael Tuexen <tuexen@FreeBSD.org> |
in_pcb.c: fix compilation of an IPv4 only configuration
While there, remove a duplicate inclusion of sysctl.h.
Reported by: Gary Jennejohn Fixes: a35bdd4489b9 - main - tcp: add sysctl interface for setting socket options Sponsored by: Netflix, Inc.
|
| #
a35bdd44
|
| 09-Feb-2022 |
Michael Tuexen <tuexen@FreeBSD.org> |
tcp: add sysctl interface for setting socket options
This interface allows setting a socket option on a TCP endpoint, which is specified by its inp_gencnt. This interface will be used in an upcoming command line tool tcpsso.
Reviewed by: glebius, rrs Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D34138
|
| #
fec8a8c7
|
| 03-Jan-2022 |
Gleb Smirnoff <glebius@FreeBSD.org> |
inpcb: use global UMA zones for protocols
Provide structure inpcbstorage, which holds zones and lock names for a protocol. Initialize it at global protocol init using the macro INPCBSTORAGE_DEFINE(). Then, at VNET protocol init, supply it as the main argument to in_pcbinfo_init(). Each VNET pcbinfo uses its private hash, but they all use the same zone to allocate and the same SMR section to synchronize.
Note: there is the kern.ipc.maxsockets sysctl, which controls the UMA limit on the socket zone, which was always global. Historically the same maxsockets value is also applied to every PCB zone. Important fact: you can't create a pcb without a socket! A pcb may outlive its socket, however. Given that there are multiple protocols and only one socket zone, the per-pcb zone limits seem to have little value. Under very special conditions such a limit may trigger a little earlier than the socket zone limit, but in most setups the socket zone limit will be triggered earlier. When VIMAGE was added to the kernel, PCB zones became per-VNET. This magnified the existing imbalance further: now we have multiple pcb zones in multiple vnets limited to maxsockets, but every pcb requires a socket allocated from the global zone, also limited by maxsockets. IMHO, this per-pcb zone limit doesn't bring any value, so this patch drops it. If anybody explains the value of this limit, it can be restored very easily: just a two-line change to in_pcbstorage_init().
Differential revision: https://reviews.freebsd.org/D33542
|
| #
430df2ab
|
| 01-Jan-2022 |
Michael Tuexen <tuexen@FreeBSD.org> |
in_pcb: improve inp_next()
If there is no inp to check, exit the loop iterating through them.
Reported by: syzbot+403406a9cbf082b36ea4@syzkaller.appspotmail.com Reviewed by: glebius Sponsored by: Netflix, Inc.
|
| #
a0577692
|
| 26-Dec-2021 |
Gleb Smirnoff <glebius@FreeBSD.org> |
in_pcb: use jenkins hash over the entire IPv6 (or IPv4) address
The intent is to provide more entropy than can be provided by just the 32 bits of the IPv6 address which overlap with 6to4 tunnels. This is needed to mitigate potential algorithmic complexity attacks from attackers who can control large numbers of IPv6 addresses.
Together with: gallatin Reviewed by: dwmalone, rscheff Differential revision: https://reviews.freebsd.org/D33254
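The difference between keying on 32 bits and hashing the whole address can be sketched as follows. This is an illustration only: it uses Bob Jenkins' simple one-at-a-time hash rather than whichever Jenkins variant the kernel calls, it assumes the 32 bits in question are the last four bytes of the address, and `oaat_hash()`, `low32_key()` and `full_addr_hash()` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Bob Jenkins' one-at-a-time hash over an arbitrary byte string. */
static uint32_t
oaat_hash(const uint8_t *key, size_t len, uint32_t seed)
{
    uint32_t h = seed;

    for (size_t i = 0; i < len; i++) {
        h += key[i];
        h += h << 10;
        h ^= h >> 6;
    }
    h += h << 3;
    h ^= h >> 11;
    h += h << 15;
    return h;
}

/* Old-style key (assumption): only the last 32 bits of the address. */
static uint32_t
low32_key(const uint8_t addr[16])
{
    uint32_t k;

    memcpy(&k, addr + 12, sizeof(k));
    return k;
}

/* New-style key: every byte of the 16-byte address feeds the hash. */
static uint32_t
full_addr_hash(const uint8_t addr[16], uint32_t seed)
{
    return oaat_hash(addr, 16, seed);
}
```

Two addresses that differ only outside the last four bytes produce the same `low32_key()`, so an attacker who controls a large prefix can pile entries onto one hash chain. With `full_addr_hash()` every byte influences the result, and a secret `seed` would make collisions harder to precompute.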
|
| #
a370832b
|
| 26-Dec-2021 |
Gleb Smirnoff <glebius@FreeBSD.org> |
tcp: remove delayed drop KPI
No longer needed after tcp_output() can ask caller to drop.
Reviewed by: rrs, tuexen Differential revision: https://reviews.freebsd.org/D33371
|
| #
d8b45c8e
|
| 16-Dec-2021 |
Gleb Smirnoff <glebius@FreeBSD.org> |
inpcb: don't leak the port zone in in_pcbinfo_destroy()
|
| #
185e659c
|
| 14-Dec-2021 |
Gleb Smirnoff <glebius@FreeBSD.org> |
inpcb: use locked variant of prison_check_ip*()
The pcb lookup always happens in the network epoch and in an SMR section. We can't block on a mutex due to the latter. Right now this patch opens up a race, but that will soon be addressed by D33339.
Reviewed by: markj, jamie Differential revision: https://reviews.freebsd.org/D33340 Fixes: de2d47842e8
|