#
ab93e0dd |
| 06-Aug-2025 |
Dmitry Torokhov <dmitry.torokhov@gmail.com> |
Merge branch 'next' into for-linus
Prepare input updates for 6.17 merge window.
|
#
a7bee4e7 |
| 04-Aug-2025 |
Dmitry Torokhov <dmitry.torokhov@gmail.com> |
Merge tag 'ib-mfd-gpio-input-pwm-v6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd into next
Merge an immutable branch between MFD, GPIO, Input and PWM to resolve conflicts for the mer
Merge tag 'ib-mfd-gpio-input-pwm-v6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd into next
Merge an immutable branch between MFD, GPIO, Input and PWM to resolve conflicts for the merge window pull request.
show more ...
|
#
8be4d31c |
| 30-Jul-2025 |
Linus Torvalds <torvalds@linux-foundation.org> |
Merge tag 'net-next-6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from Jakub Kicinski: "Core & protocols:
- Wrap datapath globals into net_align
Merge tag 'net-next-6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from Jakub Kicinski: "Core & protocols:
- Wrap datapath globals into net_aligned_data, to avoid false sharing
- Preserve MSG_ZEROCOPY in forwarding (e.g. out of a container)
- Add SO_INQ and SCM_INQ support to AF_UNIX
- Add SIOCINQ support to AF_VSOCK
- Add TCP_MAXSEG sockopt to MPTCP
- Add IPv6 force_forwarding sysctl to enable forwarding per interface
- Make TCP validation of whether packet fully fits in the receive window and the rcv_buf more strict. With increased use of HW aggregation a single "packet" can be multiple 100s of kB
- Add MSG_MORE flag to optimize large TCP transmissions via sockmap, improves latency up to 33% for sockmap users
- Convert TCP send queue handling from tasklet to BH workque
- Improve BPF iteration over TCP sockets to see each socket exactly once
- Remove obsolete and unused TCP RFC3517/RFC6675 loss recovery code
- Support enabling kernel threads for NAPI processing on per-NAPI instance basis rather than a whole device. Fully stop the kernel NAPI thread when threaded NAPI gets disabled. Previously thread would stick around until ifdown due to tricky synchronization
- Allow multicast routing to take effect on locally-generated packets
- Add output interface argument for End.X in segment routing
- MCTP: add support for gateway routing, improve bind() handling
- Don't require rtnl_lock when fetching an IPv6 neighbor over Netlink
- Add a new neighbor flag ("extern_valid"), which cedes refresh responsibilities to userspace. This is needed for EVPN multi-homing where a neighbor entry for a multi-homed host needs to be synced across all the VTEPs among which the host is multi-homed
- Support NUD_PERMANENT for proxy neighbor entries
- Add a new queuing discipline for IETF RFC9332 DualQ Coupled AQM
- Add sequence numbers to netconsole messages. Unregister netconsole's console when all net targets are removed. Code refactoring. Add a number of selftests
- Align IPSec inbound SA lookup to RFC 4301. Only SPI and protocol should be used for an inbound SA lookup
- Support inspecting ref_tracker state via DebugFS
- Don't force bonding advertisement frames tx to ~333 ms boundaries. Add broadcast_neighbor option to send ARP/ND on all bonded links
- Allow providing upcall pid for the 'execute' command in openvswitch
- Remove DCCP support from Netfilter's conntrack
- Disallow multiple packet duplications in the queuing layer
- Prevent use of deprecated iptables code on PREEMPT_RT
Driver API:
- Support RSS and hashing configuration over ethtool Netlink
- Add dedicated ethtool callbacks for getting and setting hashing fields
- Add support for power budget evaluation strategy in PSE / Power-over-Ethernet. Generate Netlink events for overcurrent etc
- Support DPLL phase offset monitoring across all device inputs. Support providing clock reference and SYNC over separate DPLL inputs
- Support traffic classes in devlink rate API for bandwidth management
- Remove rtnl_lock dependency from UDP tunnel port configuration
Device drivers:
- Add a new Broadcom driver for 800G Ethernet (bnge)
- Add a standalone driver for Microchip ZL3073x DPLL
- Remove IBM's NETIUCV device driver
- Ethernet high-speed NICs: - Broadcom (bnxt): - support zero-copy Tx of DMABUF memory - take page size into account for page pool recycling rings - Intel (100G, ice, idpf): - idpf: XDP and AF_XDP support preparations - idpf: add flow steering - add link_down_events statistic - clean up the TSPLL code - preparations for live VM migration - nVidia/Mellanox: - support zero-copy Rx/Tx interfaces (DMABUF and io_uring) - optimize context memory usage for matchers - expose serial numbers in devlink info - support PCIe congestion metrics - Meta (fbnic): - add 25G, 50G, and 100G link modes to phylink - support dumping FW logs - Marvell/Cavium: - support for CN20K generation of the Octeon chips - Amazon: - add HW clock (without timestamping, just hypervisor time access)
- Ethernet virtual: - VirtIO net: - support segmentation of UDP-tunnel-encapsulated packets - Google (gve): - support packet timestamping and clock synchronization - Microsoft vNIC: - add handler for device-originated servicing events - allow dynamic MSI-X vector allocation - support Tx bandwidth clamping
- Ethernet NICs consumer, and embedded: - AMD: - amd-xgbe: hardware timestamping and PTP clock support - Broadcom integrated MACs (bcmgenet, bcmasp): - use napi_complete_done() return value to support NAPI polling - add support for re-starting auto-negotiation - Broadcom switches (b53): - support BCM5325 switches - add bcm63xx EPHY power control - Synopsys (stmmac): - lots of code refactoring and cleanups - TI: - icssg-prueth: read firmware-names from device tree - icssg: PRP offload support - Microchip: - lan78xx: convert to PHYLINK for improved PHY and MAC management - ksz: add KSZ8463 switch support - Intel: - support similar queue priority scheme in multi-queue and time-sensitive networking (taprio) - support packet pre-emption in both - RealTek (r8169): - enable EEE at 5Gbps on RTL8126 - Airoha: - add PPPoE offload support - MDIO bus controller for Airoha AN7583
- Ethernet PHYs: - support for the IPQ5018 internal GE PHY - micrel KSZ9477 switch-integrated PHYs: - add MDI/MDI-X control support - add RX error counters - add cable test support - add Signal Quality Indicator (SQI) reporting - dp83tg720: improve reset handling and reduce link recovery time - support bcm54811 (and its MII-Lite interface type) - air_en8811h: support resume/suspend - support PHY counters for QCA807x and QCA808x - support WoL for QCA807x
- CAN drivers: - rcar_canfd: support for Transceiver Delay Compensation - kvaser: report FW versions via devlink dev info
- WiFi: - extended regulatory info support (6 GHz) - add statistics and beacon monitor for Multi-Link Operation (MLO) - support S1G aggregation, improve S1G support - add Radio Measurement action fields - support per-radio RTS threshold - some work around how FIPS affects wifi, which was wrong (RC4 is used by TKIP, not only WEP) - improvements for unsolicited probe response handling
- WiFi drivers: - RealTek (rtw88): - IBSS mode for SDIO devices - RealTek (rtw89): - BT coexistence for MLO/WiFi7 - concurrent station + P2P support - support for USB devices RTL8851BU/RTL8852BU - Intel (iwlwifi): - use embedded PNVM in (to be released) FW images to fix compatibility issues - many cleanups (unused FW APIs, PCIe code, WoWLAN) - some FIPS interoperability - MediaTek (mt76): - firmware recovery improvements - more MLO work - Qualcomm/Atheros (ath12k): - fix scan on multi-radio devices - more EHT/Wi-Fi 7 features - encapsulation/decapsulation offload - Broadcom (brcm80211): - support SDIO 43751 device
- Bluetooth: - hci_event: add support for handling LE BIG Sync Lost event - ISO: add socket option to report packet seqnum via CMSG - ISO: support SCM_TIMESTAMPING for ISO TS
- Bluetooth drivers: - intel_pcie: support Function Level Reset - nxpuart: add support for 4M baudrate - nxpuart: implement powerup sequence, reset, FW dump, and FW loading"
* tag 'net-next-6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1742 commits) dpll: zl3073x: Fix build failure selftests: bpf: fix legacy netfilter options ipv6: annotate data-races around rt->fib6_nsiblings ipv6: fix possible infinite loop in fib6_info_uses_dev() ipv6: prevent infinite loop in rt6_nlmsg_size() ipv6: add a retry logic in net6_rt_notify() vrf: Drop existing dst reference in vrf_ip6_input_dst net/sched: taprio: align entry index attr validation with mqprio net: fsl_pq_mdio: use dev_err_probe selftests: rtnetlink.sh: remove esp4_offload after test vsock: remove unnecessary null check in vsock_getname() igb: xsk: solve negative overflow of nb_pkts in zerocopy mode stmmac: xsk: fix negative overflow of budget in zerocopy mode dt-bindings: ieee802154: Convert at86rf230.txt yaml format net: dsa: microchip: Disable PTP function of KSZ8463 net: dsa: microchip: Setup fiber ports for KSZ8463 net: dsa: microchip: Write switch MAC address differently for KSZ8463 net: dsa: microchip: Use different registers for KSZ8463 net: dsa: microchip: Add KSZ8463 switch support to KSZ DSA driver dt-bindings: net: dsa: microchip: Add KSZ8463 switch support ...
show more ...
|
Revision tags: v6.16 |
|
#
c58c18be |
| 26-Jul-2025 |
Jakub Kicinski <kuba@kernel.org> |
Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Merge in late fixes to prepare for the 6.17 net-next PR.
Conflicts:
net/core/neighbour.c 1bbb76a89948 ("neighbour: Fix null-ptr-der
Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Merge in late fixes to prepare for the 6.17 net-next PR.
Conflicts:
net/core/neighbour.c 1bbb76a89948 ("neighbour: Fix null-ptr-deref in neigh_flush_dev().") 13a936bb99fb ("neighbour: Protect tbl->phash_buckets[] with a dedicated mutex.") 03dc03fa0432 ("neighbor: Add NTF_EXT_VALIDATED flag for externally validated entries")
Adjacent changes:
drivers/net/usb/usbnet.c 0d9cfc9b8cb1 ("net: usbnet: Avoid potential RCU stall on LINK_CHANGE event") 2c04d279e857 ("net: usb: Convert tasklet API to new bottom half workqueue mechanism")
net/ipv6/route.c 31d7d67ba127 ("ipv6: annotate data-races around rt->fib6_nsiblings") 1caf27297215 ("ipv6: adopt dst_dev() helper") 3b3ccf9ed05e ("net: Remove unnecessary NULL check for lwtunnel_fill_encap()")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
show more ...
|
#
4fc7885c |
| 25-Jul-2025 |
Jakub Kicinski <kuba@kernel.org> |
Merge branch 'mlx5e-misc-fixes-2025-07-23'
Tariq Toukan says:
==================== mlx5e misc fixes 2025-07-23
This small patchset provides misc bug fixes from the team to the mlx5e driver. ======
Merge branch 'mlx5e-misc-fixes-2025-07-23'
Tariq Toukan says:
==================== mlx5e misc fixes 2025-07-23
This small patchset provides misc bug fixes from the team to the mlx5e driver. ====================
Link: https://patch.msgid.link/1753256672-337784-1-git-send-email-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
show more ...
|
#
e80d6556 |
| 23-Jul-2025 |
Shahar Shitrit <shshitrit@nvidia.com> |
net/mlx5e: Fix potential deadlock by deferring RX timeout recovery
mlx5e_reporter_rx_timeout() is currently invoked synchronously in the driver's open error flow. This causes the thread holding priv
net/mlx5e: Fix potential deadlock by deferring RX timeout recovery
mlx5e_reporter_rx_timeout() is currently invoked synchronously in the driver's open error flow. This causes the thread holding priv->state_lock to attempt acquiring the devlink lock, which can result in a circular dependency with other devlink operations.
For example:
- Devlink health diagnose flow: - __devlink_nl_pre_doit() acquires the devlink lock. - devlink_nl_health_reporter_diagnose_doit() invokes the driver's diagnose callback. - mlx5e_rx_reporter_diagnose() then attempts to acquire priv->state_lock.
- Driver open flow: - mlx5e_open() acquires priv->state_lock. - If an error occurs, devlink_health_reporter may be called, attempting to acquire the devlink lock.
To prevent this circular locking scenario, defer the RX timeout recovery by scheduling it via a workqueue. This ensures that the recovery work acquires locks in a consistent order: first the devlink lock, then priv->state_lock.
Additionally, make the recovery work acquire the netdev instance lock to safely synchronize with the open/close channel flows, similar to mlx5e_tx_timeout_work. Repeatedly attempt to acquire the netdev instance lock until it is taken or the target RQ is no longer active, as indicated by the MLX5E_STATE_CHANNELS_ACTIVE bit.
Fixes: 32c57fb26863 ("net/mlx5e: Report and recover from rx timeout") Signed-off-by: Shahar Shitrit <shshitrit@nvidia.com> Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com> Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/1753256672-337784-4-git-send-email-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
show more ...
|
#
fe09560f |
| 23-Jul-2025 |
Bjorn Helgaas <bhelgaas@google.com> |
net: Fix typos
Fix typos in comments and error messages.
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: David Arinzon <darinzon@amazon.com> Reviewed-by: Simon Horman <horms@kernel.
net: Fix typos
Fix typos in comments and error messages.
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: David Arinzon <darinzon@amazon.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250723201528.2908218-1-helgaas@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
show more ...
|
#
8839d1cc |
| 23-Jul-2025 |
Jakub Kicinski <kuba@kernel.org> |
Merge branch 'net-mlx5-misc-changes-2025-07-21'
Tariq Toukan says:
==================== net/mlx5: misc changes 2025-07-21
This series by Lama contains misc enhancements to the SHAMPO parameters. =
Merge branch 'net-mlx5-misc-changes-2025-07-21'
Tariq Toukan says:
==================== net/mlx5: misc changes 2025-07-21
This series by Lama contains misc enhancements to the SHAMPO parameters. ====================
Link: https://patch.msgid.link/1753081999-326247-1-git-send-email-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
show more ...
|
#
eeaf1146 |
| 21-Jul-2025 |
Lama Kayal <lkayal@nvidia.com> |
net/mlx5e: Remove duplicate mkey from SHAMPO header
SHAMPO structure holds two variations of the mkey, which is unnecessary, a duplication that's repeated per rq.
Remove duplicate mkey information
net/mlx5e: Remove duplicate mkey from SHAMPO header
SHAMPO structure holds two variations of the mkey, which is unnecessary, a duplication that's repeated per rq.
Remove duplicate mkey information and keep only one version, the one used in the fast path, rename field to reflect field type clearly.
Signed-off-by: Lama Kayal <lkayal@nvidia.com> Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/1753081999-326247-4-git-send-email-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
show more ...
|
#
eee529c0 |
| 21-Jul-2025 |
Lama Kayal <lkayal@nvidia.com> |
net/mlx5e: SHAMPO, Remove mlx5e_shampo_get_log_hd_entry_size()
Refactor mlx5e_shampo_get_log_hd_entry_size() as macro, for more simplicity.
Signed-off-by: Lama Kayal <lkayal@nvidia.com> Reviewed-by
net/mlx5e: SHAMPO, Remove mlx5e_shampo_get_log_hd_entry_size()
Refactor mlx5e_shampo_get_log_hd_entry_size() as macro, for more simplicity.
Signed-off-by: Lama Kayal <lkayal@nvidia.com> Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/1753081999-326247-3-git-send-email-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
show more ...
|
#
bc2d44b8 |
| 21-Jul-2025 |
Lama Kayal <lkayal@nvidia.com> |
net/mlx5e: SHAMPO, Cleanup reservation size formula
The reservation size formula can be reduced to a simple evaluation of MLX5E_SHAMPO_WQ_RESRV_SIZE. This leaves mlx5e_shampo_get_log_rsrv_size() wit
net/mlx5e: SHAMPO, Cleanup reservation size formula
The reservation size formula can be reduced to a simple evaluation of MLX5E_SHAMPO_WQ_RESRV_SIZE. This leaves mlx5e_shampo_get_log_rsrv_size() with one single use, which can be replaced with a macro for simplicity.
Also, function mlx5e_shampo_get_log_rsrv_size() is used only throughout params.c, make it static.
Signed-off-by: Lama Kayal <lkayal@nvidia.com> Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/1753081999-326247-2-git-send-email-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
show more ...
|
Revision tags: v6.16-rc7 |
|
#
cd031354 |
| 17-Jul-2025 |
Jakub Kicinski <kuba@kernel.org> |
Merge branch 'net-mlx5e-add-support-for-pcie-congestion-events'
Tariq Toukan says:
==================== net/mlx5e: Add support for PCIe congestion events
Dragos says:
PCIe congestion events are e
Merge branch 'net-mlx5e-add-support-for-pcie-congestion-events'
Tariq Toukan says:
==================== net/mlx5e: Add support for PCIe congestion events
Dragos says:
PCIe congestion events are events generated by the firmware when the device side has sustained PCIe inbound or outbound traffic above certain thresholds. The high and low threshold are hysteresis thresholds to prevent flapping: once the high threshold has been reached, a low threshold event will be triggered only after the bandwidth usage went below the low threshold.
This series adds support for receiving and exposing such events as ethtool counters.
2 new pairs of counters are exposed: pci_bw_in/outbound_high/low. These should help the user understand if the device PCI is under pressure.
Planned followup patches: - Allow configuration of thresholds through devlink. - Add ethtool counter for wakeups which did not result in any state change. ====================
Link: https://patch.msgid.link/1752589821-145787-1-git-send-email-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
show more ...
|
#
ab2b0d4d |
| 15-Jul-2025 |
Dragos Tatulea <dtatulea@nvidia.com> |
net/mlx5e: Create/destroy PCIe Congestion Event object
Add initial infrastructure to create and destroy the PCIe Congestion Event object if the object is supported.
The verb for the object creation
net/mlx5e: Create/destroy PCIe Congestion Event object
Add initial infrastructure to create and destroy the PCIe Congestion Event object if the object is supported.
The verb for the object creation function is "set" instead of "create" because the function will accommodate the modify operation as well in a subsequent patch.
The next patches will hook it up to the event handler and will add actual functionality.
Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/1752589821-145787-2-git-send-email-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
show more ...
|
Revision tags: v6.16-rc6 |
|
#
c65d3429 |
| 10-Jul-2025 |
Jakub Kicinski <kuba@kernel.org> |
Merge branch 'net-mlx5-misc-changes-2025-07-09'
Tariq Toukan says:
==================== net/mlx5: misc changes 2025-07-09
This series contains misc enhancements to the mlx5 driver. ===============
Merge branch 'net-mlx5-misc-changes-2025-07-09'
Tariq Toukan says:
==================== net/mlx5: misc changes 2025-07-09
This series contains misc enhancements to the mlx5 driver. ====================
Link: https://patch.msgid.link/1752009387-13300-1-git-send-email-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
show more ...
|
#
ade89d1f |
| 08-Jul-2025 |
Carolina Jubran <cjubran@nvidia.com> |
net/mlx5e: Remove unused VLAN insertion logic in TX path
The VLAN insertion capability (`wqe_vlan_insert`) was never enabled on all mlx5 devices. When VLAN TX offload is advertised but this capabili
net/mlx5e: Remove unused VLAN insertion logic in TX path
The VLAN insertion capability (`wqe_vlan_insert`) was never enabled on all mlx5 devices. When VLAN TX offload is advertised but this capability is not supported, the driver uses inline headers to insert the VLAN tag.
To support this, the driver used to set the `MLX5E_SQ_STATE_VLAN_NEED_L2_INLINE` bit to enforce L2 inline mode when `wqe_vlan_insert` was not supported. Since the capability is disabled on all devices, this logic was always active, and the SQ flag has become redundant. L2 inline is enforced unconditionally for VLAN-tagged packets.
The `skb_vlan_tag_present()` check in the else-if block of `mlx5e_sq_xmit_wqe()` is never true by this point in the TX flow, as the VLAN tag has already been inserted by the driver using inline headers. As a result, this code is never executed.
Remove the redundant SQ state, dead VLAN insertion code block, and related logic.
Signed-off-by: Carolina Jubran <cjubran@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/1752009387-13300-2-git-send-email-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
show more ...
|
Revision tags: v6.16-rc5, v6.16-rc4 |
|
#
74f1af95 |
| 29-Jun-2025 |
Rob Clark <robin.clark@oss.qualcomm.com> |
Merge remote-tracking branch 'drm/drm-next' into msm-next
Back-merge drm-next to (indirectly) get arm-smmu updates for making stall-on-fault more reliable.
Signed-off-by: Rob Clark <robin.clark@oss
Merge remote-tracking branch 'drm/drm-next' into msm-next
Back-merge drm-next to (indirectly) get arm-smmu updates for making stall-on-fault more reliable.
Signed-off-by: Rob Clark <robin.clark@oss.qualcomm.com>
show more ...
|
Revision tags: v6.16-rc3 |
|
#
8152c402 |
| 18-Jun-2025 |
Jakub Kicinski <kuba@kernel.org> |
Merge branch 'net-mlx5e-add-support-for-devmem-and-io_uring-tcp-zero-copy'
Mark Bloch says:
==================== net/mlx5e: Add support for devmem and io_uring TCP zero-copy
This series adds suppo
Merge branch 'net-mlx5e-add-support-for-devmem-and-io_uring-tcp-zero-copy'
Mark Bloch says:
==================== net/mlx5e: Add support for devmem and io_uring TCP zero-copy
This series adds support for zerocopy rx TCP with devmem and io_uring for ConnectX7 NICs and above. For performance reasons and simplicity HW-GRO will also be turned on when header-data split mode is on.
Performance ===========
Test setup:
* CPU: Intel(R) Xeon(R) Platinum 8380 CPU @ 2.30GHz (single NUMA) * NIC: ConnectX7 * Benchmarking tool: kperf [0] * Single TCP flow * Test duration: 60s
With application thread and interrupts pinned to the *same* core:
|------+-----------+----------| | MTU | epoll | io_uring | |------+-----------+----------| | 1500 | 61.6 Gbps | 114 Gbps | | 4096 | 69.3 Gbps | 151 Gbps | | 9000 | 67.8 Gbps | 187 Gbps | |------+-----------+----------|
The CPU usage for io_uring is 95%.
Reproduction steps for io_uring:
server --no-daemon -a 2001:db8::1 --no-memcmp --iou --iou_sendzc \ --iou_zcrx --iou_dev_name eth2 --iou_zcrx_queue_id 2
server --no-daemon -a 2001:db8::2 --no-memcmp --iou --iou_sendzc
client --src 2001:db8::2 --dst 2001:db8::1 \ --msg-zerocopy -t 60 --cpu-min=2 --cpu-max=2
Patch overview: ================
First, a netmem API for skb_can_coalesce is added to the core to be able to do skb fragment coalescing on netmems.
The next patches introduce some cleanups in the internal SHAMPO code and improvements to hw gro capability checks in FW.
A separate page_pool is introduced for headers, to be used only when the rxq has a memory provider.
Then the driver is converted to use the netmem API and to allow support for unreadable netmem page pool.
The queue management ops are implemented.
Finally, the tcp-data-split ring parameter is exposed. References ========== [0] kperf: git://git.kernel.dk/kperf.git v1: https://lore.kernel.org/20250116215530.158886-1-saeed@kernel.org v2: https://lore.kernel.org/1747950086-1246773-1-git-send-email-tariqt@nvidia.com v3: https://lore.kernel.org/20250609145833.990793-1-mbloch@nvidia.com v4: https://lore.kernel.org/20250610150950.1094376-1-mbloch@nvidia.com v5: https://lore.kernel.org/20250612154648.1161201-1-mbloch@nvidia.com ====================
Link: https://patch.msgid.link/20250616141441.1243044-1-mbloch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
show more ...
|
#
d1668f11 |
| 16-Jun-2025 |
Saeed Mahameed <saeedm@nvidia.com> |
net/mlx5e: Convert over to netmem
mlx5e_page_frag holds the physical page itself, to naturally support zc page pools, remove physical page reference from mlx5 and replace it with netmem_ref, to avoi
net/mlx5e: Convert over to netmem
mlx5e_page_frag holds the physical page itself, to naturally support zc page pools, remove physical page reference from mlx5 and replace it with netmem_ref, to avoid internal handling in mlx5 for net_iov backed pages.
SHAMPO can issue packets that are not split into header and data. These packets will be dropped if the data part resides in a net_iov as the driver can't read into this area.
No performance degradation observed.
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Mina Almasry <almasrymina@google.com> Signed-off-by: Mark Bloch <mbloch@nvidia.com> Link: https://patch.msgid.link/20250616141441.1243044-9-mbloch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
show more ...
|
#
e225d9bd |
| 16-Jun-2025 |
Saeed Mahameed <saeedm@nvidia.com> |
net/mlx5e: SHAMPO: Separate pool for headers
Allow allocating a separate page pool for headers when SHAMPO is on. This will be useful for adding support to zc page pool, which has to be different fr
net/mlx5e: SHAMPO: Separate pool for headers
Allow allocating a separate page pool for headers when SHAMPO is on. This will be useful for adding support to zc page pool, which has to be different from the headers page pool. For now, the pools are the same.
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com> Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Mark Bloch <mbloch@nvidia.com> Link: https://patch.msgid.link/20250616141441.1243044-8-mbloch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
show more ...
|
#
16142def |
| 16-Jun-2025 |
Saeed Mahameed <saeedm@nvidia.com> |
net/mlx5e: SHAMPO: Remove redundant params
Two SHAMPO params are static and always the same, remove them from the global mlx5e_params struct.
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Revie
net/mlx5e: SHAMPO: Remove redundant params
Two SHAMPO params are static and always the same, remove them from the global mlx5e_params struct.
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com> Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Mark Bloch <mbloch@nvidia.com> Link: https://patch.msgid.link/20250616141441.1243044-6-mbloch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
show more ...
|
#
af4312c4 |
| 16-Jun-2025 |
Saeed Mahameed <saeedm@nvidia.com> |
net/mlx5e: SHAMPO: Reorganize mlx5_rq_shampo_alloc
Drop redundant SHAMPO structure alloc/free functions.
Gather together function calls pertaining to header split info, pass header per WQE (hd_per_
net/mlx5e: SHAMPO: Reorganize mlx5_rq_shampo_alloc
Drop redundant SHAMPO structure alloc/free functions.
Gather together function calls pertaining to header split info, pass header per WQE (hd_per_wqe) as parameter to those function to avoid use before initialization future mistakes.
Allocate HW GRO related info outside of the header related info scope.
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com> Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Mark Bloch <mbloch@nvidia.com> Link: https://patch.msgid.link/20250616141441.1243044-5-mbloch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
show more ...
|
Revision tags: v6.16-rc2 |
|
#
c598d5eb |
| 11-Jun-2025 |
Thomas Zimmermann <tzimmermann@suse.de> |
Merge drm/drm-next into drm-misc-next
Backmerging to forward to v6.16-rc1
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
|
#
86e2d052 |
| 09-Jun-2025 |
Thomas Hellström <thomas.hellstrom@linux.intel.com> |
Merge drm/drm-next into drm-xe-next
Backmerging to bring in 6.16
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
|
#
34c55367 |
| 09-Jun-2025 |
Jani Nikula <jani.nikula@intel.com> |
Merge drm/drm-next into drm-intel-next
Sync to v6.16-rc1, among other things to get the fixed size GENMASK_U*() and BIT_U*() macros.
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
|
Revision tags: v6.16-rc1 |
|
#
4f978603 |
| 02-Jun-2025 |
Dmitry Torokhov <dmitry.torokhov@gmail.com> |
Merge branch 'next' into for-linus
Prepare input updates for 6.16 merge window.
|