.. SPDX-License-Identifier: GPL-2.0

=====================
Segmentation Offloads
=====================


Introduction
============

This document describes a set of techniques in the Linux networking stack
to take advantage of segmentation offload capabilities of various NICs.

The following technologies are described:
 * TCP Segmentation Offload - TSO
 * UDP Fragmentation Offload - UFO
 * IPIP, SIT, GRE, and UDP Tunnel Offloads
 * Generic Segmentation Offload - GSO
 * Generic Receive Offload - GRO
 * Partial Generic Segmentation Offload - GSO_PARTIAL
 * SCTP acceleration with GSO - GSO_BY_FRAGS


TCP Segmentation Offload
========================

TCP segmentation allows a device to segment a single frame into multiple
frames with a data payload size specified in skb_shinfo()->gso_size.
When TCP segmentation is requested, the bit for either SKB_GSO_TCPV4 or
SKB_GSO_TCPV6 should be set in skb_shinfo()->gso_type and
skb_shinfo()->gso_size should be set to a non-zero value.

TCP segmentation is dependent on support for the use of partial checksum
offload.
For this reason TSO is normally disabled if the Tx checksum
offload for a given device is disabled.

In order to support TCP segmentation offload it is necessary to populate
the network and transport header offsets of the skbuff so that the device
drivers will be able to determine the offsets of the IP or IPv6 header and
the TCP header.  In addition, as CHECKSUM_PARTIAL is required, csum_start
should also point to the TCP header of the packet.

For IPv4 segmentation we support one of two types in terms of the IP ID.
The default behavior is to increment the IP ID with every segment.  If the
GSO type SKB_GSO_TCP_FIXEDID is specified then we will not increment the IP
ID and all segments will use the same IP ID.  If a device has
NETIF_F_TSO_MANGLEID set then the IP ID can be ignored when performing TSO
and we will either increment the IP ID for all frames, or leave it at a
static value based on driver preference.


UDP Fragmentation Offload
=========================

UDP fragmentation offload allows a device to fragment an oversized UDP
datagram into multiple IPv4 fragments.  Many of the requirements for UDP
fragmentation offload are the same as TSO.  However the IPv4 ID for
fragments should not increment as a single IPv4 datagram is fragmented.
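
The fragmentation rule above can be illustrated with a small userspace
sketch.  It is not kernel code: ``struct toy_frag`` and
``toy_plan_fragments()`` are hypothetical names invented for this example,
and ``dgram_len`` is assumed to be the IPv4 payload length (UDP header plus
data).  The sketch only plans the fragments, showing that every fragment
carries the same IPv4 ID and that offsets are expressed in 8-byte units.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical descriptor for one IPv4 fragment; not a kernel structure. */
struct toy_frag {
	uint16_t ip_id;      /* identical for every fragment of a datagram */
	uint16_t offset8;    /* fragment offset in 8-byte units */
	uint16_t len;        /* bytes of the original datagram carried */
	int more_fragments;  /* MF flag: set on all but the last fragment */
};

/*
 * Plan the IPv4 fragments for one datagram of dgram_len bytes.
 * Returns the number of descriptors written to out (at most max_out),
 * or 0 if the MTU leaves no room for payload.
 */
size_t toy_plan_fragments(size_t dgram_len, size_t mtu, size_t ip_hdr_len,
			  uint16_t ip_id, struct toy_frag *out, size_t max_out)
{
	size_t per_frag, off = 0, n = 0;

	assert(out != NULL);
	if (mtu <= ip_hdr_len)
		return 0;

	/* Every fragment except the last must carry a multiple of 8 bytes. */
	per_frag = (mtu - ip_hdr_len) & ~(size_t)7;
	if (per_frag == 0)
		return 0;

	while (off < dgram_len && n < max_out) {
		size_t len = dgram_len - off;

		if (len > per_frag)
			len = per_frag;

		out[n].ip_id = ip_id;                 /* never incremented */
		out[n].offset8 = (uint16_t)(off / 8);
		out[n].len = (uint16_t)len;
		out[n].more_fragments = (off + len < dgram_len);

		off += len;
		n++;
	}
	return n;
}
```

With a 4000-byte datagram, a 1500-byte MTU and a 20-byte IPv4 header, this
plans three fragments of 1480, 1480 and 1040 bytes at offsets 0, 185 and
370, all sharing one IPv4 ID.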

UFO is deprecated: modern kernels will no longer generate UFO skbs, but can
still receive them from tuntap and similar devices.  Offload of UDP-based
tunnel protocols is still supported.


IPIP, SIT, GRE, UDP Tunnel, and Remote Checksum Offloads
========================================================

In addition to the offloads described above it is possible for a frame to
contain additional headers such as an outer tunnel.  In order to account
for such instances an additional set of segmentation offload types was
introduced including SKB_GSO_IPXIP4, SKB_GSO_IPXIP6, SKB_GSO_GRE, and
SKB_GSO_UDP_TUNNEL.  These extra segmentation types are used to identify
cases where there is more than one set of headers.  For example in the
case of IPIP and SIT we should have the network and transport headers moved
from the standard list of headers to "inner" header offsets.

Currently only two levels of headers are supported.  The convention is to
refer to the tunnel headers as the outer headers, while the encapsulated
data is normally referred to as the inner headers.
Below is the list of
calls to access the given headers:

IPIP/SIT Tunnel::

             Outer                  Inner
  MAC        skb_mac_header
  Network    skb_network_header     skb_inner_network_header
  Transport  skb_transport_header

UDP/GRE Tunnel::

             Outer                  Inner
  MAC        skb_mac_header         skb_inner_mac_header
  Network    skb_network_header     skb_inner_network_header
  Transport  skb_transport_header   skb_inner_transport_header

In addition to the above tunnel types there are also SKB_GSO_GRE_CSUM and
SKB_GSO_UDP_TUNNEL_CSUM.  These two additional tunnel types reflect the
fact that the outer header also requests to have a non-zero checksum
included in the outer header.

Finally there is SKB_GSO_TUNNEL_REMCSUM which indicates that a given tunnel
header has requested a remote checksum offload.  In this case the inner
headers will be left with a partial checksum and only the outer header
checksum will be computed.


Generic Segmentation Offload
============================

Generic segmentation offload is a pure software offload that is meant to
deal with cases where device drivers cannot perform the offloads described
above.
What occurs in GSO is that a given skbuff will have its data broken
out over multiple skbuffs that have been resized to match the MSS provided
via skb_shinfo()->gso_size.

Before enabling any hardware segmentation offload a corresponding software
offload is required in GSO.  Otherwise it becomes possible for a frame to
be re-routed between devices and end up being unable to be transmitted.


Generic Receive Offload
=======================

Generic receive offload is the complement to GSO.  Ideally any frame
assembled by GRO should be segmented to create an identical sequence of
frames using GSO, and any sequence of frames segmented by GSO should be
able to be reassembled back to the original by GRO.  The only exception to
this is IPv4 ID in the case that the DF bit is set for a given IP header.
If the value of the IPv4 ID is not sequentially incrementing it will be
altered so that it is when a frame assembled via GRO is segmented via GSO.


Partial Generic Segmentation Offload
====================================

Partial generic segmentation offload is a hybrid between TSO and GSO.
What
it effectively does is take advantage of certain traits of TCP and tunnels
so that instead of having to rewrite the packet headers for each segment
only the inner-most transport header and possibly the outer-most network
header need to be updated.  This allows devices that do not support tunnel
offloads or tunnel offloads with checksum to still make use of segmentation.

With the partial offload what occurs is that all headers excluding the
inner transport header are updated such that they will contain the values
that would be correct if the header were simply duplicated.  The one
exception to this is the outer IPv4 ID field.  It is up to the device
drivers to guarantee that the IPv4 ID field is incremented in the case that
a given header does not have the DF bit set.


SCTP acceleration with GSO
==========================

SCTP - despite the lack of hardware support - can still take advantage of
GSO to pass one large packet through the network stack, rather than
multiple small packets.

This requires a different approach to other offloads, as SCTP packets
cannot be just segmented to (P)MTU.  Rather, the chunks must be contained in
IP segments, padding respected.  So unlike regular GSO, SCTP can't just
generate a big skb, set gso_size to the fragmentation point and deliver it
to IP layer.

Instead, the SCTP protocol layer builds an skb with the segments correctly
padded and stored as chained skbs, and skb_segment() splits based on those.
To signal this, gso_size is set to the special value GSO_BY_FRAGS.

Therefore, any code in the core networking stack must be aware of the
possibility that gso_size will be GSO_BY_FRAGS and handle that case
appropriately.

There are some helpers to make this easier:

- skb_is_gso(skb) && skb_is_gso_sctp(skb) is the best way to see if
  an skb is an SCTP GSO skb.

- For size checks, the skb_gso_validate_*_len family of helpers correctly
  considers GSO_BY_FRAGS.

- For manipulating packets, skb_increase_gso_size and skb_decrease_gso_size
  will check for GSO_BY_FRAGS and WARN if asked to manipulate these skbs.

This also affects drivers with the NETIF_F_FRAGLIST & NETIF_F_GSO_SCTP bits
set.  Note also that NETIF_F_GSO_SCTP is included in NETIF_F_GSO_SOFTWARE.
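
The difference between size-based segmentation and the GSO_BY_FRAGS
convention can be sketched in userspace C.  This is a toy model, not the
kernel implementation: ``struct toy_skb``, ``toy_segment_lens()`` and the
sentinel ``TOY_GSO_BY_FRAGS`` are hypothetical stand-ins, and the sentinel
value is an assumption chosen for illustration.  The point is only that
code which treats gso_size as a byte count must first check for the
sentinel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the kernel's GSO_BY_FRAGS; the value is an assumption. */
#define TOY_GSO_BY_FRAGS 0xFFFFu

/* Hypothetical, heavily simplified model of a GSO skb. */
struct toy_skb {
	uint16_t gso_size;        /* MSS, or TOY_GSO_BY_FRAGS */
	size_t payload_len;       /* total payload for size-based GSO */
	const size_t *frag_lens;  /* lengths of the chained skbs */
	size_t nr_frags;          /* number of chained skbs */
};

/*
 * Write the length of each resulting segment to out and return how
 * many segments were produced (at most max_out).
 */
size_t toy_segment_lens(const struct toy_skb *skb, size_t *out,
			size_t max_out)
{
	size_t n = 0, left;

	assert(skb != NULL && out != NULL);

	if (skb->gso_size == TOY_GSO_BY_FRAGS) {
		/* Boundaries were fixed by the protocol layer: one
		 * segment per chained skb, no arithmetic on gso_size. */
		for (size_t i = 0; i < skb->nr_frags && n < max_out; i++)
			out[n++] = skb->frag_lens[i];
		return n;
	}

	if (skb->gso_size == 0)
		return 0;  /* not a GSO skb in this model */

	/* Regular GSO: cut the payload every gso_size bytes. */
	left = skb->payload_len;
	while (left > 0 && n < max_out) {
		size_t len = left < skb->gso_size ? left : skb->gso_size;

		out[n++] = len;
		left -= len;
	}
	return n;
}
```

In this model a 2500-byte payload with gso_size 1000 yields segments of
1000, 1000 and 500 bytes, while a GSO_BY_FRAGS skb with chained lengths of
1200, 1400 and 300 bytes yields exactly those three segments.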