.. SPDX-License-Identifier: GPL-2.0

==================
AF_XDP TX Metadata
==================

This document describes how to enable offloads when transmitting packets
via :doc:`af_xdp`. Refer to :doc:`xdp-rx-metadata` on how to access similar
metadata on the receive side.

General Design
==============

The headroom for the metadata is reserved via ``tx_metadata_len`` and the
``XDP_UMEM_TX_METADATA_LEN`` flag in ``struct xdp_umem_reg``. The metadata
length is therefore the same for every socket that shares the same umem.
The metadata layout is a fixed UAPI, refer to ``struct xsk_tx_metadata`` in
``include/uapi/linux/if_xdp.h``. Thus, generally, the ``tx_metadata_len``
field above should contain ``sizeof(struct xsk_tx_metadata)``.

Note that in the original implementation the ``XDP_UMEM_TX_METADATA_LEN``
flag was not required. Applications might attempt to create a umem
with the flag first and, if that fails, make another attempt without it.

The headroom and the metadata itself should be located right before
``xdp_desc->addr`` in the umem frame.
Within a frame, the metadata layout is as follows::

         tx_metadata_len
     /                         \
    +-----------------+---------+----------------------------+
    | xsk_tx_metadata | padding |          payload           |
    +-----------------+---------+----------------------------+
                                ^
                                |
                          xdp_desc->addr

An AF_XDP application can request headrooms larger than ``sizeof(struct
xsk_tx_metadata)``. The kernel will ignore the padding (and will still
use ``xdp_desc->addr - tx_metadata_len`` to locate
the ``xsk_tx_metadata``). For the frames that shouldn't carry
any metadata (i.e., the ones that don't have the ``XDP_TX_METADATA`` option),
the metadata area is ignored by the kernel as well.

The flags field enables the particular offload:

- ``XDP_TXMD_FLAGS_TIMESTAMP``: requests the device to put the transmission
  timestamp into the ``tx_timestamp`` field of ``struct xsk_tx_metadata``.
- ``XDP_TXMD_FLAGS_CHECKSUM``: requests the device to calculate the L4
  checksum. ``csum_start`` specifies the byte offset where the checksumming
  should start and ``csum_offset`` specifies the byte offset where the
  device should store the computed checksum.
- ``XDP_TXMD_FLAGS_LAUNCH_TIME``: requests the device to schedule the
  packet for transmission at a pre-determined time called launch time. The
  value of launch time is indicated by the ``launch_time`` field of
  ``struct xsk_tx_metadata``.

Besides the flags above, in order to trigger the offloads, the first
packet's ``struct xdp_desc`` descriptor should set the ``XDP_TX_METADATA``
bit in the ``options`` field. Also note that in a multi-buffer packet
only the first chunk should carry the metadata.

Software TX Checksum
====================

For development and testing purposes it's possible to pass the
``XDP_UMEM_TX_SW_CSUM`` flag to the ``XDP_UMEM_REG`` UMEM registration call.
In this case, when running in ``XDP_COPY`` mode, the TX checksum
is calculated on the CPU. Do not enable this option in production because
it will negatively affect performance.

Launch Time
===========

The value of the requested launch time should be based on the device's PTP
Hardware Clock (PHC) to ensure accuracy. AF_XDP takes a different data path
compared to the ETF queuing discipline, which organizes packets and delays
their transmission.
Instead, AF_XDP immediately hands off the packets to
the device driver without rearranging their order or holding them prior to
transmission. Since the driver maintains FIFO behavior and does not perform
packet reordering, a packet with a launch time request will block other
packets in the same Tx Queue until it is sent. Therefore, it is recommended
to allocate a separate queue for scheduling traffic that is intended for
future transmission.

In scenarios where the launch time offload feature is disabled, the device
driver is expected to disregard the launch time request. For correct
interpretation and meaningful operation, the launch time should never be
set to a value larger than the farthest programmable time in the future
(the horizon). Different devices have different hardware limitations on the
launch time offload feature.

stmmac driver
-------------

For stmmac, TSO and launch time (TBS) features are mutually exclusive for
each individual Tx Queue. By default, the driver configures Tx Queue 0 to
support TSO and the rest of the Tx Queues to support TBS. The launch time
hardware offload feature can be enabled or disabled by using the tc-etf
command to call the driver's ndo_setup_tc() callback.

The value of the launch time that is programmed in the Enhanced Normal
Transmit Descriptors is a 32-bit value, where the most significant 8 bits
represent the time in seconds and the remaining 24 bits represent the time
in 256 ns increments. The programmed launch time is compared against the
PTP time (bits[39:8]) and rolls over after 256 seconds. Therefore, the
horizon of the launch time for dwmac4 and dwxlgmac2 is 128 seconds in the
future.

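
The 32-bit layout described above can be illustrated with a small worked
sketch. This is not code from the stmmac driver: the helper names
(``example_launch_time_encode``, ``example_launch_time_in_horizon``) are
hypothetical, the input is assumed to be an absolute PHC time in
nanoseconds, and treating past launch times and the exact 128-second
boundary as out of range is a conservative assumption.

```c
#include <stdint.h>

#define EXAMPLE_NSEC_PER_SEC 1000000000ULL

/*
 * Pack an absolute PHC time (nanoseconds) into the layout described in
 * the text: bits [31:24] carry the seconds modulo 256, bits [23:0] carry
 * the sub-second part in units of 256 ns.
 * Hypothetical helper for illustration only.
 */
static uint32_t example_launch_time_encode(uint64_t phc_ns)
{
	uint32_t sec = (uint32_t)((phc_ns / EXAMPLE_NSEC_PER_SEC) & 0xff);
	uint32_t subsec = (uint32_t)((phc_ns % EXAMPLE_NSEC_PER_SEC) >> 8);

	return (sec << 24) | subsec;
}

/*
 * Return 1 if launch_ns is not in the past and lies less than 128 seconds
 * after now_ns, 0 otherwise. Both values are PHC times in nanoseconds.
 * The strict bound is an assumption made for this sketch.
 */
static int example_launch_time_in_horizon(uint64_t now_ns, uint64_t launch_ns)
{
	return launch_ns >= now_ns &&
	       launch_ns - now_ns < 128 * EXAMPLE_NSEC_PER_SEC;
}
```

For instance, a PHC time of 257 seconds and 512 ns encodes to
``0x01000002``: the seconds wrap to 1 (257 modulo 256) and 512 ns is two
256 ns increments.
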

igc driver
----------

For igc, all four Tx Queues support the launch time feature. The launch
time hardware offload feature can be enabled or disabled by using the
tc-etf command to call the driver's ndo_setup_tc() callback. When entering
TSN mode, the igc driver will reset the device and create a default Qbv
schedule with a 1-second cycle time, with all Tx Queues open at all times.

The value of the launch time that is programmed in the Advanced Transmit
Context Descriptor is a relative offset to the starting time of the Qbv
transmission window of the queue. The Frst flag of the descriptor can be
set to schedule the packet for the next Qbv cycle. Therefore, the horizon
of the launch time for i225 and i226 is the ending time of the next cycle
of the Qbv transmission window of the queue. For example, when the Qbv
cycle time is set to 1 second, the horizon of the launch time ranges
from 1 second to 2 seconds, depending on where the Qbv cycle is currently
running.

Querying Device Capabilities
============================

Every device exports its offload capabilities via the netlink netdev family.
Refer to the ``xsk-flags`` features bitmask in
``Documentation/netlink/specs/netdev.yaml``.

- ``tx-timestamp``: device supports ``XDP_TXMD_FLAGS_TIMESTAMP``
- ``tx-checksum``: device supports ``XDP_TXMD_FLAGS_CHECKSUM``
- ``tx-launch-time-fifo``: device supports ``XDP_TXMD_FLAGS_LAUNCH_TIME``

See ``tools/net/ynl/samples/netdev.c`` on how to query this information.

Example
=======

See ``tools/testing/selftests/bpf/xdp_hw_metadata.c`` for an example
program that handles TX metadata. Also see https://github.com/fomichev/xskgen
for a more bare-bones example.
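
As a complement to the programs referenced above, the following minimal
sketch shows how a checksum request might be written into the metadata
area of a frame. It is an illustration under stated assumptions, not the
selftest's code: the ``ex_``/``EX_`` types and constants are local
stand-ins assumed to mirror ``include/uapi/linux/if_xdp.h`` so that the
block builds on its own, and ``ex_request_tx_csum`` is a hypothetical
helper. Real applications should include ``<linux/if_xdp.h>`` and use the
UAPI definitions directly.

```c
#include <stdint.h>
#include <string.h>

/* Local stand-ins, assumed to match the UAPI header (see lead-in). */
#define EX_XDP_TX_METADATA		(1U << 1)	/* xdp_desc options bit */
#define EX_XDP_TXMD_FLAGS_CHECKSUM	(1ULL << 1)	/* metadata flags bit */

struct ex_xdp_desc {
	uint64_t addr;
	uint32_t len;
	uint32_t options;
};

struct ex_xsk_tx_metadata {
	uint64_t flags;
	union {
		struct {
			uint16_t csum_start;
			uint16_t csum_offset;
			uint64_t launch_time;
		} request;
		struct {
			uint64_t tx_timestamp;
		} completion;
	};
};

/*
 * Request L4 checksum offload for the first chunk of a packet.
 * The metadata is written at desc->addr - tx_metadata_len, which is where
 * the kernel looks for it; tx_metadata_len is the headroom registered
 * with the umem. The caller supplies csum_start and csum_offset.
 */
static void ex_request_tx_csum(void *umem_area, struct ex_xdp_desc *desc,
			       uint32_t tx_metadata_len,
			       uint16_t csum_start, uint16_t csum_offset)
{
	struct ex_xsk_tx_metadata meta;

	memset(&meta, 0, sizeof(meta));
	meta.flags = EX_XDP_TXMD_FLAGS_CHECKSUM;
	meta.request.csum_start = csum_start;
	meta.request.csum_offset = csum_offset;

	/* memcpy avoids any alignment assumption about the frame address. */
	memcpy((uint8_t *)umem_area + desc->addr - tx_metadata_len,
	       &meta, sizeof(meta));

	/* Without this bit the kernel ignores the metadata area. */
	desc->options |= EX_XDP_TX_METADATA;
}
```

The helper only touches the headroom in front of ``desc->addr``, so the
payload is left as the application wrote it. In a multi-buffer packet it
would be called for the first descriptor only.
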