.. SPDX-License-Identifier: GPL-2.0

==================
AF_XDP TX Metadata
==================

This document describes how to enable offloads when transmitting packets
via :doc:`af_xdp`. Refer to :doc:`xdp-rx-metadata` on how to access similar
metadata on the receive side.

General Design
==============

The headroom for the metadata is reserved via ``tx_metadata_len`` and the
``XDP_UMEM_TX_METADATA_LEN`` flag in ``struct xdp_umem_reg``. The metadata
length is therefore the same for every socket that shares the same umem.
The metadata layout is a fixed UAPI; refer to ``struct xsk_tx_metadata`` in
``include/uapi/linux/if_xdp.h``. Thus, generally, the ``tx_metadata_len``
field above should contain ``sizeof(struct xsk_tx_metadata)``.

Note that in the original implementation the ``XDP_UMEM_TX_METADATA_LEN``
flag was not required. Applications might attempt to create a umem
with the flag first and, if that fails, try again without the flag.
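
The retry described above can be sketched as follows. This is only an
illustration of the control flow, not a definitive implementation:
``umem_reg_fn``, ``demo_old_kernel_reg``, ``demo_new_kernel_reg`` and
``DEMO_TX_METADATA_LEN_FLAG`` are hypothetical names introduced here. A real
application would issue the ``XDP_UMEM_REG`` registration call and take the
flag from ``include/uapi/linux/if_xdp.h``.

.. code-block:: c

    #include <assert.h>
    #include <stdint.h>

    /* Placeholder bit for illustration; real code uses XDP_UMEM_TX_METADATA_LEN. */
    #define DEMO_TX_METADATA_LEN_FLAG (1u << 2)

    /* Hypothetical wrapper around the umem registration call; 0 means success. */
    typedef int (*umem_reg_fn)(uint32_t flags);

    /* Stand-in for a kernel that does not know the flag and rejects it. */
    static int demo_old_kernel_reg(uint32_t flags)
    {
        return (flags & DEMO_TX_METADATA_LEN_FLAG) ? -1 : 0;
    }

    /* Stand-in for a kernel that accepts the flag. */
    static int demo_new_kernel_reg(uint32_t flags)
    {
        (void)flags;
        return 0;
    }

    /*
     * Try to register with the flag first; if that fails, retry without it.
     * On success, *flags_used holds the flags that were accepted.
     */
    static int umem_reg_with_fallback(umem_reg_fn reg, uint32_t base_flags,
                                      uint32_t *flags_used)
    {
        uint32_t flags = base_flags | DEMO_TX_METADATA_LEN_FLAG;

        assert(reg && flags_used);
        if (reg(flags) != 0) {
            flags = base_flags;
            if (reg(flags) != 0)
                return -1;
        }
        *flags_used = flags;
        return 0;
    }

In this sketch the caller can inspect ``flags_used`` afterwards to learn
which variant was accepted.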

The headroom and the metadata itself should be located right before
``xdp_desc->addr`` in the umem frame. Within a frame, the metadata
layout is as follows::

           tx_metadata_len
     /                         \
    +-----------------+---------+----------------------------+
    | xsk_tx_metadata | padding |          payload           |
    +-----------------+---------+----------------------------+
                                ^
                                |
                          xdp_desc->addr

An AF_XDP application can request headroom larger than ``sizeof(struct
xsk_tx_metadata)``. The kernel will ignore the padding (and will still
use ``xdp_desc->addr - tx_metadata_len`` to locate
the ``xsk_tx_metadata``). For frames that shouldn't carry
any metadata (i.e., the ones that don't set the ``XDP_TX_METADATA`` option),
the metadata area is ignored by the kernel as well.
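
The address arithmetic above can be written out as a small sketch. The helper
names ``tx_metadata_offset`` and ``tx_metadata_padding`` are hypothetical;
``addr`` stands for ``xdp_desc->addr`` and ``md_size`` for the size of the
metadata structure, for example ``sizeof(struct xsk_tx_metadata)``.

.. code-block:: c

    #include <assert.h>
    #include <stdint.h>

    /*
     * Return the umem offset at which the metadata of a frame starts.
     * 'addr' plays the role of xdp_desc->addr (start of the payload) and
     * 'tx_metadata_len' is the length registered for the umem.
     */
    static uint64_t tx_metadata_offset(uint64_t addr, uint32_t tx_metadata_len)
    {
        assert(addr >= tx_metadata_len); /* the headroom must fit before addr */
        return addr - tx_metadata_len;
    }

    /* Size of the padding between the metadata structure and the payload. */
    static uint32_t tx_metadata_padding(uint32_t tx_metadata_len, uint32_t md_size)
    {
        assert(tx_metadata_len >= md_size);
        return tx_metadata_len - md_size;
    }

With a payload at umem offset 4352 and a registered length of 32 bytes, the
metadata would start at offset 4320 in this sketch.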

The flags field enables the particular offload:

- ``XDP_TXMD_FLAGS_TIMESTAMP``: requests the device to put the transmission
  timestamp into the ``tx_timestamp`` field of ``struct xsk_tx_metadata``.
- ``XDP_TXMD_FLAGS_CHECKSUM``: requests the device to calculate the L4
  checksum. ``csum_start`` specifies the byte offset where the checksumming
  should start and ``csum_offset`` specifies the byte offset where the
  device should store the computed checksum.
- ``XDP_TXMD_FLAGS_LAUNCH_TIME``: requests the device to schedule the
  packet for transmission at a pre-determined time called launch time. The
  value of launch time is indicated by the ``launch_time`` field of
  ``struct xsk_tx_metadata``.

Besides the flags above, in order to trigger the offloads, the packet's
first ``struct xdp_desc`` descriptor should set the ``XDP_TX_METADATA``
bit in the ``options`` field. Also note that in a multi-buffer packet
only the first chunk should carry the metadata.
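
As an illustration of the checksum request, the sketch below computes the two
offsets for an Ethernet + IPv4 + UDP frame. It rests on assumptions that
should be checked against the selftest: ``csum_start`` is taken to count from
the start of the packet and ``csum_offset`` from ``csum_start``. The
``demo_*`` names and ``DEMO_*`` constants are hypothetical and not part of
the UAPI.

.. code-block:: c

    #include <stdint.h>

    enum {
        DEMO_ETH_HLEN = 14,    /* Ethernet header without a VLAN tag */
        DEMO_IPV4_HLEN = 20,   /* IPv4 header without options */
        DEMO_UDP_CSUM_OFF = 6, /* offset of the checksum field in the UDP header */
    };

    /* Hypothetical holder for the two checksum request values. */
    struct demo_csum_request {
        uint16_t csum_start;
        uint16_t csum_offset;
    };

    /*
     * Offsets for an Ethernet + IPv4 (no options) + UDP frame.
     * Assumption: csum_start is relative to the start of the packet and
     * csum_offset is relative to csum_start.
     */
    static struct demo_csum_request demo_udp4_csum_request(void)
    {
        struct demo_csum_request req = {
            .csum_start = DEMO_ETH_HLEN + DEMO_IPV4_HLEN,
            .csum_offset = DEMO_UDP_CSUM_OFF,
        };

        return req;
    }

An application would copy these values into the metadata area, set
``XDP_TXMD_FLAGS_CHECKSUM`` in the flags field and set ``XDP_TX_METADATA``
in the descriptor options.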

Software TX Checksum
====================

For development and testing purposes it's possible to pass the
``XDP_UMEM_TX_SW_CSUM`` flag to the ``XDP_UMEM_REG`` UMEM registration call.
In this case, when running in ``XDP_COPY`` mode, the TX checksum
is calculated on the CPU. Do not enable this option in production because
it will negatively affect performance.

Launch Time
===========

The value of the requested launch time should be based on the device's PTP
Hardware Clock (PHC) to ensure accuracy. AF_XDP takes a different data path
compared to the ETF queuing discipline, which organizes packets and delays
their transmission. Instead, AF_XDP immediately hands off the packets to
the device driver without rearranging their order or holding them prior to
transmission. Since the driver maintains FIFO behavior and does not perform
packet reordering, a packet with a launch time request will block other
packets in the same Tx Queue until it is sent. Therefore, it is recommended
to allocate a separate queue for scheduling traffic that is intended for
future transmission.

In scenarios where the launch time offload feature is disabled, the device
driver is expected to disregard the launch time request. For correct
interpretation and meaningful operation, the launch time should never be
set to a value larger than the farthest programmable time in the future
(the horizon). Different devices have different hardware limitations on the
launch time offload feature.

stmmac driver
-------------

For stmmac, TSO and launch time (TBS) features are mutually exclusive for
each individual Tx Queue. By default, the driver configures Tx Queue 0 to
support TSO and the rest of the Tx Queues to support TBS. The launch time
hardware offload feature can be enabled or disabled by using the tc-etf
command to call the driver's ndo_setup_tc() callback.

The value of the launch time that is programmed in the Enhanced Normal
Transmit Descriptors is a 32-bit value, where the most significant 8 bits
represent the time in seconds and the remaining 24 bits represent the time
in 256 ns increments. The programmed launch time is compared against the
PTP time (bits[39:8]) and rolls over after 256 seconds. Therefore, the
horizon of the launch time for dwmac4 and dwxlgmac2 is 128 seconds in the
future.
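
The 32-bit layout and the 128-second horizon can be illustrated with a short
sketch. It follows the textual description above rather than the driver
source, and the ``demo_*`` helpers and ``DEMO_NSEC_PER_SEC`` are hypothetical
names.

.. code-block:: c

    #include <stdint.h>

    #define DEMO_NSEC_PER_SEC 1000000000ULL

    /*
     * Pack a PHC time in nanoseconds into the layout described above:
     * bits [31:24] hold the seconds (modulo 256) and bits [23:0] hold the
     * sub-second part in units of 256 ns.
     */
    static uint32_t demo_stmmac_launch_time(uint64_t phc_ns)
    {
        uint32_t sec = (uint32_t)((phc_ns / DEMO_NSEC_PER_SEC) & 0xff);
        uint32_t frac = (uint32_t)((phc_ns % DEMO_NSEC_PER_SEC) >> 8);

        return (sec << 24) | (frac & 0xffffff);
    }

    /* Nonzero if 'launch_ns' is not in the past and at most 128 s ahead. */
    static int demo_within_horizon(uint64_t now_ns, uint64_t launch_ns)
    {
        return launch_ns >= now_ns &&
               launch_ns - now_ns <= 128 * DEMO_NSEC_PER_SEC;
    }

Because the seconds field wraps every 256 seconds, two launch times that are
exactly 256 seconds apart produce the same packed value, which is why a
check like ``demo_within_horizon()`` is useful before submitting a request.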

igc driver
----------

For igc, all four Tx Queues support the launch time feature. The launch
time hardware offload feature can be enabled or disabled by using the
tc-etf command to call the driver's ndo_setup_tc() callback. When entering
TSN mode, the igc driver will reset the device and create a default Qbv
schedule with a 1-second cycle time, with all Tx Queues open at all times.

The value of the launch time that is programmed in the Advanced Transmit
Context Descriptor is a relative offset to the starting time of the Qbv
transmission window of the queue. The Frst flag of the descriptor can be
set to schedule the packet for the next Qbv cycle. Therefore, the horizon
of the launch time for i225 and i226 is the ending time of the next cycle
of the Qbv transmission window of the queue. For example, when the Qbv
cycle time is set to 1 second, the horizon of the launch time ranges
from 1 second to 2 seconds, depending on where the Qbv cycle is currently
running.
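
The 1-to-2-second range in the example can be reproduced with a small
calculation. The sketch assumes the default schedule, where the transmission
window of the queue spans the whole cycle; ``demo_igc_horizon_ns`` and its
parameters are hypothetical names.

.. code-block:: c

    #include <assert.h>
    #include <stdint.h>

    /*
     * Time from 'now_ns' until the end of the next Qbv cycle.
     * Assumption: the transmission window covers the whole cycle, and
     * 'base_time_ns' is the start of some earlier cycle.
     */
    static uint64_t demo_igc_horizon_ns(uint64_t now_ns, uint64_t base_time_ns,
                                        uint64_t cycle_ns)
    {
        uint64_t pos;

        assert(cycle_ns > 0 && now_ns >= base_time_ns);
        pos = (now_ns - base_time_ns) % cycle_ns; /* position in this cycle */
        return 2 * cycle_ns - pos;
    }

At the very start of a 1-second cycle the result is 2 seconds, and it shrinks
towards 1 second as the current cycle approaches its end.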

Querying Device Capabilities
============================

Every device exports its offload capabilities via the netlink netdev family.
Refer to the ``xsk-flags`` features bitmask in
``Documentation/netlink/specs/netdev.yaml``.

- ``tx-timestamp``: device supports ``XDP_TXMD_FLAGS_TIMESTAMP``
- ``tx-checksum``: device supports ``XDP_TXMD_FLAGS_CHECKSUM``
- ``tx-launch-time-fifo``: device supports ``XDP_TXMD_FLAGS_LAUNCH_TIME``

See ``tools/net/ynl/samples/netdev.c`` on how to query this information.

Example
=======

See ``tools/testing/selftests/bpf/xdp_hw_metadata.c`` for an example
program that handles TX metadata. Also see https://github.com/fomichev/xskgen
for a more bare-bones example.