.\" xref: /src/share/man/man4/iflib.4 (revision 14d93f612f26f4e8454e393b75b0e4be0fc9d890)
.Dd January 7, 2026
.Dt IFLIB 4
.Os
.Sh NAME
.Nm iflib
.Nd Network Interface Driver Framework
.Sh SYNOPSIS
.Cd "device pci"
.Cd "device iflib"
.Sh DESCRIPTION
.Nm
is a framework for network interface drivers for
.Fx .
It is designed to remove a large amount of the boilerplate that is often
needed for modern network interface devices, allowing driver authors to
focus on the specific code needed for their hardware.
This allows for a shared set of
.Xr sysctl 8
names, rather than each driver naming them individually.
.Sh SYSCTL VARIABLES
These variables must be set before loading the driver, either via
.Xr loader.conf 5
or through the use of
.Xr kenv 1 .
They are all prefixed by
.Va dev.X.Y.iflib\&.
where X is the driver name, and Y is the instance number.
.Bl -tag -width indent
.It Va override_nrxds
Override the number of RX descriptors for each queue.
The value is a comma-separated list of non-negative integers.
Some drivers only use a single value, but others may use more.
Non-zero values must be powers of two; zero means to use the default.
Individual drivers may have additional restrictions on allowable values.
Defaults to all zeros.
.It Va override_ntxds
Override the number of TX descriptors for each queue.
The value is a comma-separated list of non-negative integers.
Some drivers only use a single value, but others may use more.
Non-zero values must be powers of two; zero means to use the default.
Individual drivers may have additional restrictions on allowable values.
Defaults to all zeros.
.It Va override_qs_enable
When set, allows the number of transmit and receive queues to be different.
If not set, the lower of the number of TX or RX queues will be used for both.
.It Va override_nrxqs
Set the number of RX queues.
If zero, the number of RX queues is derived from the number of cores on the
socket connected to the controller.
Defaults to 0.
.It Va override_ntxqs
Set the number of TX queues.
If zero, the number of TX queues is derived from the number of cores on the
socket connected to the controller.
.It Va disable_msix
Disables MSI-X interrupts for the device.
.It Va core_offset
Specifies a starting core offset to assign queues to.
If the value is unspecified or 65535, cores are assigned sequentially across
controllers.
.It Va separate_txrx
Requests that RX and TX queues not be paired on the same core.
If this is zero or not set, an RX and TX queue pair will be assigned to each
core.
When set to a non-zero value, TX queues are assigned to cores following the
last RX queue.
.It Va simple_tx
When set to one, iflib uses a simple transmit routine with no queuing at all.
By default, iflib uses a highly optimized, lockless transmit queue called
mp_ring.
This performs well when there are more CPU cores than NIC
queues and prevents lock contention for transmit resources.
Unfortunately, mp_ring incurs unneeded overhead on workloads where
resource contention is not a problem (well-behaved applications on
systems where there are as many NIC queues as CPU cores).
Note that when this is enabled, the
.Va tx_abdicate
sysctl is no longer applicable and is ignored.
Defaults to zero.
.El
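.Pp
As an illustrative sketch only, the following
.Xr loader.conf 5
fragment sets several of these tunables for a hypothetical device named ix0.
The driver name, instance number, and values are assumptions chosen for the
example; the descriptor counts are powers of two as required above, but an
individual driver may still reject them.
.Bd -literal -offset indent
# Hypothetical device ix0; adjust the driver name and unit number.
dev.ix.0.iflib.override_nrxds="2048"
dev.ix.0.iflib.override_ntxds="2048"
# Allow the RX and TX queue counts to differ, then request 4 and 2.
dev.ix.0.iflib.override_qs_enable="1"
dev.ix.0.iflib.override_nrxqs="4"
dev.ix.0.iflib.override_ntxqs="2"
.Ed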
.Pp
These
.Xr sysctl 8
variables can be changed at any time:
.Bl -tag -width indent
.It Va tx_abdicate
Controls how the transmit ring is serviced.
If set to zero, when a frame is submitted to the transmission ring, the same
task that is submitting it will service the ring unless there is already a
task servicing the TX ring.
This ensures that whenever there is a pending transmission,
the transmit ring is being serviced.
This results in higher transmit throughput.
If set to a non-zero value, the submitting task returns immediately and the
transmit ring is serviced by a different task.
This returns control to the caller faster and, under high receive load,
may result in fewer dropped RX frames.
.It Va tx_defer_mfree
Controls the threshold in packets before iflib will free the memory
(mbufs) for the packets that it has transmitted.
When this is non-zero, mbufs will be freed outside the transmit lock.
Setting this can reduce lock contention and CPU use when using simple_tx.
Note that this applies only when simple_tx is enabled.
.It Va tx_reclaim_thresh
Controls the threshold in packets before iflib will ask the driver
how many transmitted packets can be reclaimed.
Determining how many packets can be reclaimed can be expensive
on some drivers.
.It Va tx_reclaim_ticks
Controls the time in ticks before iflib will ask the driver
how many transmitted packets can be reclaimed.
Determining how many packets can be reclaimed can be expensive
on some drivers.
.It Va rx_budget
Sets the maximum number of frames to be received at a time.
A value of zero (the default) selects the built-in budget (currently 16).
.El
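.Pp
A minimal sketch of adjusting these at run time with
.Xr sysctl 8 ,
assuming a hypothetical device named ix0 and values chosen only for
illustration:
.Bd -literal -offset indent
# Hand TX ring servicing to a separate task.
sysctl dev.ix.0.iflib.tx_abdicate=1
# Raise the maximum number of frames received at a time to 32.
sysctl dev.ix.0.iflib.rx_budget=32
.Ed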
.Pp
There are also some global sysctls which can change behaviour for all drivers,
and may be changed at any time.
.Bl -tag -width indent
.It Va net.iflib.min_tx_latency
If this is set to a non-zero value, iflib will avoid any attempt to combine
multiple transmits, and notify the hardware as quickly as possible of
new descriptors.
This will lower the maximum throughput, but will also lower transmit latency.
.It Va net.iflib.no_tx_batch
Some NICs allow processing completed transmit descriptors in batches.
Doing so usually increases the transmit throughput by reducing the number of
transmit interrupts.
Setting this to a non-zero value will disable the use of this feature.
.El
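.Pp
For example, a latency-sensitive system might trade throughput for lower
transmit latency as sketched below; whether this helps depends on the
workload.
.Bd -literal -offset indent
# Takes effect immediately and applies to all iflib drivers.
sysctl net.iflib.min_tx_latency=1
.Ed
.Pp
To keep the setting across reboots, the same assignment
.Pq Li net.iflib.min_tx_latency=1
can be placed in
.Xr sysctl.conf 5 .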
.Pp
These
.Xr sysctl 8
variables are read-only:
.Bl -tag -width indent
.It Va driver_version
A string indicating the internal version of the driver.
.El
.Pp
There are a number of queue state
.Xr sysctl 8
variables as well:
.Bl -tag -width indent
.It Va txqZ
The following are repeated for each transmit queue, where Z is the transmit
queue instance number:
.Bl -tag -width indent
.It Va r_abdications
Number of consumer abdications in the MP ring for this queue.
An abdication occurs on every ring submission when tx_abdicate is true.
.It Va r_restarts
Number of consumer restarts in the MP ring for this queue.
A restart occurs when an attempt to drain a non-empty ring fails,
and the ring is already in the STALLED state.
.It Va r_stalls
Number of consumer stalls in the MP ring for this queue.
A stall occurs when an attempt to drain a non-empty ring fails.
.It Va r_starts
Number of normal consumer starts in the MP ring for this queue.
A start occurs when the MP ring transitions from IDLE to BUSY.
.It Va r_drops
Number of drops in the MP ring for this queue.
A drop occurs when there is an attempt to add an entry to an MP ring with
no available space.
.It Va r_enqueues
Number of entries which have been enqueued to the MP ring for this queue.
.It Va ring_state
MP (soft) ring state.
This provides a snapshot of the current MP ring state, including the producer
head and tail indexes, the consumer index, and the state.
The state is one of "IDLE", "BUSY",
"STALLED", or "ABDICATED".
.It Va txq_cleaned
The total number of transmit descriptors which have been reclaimed.
.It Va txq_processed
The number of transmit descriptors which have been processed, but may not yet
have been reclaimed.
.It Va txq_in_use
Descriptors which have been added to the transmit queue,
but have not yet been cleaned.
This value includes both untransmitted descriptors and descriptors
which have been processed.
.It Va txq_cidx_processed
The transmit queue consumer index of the next descriptor to process.
.It Va txq_cidx
The transmit queue consumer index of the oldest descriptor to reclaim.
.It Va txq_pidx
The transmit queue producer index where the next descriptor to transmit will
be inserted.
.It Va no_tx_dma_setup
Number of times DMA mapping a transmit mbuf failed for reasons other than
.Er EFBIG .
.It Va txd_encap_efbig
Number of times DMA mapping a transmit mbuf failed due to requiring too many
segments.
.It Va tx_map_failed
Number of times DMA mapping a transmit mbuf failed for any reason
(sum of no_tx_dma_setup and txd_encap_efbig).
.It Va no_desc_avail
Number of times a descriptor could not be added to the transmit ring because
the transmit ring was full.
.It Va mbuf_defrag_failed
Number of times both
.Xr m_collapse 9
and
.Xr m_defrag 9
failed after an
.Er EFBIG
error resulted from DMA mapping a transmit mbuf.
.It Va m_pullups
Number of times
.Xr m_pullup 9
was called attempting to parse a header.
.It Va mbuf_defrag
Number of times
.Xr m_defrag 9
was called.
.El
.It Va rxqZ
The following are repeated for each receive queue, where Z is the
receive queue instance number:
.Bl -tag -width indent
.It Va rxq_fl0.credits
Credits currently available in the receive ring.
.It Va rxq_fl0.cidx
Current receive ring consumer index.
.It Va rxq_fl0.pidx
Current receive ring producer index.
.El
.El
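.Pp
As a sketch of how these counters might be inspected, the commands below
assume a hypothetical device named ix0 and assume that the queue nodes appear
under the same
.Va dev.X.Y.iflib
prefix as the per-device variables; the exact node names on a given system can
be confirmed by listing the tree first.
.Bd -literal -offset indent
# List every iflib OID exported for the device.
sysctl dev.ix.0.iflib
# Check the first transmit queue for MP ring drops and full-ring events.
sysctl dev.ix.0.iflib.txq0.r_drops dev.ix.0.iflib.txq0.no_desc_avail
.Ed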
.Pp
Additional OIDs useful for driver and iflib development are exposed when the
INVARIANTS and/or WITNESS options are enabled in the kernel.
.Sh SEE ALSO
.Xr iflib 9
.Sh HISTORY
This framework was introduced in
.Fx 11.0 .