Network Devices, the Kernel, and You!


Introduction
============
The following is a random collection of documentation regarding
network devices.

struct net_device allocation rules
==================================
Network device structures need to persist even after the module is
unloaded and must be allocated with kmalloc. If the device has been
registered successfully, it will be freed on last use by free_netdev.
This is required to handle the pathological case cleanly (example:
rmmod mydriver </sys/class/net/myeth/mtu ).

There are routines in net_init.c to handle the common cases of
alloc_etherdev and alloc_netdev. These reserve extra space for driver
private data, which is freed when the network device is freed. If
separately allocated data is attached to the network device (for
example, through a pointer kept in netdev_priv(dev)), then it is up to
the module exit handler to free that data.

MTU
===
Each network device has a Maximum Transmission Unit (MTU). The MTU does
not include any link layer protocol overhead. Upper layer protocols must
not pass the device a socket buffer (skb) for transmission that carries
more data than the MTU. Because link layer headers are not counted, on
Ethernet with the standard MTU of 1500 bytes the actual skb will contain
up to 1514 bytes, the extra 14 bytes being the Ethernet header. Devices
should allow for the 4 byte VLAN header as well.

Segmentation Offload (GSO, TSO) is an exception to this rule. The
upper layer protocol may pass a large socket buffer to the device
transmit routine, and the device will break that up into separate
packets based on the current MTU.

MTU is symmetrical and applies to both receive and transmit. A device
must be able to receive at least the maximum size packet allowed by
the MTU. A network device may use the MTU as a mechanism to size receive
buffers, but the device should allow packets with a VLAN header.
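As a rough illustration of the size arithmetic in this section, the
following user-space C sketch computes frame and receive-buffer sizes
from an MTU. It is not kernel code: every model_ name is hypothetical,
and only the 14 byte Ethernet header and 4 byte VLAN tag figures come
from this document.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names; the byte counts are the Ethernet figures from the text. */
#define MODEL_ETH_HLEN  14u  /* 6-byte dst MAC + 6-byte src MAC + 2-byte EtherType */
#define MODEL_VLAN_HLEN  4u  /* one 802.1Q tag */

/* Largest frame that can carry an MTU-sized payload: the MTU plus the
 * link layer header, optionally plus one VLAN tag. The trailing FCS is
 * not counted, matching the 1514/1518 arithmetic in the text. */
static unsigned int model_max_frame_len(unsigned int mtu, bool vlan_tagged)
{
    return mtu + MODEL_ETH_HLEN + (vlan_tagged ? MODEL_VLAN_HLEN : 0u);
}

/* Receive buffer sized from the MTU, leaving room for a VLAN tag as the
 * text recommends. */
static unsigned int model_rx_buf_len(unsigned int mtu)
{
    unsigned int len = model_max_frame_len(mtu, true);
    assert(len > mtu); /* guards against unsigned wrap-around for absurd MTUs */
    return len;
}

/* Oversize policy: drop rather than truncate or pass up.
 * Returns true if the frame should be delivered. */
static bool model_rx_accept(unsigned int frame_len, unsigned int mtu)
{
    return frame_len <= model_rx_buf_len(mtu);
}
```

With an MTU of 1500, model_max_frame_len(1500, false) returns 1514 and
model_rx_buf_len(1500) returns 1518; model_rx_accept() encodes the
preferred policy of dropping anything larger.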

With the standard Ethernet MTU of 1500 bytes, the device should allow
up to 1518 byte packets (1500 + 14 header + 4 tag). The device may
drop, truncate, or pass up oversize packets, but dropping oversize
packets is preferred.


struct net_device synchronization rules
=======================================
dev->open:
	Synchronization: rtnl_lock() semaphore.
	Context: process

dev->stop:
	Synchronization: rtnl_lock() semaphore.
	Context: process
	Note1: netif_running() is guaranteed false
	Note2: dev->poll() is guaranteed to be stopped

dev->do_ioctl:
	Synchronization: rtnl_lock() semaphore.
	Context: process

dev->get_stats:
	Synchronization: dev_base_lock rwlock.
	Context: nominally process, but don't sleep inside an rwlock

dev->hard_start_xmit:
	Synchronization: netif_tx_lock spinlock.

	When the driver sets NETIF_F_LLTX in dev->features, this will be
	called without holding netif_tx_lock. In this case the driver
	has to do its own locking where needed. It is recommended to use a
	trylock for this and to return NETDEV_TX_LOCKED when the spin lock
	fails. This locking should also properly protect against
	set_rx_mode. Note that the use of NETIF_F_LLTX is deprecated;
	don't use it for new drivers.

	Context: process with BHs disabled, or BH (timer); netconsole
		 may also call it with interrupts disabled.

	Return codes:
	o NETDEV_TX_OK: everything is OK.
	o NETDEV_TX_BUSY: cannot transmit the packet, try later.
	  This is usually a bug: it means queue start/stop flow control
	  is broken in the driver. Note: the driver must NOT put the skb
	  in its DMA ring.
	o NETDEV_TX_LOCKED: locking failed, please retry quickly.
	  Only valid when NETIF_F_LLTX is set.

dev->tx_timeout:
	Synchronization: netif_tx_lock spinlock.
	Context: BHs disabled
	Notes: netif_queue_stopped() is guaranteed true

dev->set_rx_mode:
	Synchronization: netif_tx_lock spinlock.
	Context: BHs disabled

struct napi_struct synchronization rules
========================================
napi->poll:
	Synchronization: NAPI_STATE_SCHED bit in napi->state. The device
		driver's dev->close method will invoke napi_disable() on
		all NAPI instances, which will do a sleeping poll on the
		NAPI_STATE_SCHED napi->state bit, waiting for all pending
		NAPI activity to cease.
	Context: softirq; netconsole may also call it with interrupts
		 disabled.
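The NAPI_STATE_SCHED handshake described above can be modelled in user
space with C11 atomics. This is a hedged sketch only: every model_ name
is hypothetical, only the single SCHED bit is represented, and the real
napi_schedule_prep(), napi_complete() and napi_disable() in the kernel
track additional state. Whoever flips the bit from clear to set owns the
poll routine; disable keeps retrying, sleeping between attempts, until
it owns the bit itself, so no further poll can start.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L /* for nanosleep() */
#endif
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical stand-in for napi->state; only the SCHED bit is modelled. */
#define MODEL_NAPI_STATE_SCHED (1u << 0)

struct model_napi {
    atomic_uint state;
};

/* Try to take ownership of the poll routine. Only the caller that flips
 * the SCHED bit from 0 to 1 may run poll; everyone else backs off. */
static bool model_napi_schedule_prep(struct model_napi *n)
{
    unsigned int old = atomic_fetch_or(&n->state, MODEL_NAPI_STATE_SCHED);
    return !(old & MODEL_NAPI_STATE_SCHED);
}

/* Called by the owner when polling is finished: release the SCHED bit. */
static void model_napi_complete(struct model_napi *n)
{
    unsigned int old = atomic_fetch_and(&n->state, ~MODEL_NAPI_STATE_SCHED);
    assert(old & MODEL_NAPI_STATE_SCHED); /* only the owner may complete */
}

/* Disable: sleeping poll until the SCHED bit can be taken, then keep it
 * so that no new poll can be scheduled. */
static void model_napi_disable(struct model_napi *n)
{
    const struct timespec one_ms = { .tv_sec = 0, .tv_nsec = 1000000L };
    while (!model_napi_schedule_prep(n))
        nanosleep(&one_ms, NULL); /* pending poll still running: wait */
}

/* Re-enable: release the bit held since model_napi_disable(). */
static void model_napi_enable(struct model_napi *n)
{
    model_napi_complete(n);
}
```

Because disable takes the bit rather than merely waiting for it to
clear, a poll scheduled concurrently cannot slip in after disable
returns; model_napi_enable() hands the bit back.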