IP OVER INFINIBAND

  The ib_ipoib driver is an implementation of the IP over InfiniBand
  protocol as specified by RFCs 4391 and 4392, issued by the IETF ipoib
  working group.  It is a "native" implementation in the sense of
  setting the interface type to ARPHRD_INFINIBAND and the hardware
  address length to 20 (earlier proprietary implementations
  masqueraded to the kernel as ethernet interfaces).

Partitions and P_Keys

  When the IPoIB driver is loaded, it creates one interface for each
  port using the P_Key at index 0.  To create an interface with a
  different P_Key, write the desired P_Key into the main interface's
  /sys/class/net/<intf name>/create_child file.  For example:

    echo 0x8001 > /sys/class/net/ib0/create_child

  This will create an interface named ib0.8001 with P_Key 0x8001.  To
  remove a subinterface, use the "delete_child" file:

    echo 0x8001 > /sys/class/net/ib0/delete_child

  The P_Key for any interface is given by the "pkey" file, and the
  main interface for a subinterface is given by the "parent" file.

Datagram vs Connected modes

  The IPoIB driver supports two modes of operation: datagram and
  connected.  The mode is set and read through an interface's
  /sys/class/net/<intf name>/mode file.

  In datagram mode, the IB UD (Unreliable Datagram) transport is used,
  so the interface MTU is equal to the IB L2 MTU minus the IPoIB
  encapsulation header (4 bytes).  For example, in a typical IB
  fabric with a 2K MTU, the IPoIB MTU will be 2048 - 4 = 2044 bytes.

  In connected mode, the IB RC (Reliable Connected) transport is used.
  Connected mode takes advantage of the connected nature of the IB
  transport and allows an MTU up to the maximal IP packet size of 64K.
  This reduces the number of IP packets needed to handle large UDP
  datagrams, TCP segments, etc., and increases performance for large
  messages.
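
  As a rough illustration of the mode file and the datagram MTU
  arithmetic described above, the following shell sketch may help.  The
  helper names ipoib_ud_mtu and ipoib_mode are hypothetical and not
  part of the driver; the optional sysfs root argument exists only so
  the helper can be tried against a scratch directory.

```shell
# Hypothetical helper: datagram-mode IPoIB MTU for a given IB L2 MTU.
# The 4 bytes subtracted are the IPoIB encapsulation header.
ipoib_ud_mtu() {
    echo $(( $1 - 4 ))
}

# Hypothetical helper: print the current mode of an interface.
# $1 is the interface name; $2 is an optional sysfs root (default /sys).
ipoib_mode() {
    cat "${2:-/sys}/class/net/$1/mode"
}

ipoib_ud_mtu 2048    # prints 2044
ipoib_ud_mtu 4096    # prints 4092
```

  On a system with an IPoIB interface, "ipoib_mode ib0" would report
  the current mode, and writing the mode name to the same file (for
  example "echo connected > /sys/class/net/ib0/mode", as root) is
  expected to switch it.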

  In connected mode, the interface's UD QP is still used for multicast
  and communication with peers that don't support connected mode. In
  this case, RX emulation of ICMP PMTU packets is used to cause the
  networking stack to use the smaller UD MTU for these neighbours.

Stateless offloads

  If the IB HW supports IPoIB stateless offloads, IPoIB advertises
  TCP/IP checksum and/or Large Send (LSO) offloading capability to the
  network stack.

  Large Receive (LRO) offloading is also implemented and may be turned
  on/off using ethtool calls.  Currently LRO is supported only for
  checksum offload capable devices.

  Stateless offloads are supported only in datagram mode.

Interrupt moderation

  If the underlying IB device supports CQ event moderation, one can
  use ethtool to set interrupt mitigation parameters and thus reduce
  the overhead incurred by handling interrupts.  The main code path of
  IPoIB doesn't use events for TX completion signaling so only RX
  moderation is supported.

Debugging Information

  By compiling the IPoIB driver with CONFIG_INFINIBAND_IPOIB_DEBUG set
  to 'y', tracing messages are compiled into the driver.  They are
  turned on by setting the module parameters debug_level and
  mcast_debug_level to 1.  These parameters can be controlled at
  runtime through files in /sys/module/ib_ipoib/.

  CONFIG_INFINIBAND_IPOIB_DEBUG also enables files in the debugfs
  virtual filesystem.  By mounting this filesystem, for example with

    mount -t debugfs none /sys/kernel/debug

  it is possible to get statistics about multicast groups from the
  files /sys/kernel/debug/ipoib/ib0_mcg and so on.

  The performance impact of this option is negligible, so it
  is safe to enable this option with debug_level set to 0 for normal
  operation.

  CONFIG_INFINIBAND_IPOIB_DEBUG_DATA enables even more debug output in
  the data path when data_debug_level is set to 1.
  However, even with the output disabled, enabling this configuration
  option will affect performance, because it adds tests to the fast
  path.

References

  Transmission of IP over InfiniBand (IPoIB) (RFC 4391)
    http://ietf.org/rfc/rfc4391.txt
  IP over InfiniBand (IPoIB) Architecture (RFC 4392)
    http://ietf.org/rfc/rfc4392.txt
  IP over InfiniBand: Connected Mode (RFC 4755)
    http://ietf.org/rfc/rfc4755.txt