@@ -24,6 +24,49 @@ Partitions and P_Keys
The P_Key for any interface is given by the "pkey" file, and the
main interface for a subinterface is in "parent."

+ Datagram vs Connected modes
+
+ The IPoIB driver supports two modes of operation: datagram and
+ connected. The mode is set and read through an interface's
+ /sys/class/net/<intf name>/mode file.
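
As a minimal sketch of how the mode file might be read and written from a script: the helper names below (mode_path, read_mode, set_mode) and the sysfs_root parameter are hypothetical, and the code assumes the file holds the literal strings "datagram" and "connected".

```python
from pathlib import Path

# Assumption: the mode file uses these literal strings.
VALID_MODES = ("datagram", "connected")


def mode_path(ifname, sysfs_root="/sys/class/net"):
    """Return the path of the mode file for an IPoIB interface."""
    return Path(sysfs_root) / ifname / "mode"


def read_mode(ifname, sysfs_root="/sys/class/net"):
    """Read the current mode, stripping the trailing newline."""
    return mode_path(ifname, sysfs_root).read_text().strip()


def set_mode(ifname, mode, sysfs_root="/sys/class/net"):
    """Write a new mode; on a real system this needs root privileges."""
    if mode not in VALID_MODES:
        raise ValueError(f"unknown IPoIB mode: {mode!r}")
    mode_path(ifname, sysfs_root).write_text(mode + "\n")
```

The sysfs_root parameter exists only so the helpers can be exercised against a scratch directory; on a live system the default would be used, for example set_mode("ib0", "connected").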
32
+
33
+ In datagram mode, the IB UD (Unreliable Datagram) transport is used
34
+ and so the interface MTU has is equal to the IB L2 MTU minus the
35
+ IPoIB encapsulation header (4 bytes). For example, in a typical IB
36
+ fabric with a 2K MTU, the IPoIB MTU will be 2048 - 4 = 2044 bytes.
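
The arithmetic above can be written out as a small helper. This is only an illustration: the function name datagram_mtu is hypothetical, and the list of IB MTU sizes in the loop is included for comparison.

```python
IPOIB_ENCAP_HEADER_BYTES = 4  # encapsulation header size stated in the text


def datagram_mtu(ib_mtu):
    """IPoIB MTU in datagram mode: IB L2 MTU minus the encapsulation header."""
    if ib_mtu <= IPOIB_ENCAP_HEADER_BYTES:
        raise ValueError("IB MTU must be larger than the encapsulation header")
    return ib_mtu - IPOIB_ENCAP_HEADER_BYTES


if __name__ == "__main__":
    # Print the resulting IPoIB MTU for a few IB MTU sizes.
    for ib_mtu in (256, 512, 1024, 2048, 4096):
        print(ib_mtu, datagram_mtu(ib_mtu))
```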
+
+ In connected mode, the IB RC (Reliable Connected) transport is used.
+ Connected mode takes advantage of the connected nature of the
+ IB transport and allows an MTU up to the maximal IP packet size of
+ 64K, which reduces the number of IP packets needed for handling
+ large UDP datagrams, TCP segments, etc., and increases the performance
+ for large messages.
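
To make the packet-count claim concrete, the sketch below estimates how many IPv4 packets one large UDP datagram needs at a given MTU. It assumes a 20-byte IPv4 header without options and an 8-byte UDP header; the function name ipv4_fragments is hypothetical, and the connected-mode MTU of 65520 used in the example is an assumed value just under 64K, not a figure taken from the text.

```python
import math

IPV4_HEADER = 20  # bytes, assuming no IP options
UDP_HEADER = 8    # bytes


def ipv4_fragments(udp_payload, mtu):
    """Number of IPv4 packets needed to carry one UDP datagram."""
    total = udp_payload + UDP_HEADER
    if total + IPV4_HEADER <= mtu:
        return 1
    # Every fragment except the last carries a multiple of 8 bytes.
    per_fragment = (mtu - IPV4_HEADER) // 8 * 8
    return math.ceil(total / per_fragment)


if __name__ == "__main__":
    payload = 32000
    print(ipv4_fragments(payload, 2044))   # datagram mode MTU: 16 packets
    print(ipv4_fragments(payload, 65520))  # assumed connected mode MTU: 1 packet
```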
+
+ In connected mode, the interface's UD QP is still used for multicast
+ and communication with peers that don't support connected mode. In
+ this case, RX emulation of ICMP PMTU packets is used to cause the
+ networking stack to use the smaller UD MTU for these neighbours.
+
+ Stateless offloads
+
+ If the IB HW supports IPoIB stateless offloads, IPoIB advertises
+ TCP/IP checksum and/or Large Send (LSO) offloading capability to the
+ network stack.
+
+ Large Receive (LRO) offloading is also implemented and may be turned
+ on/off using ethtool calls. Currently LRO is supported only for
+ checksum offload capable devices.
+
+ Stateless offloads are supported only in datagram mode.
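
One way the LRO toggle might be scripted is sketched below. It relies on the ethtool command-line tool's -K option with the lro keyword; the helper names lro_command and set_lro are hypothetical, and whether the setting takes effect depends on the device and mode restrictions described above.

```python
import subprocess


def lro_command(ifname, enable):
    """Build the ethtool argv that turns LRO on or off for an interface."""
    return ["ethtool", "-K", ifname, "lro", "on" if enable else "off"]


def set_lro(ifname, enable):
    """Run ethtool; needs root privileges and the ethtool binary installed."""
    subprocess.run(lro_command(ifname, enable), check=True)
```

For example, set_lro("ib0", False) would run "ethtool -K ib0 lro off".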
61
+
62
+ Interrupt moderation
63
+
64
+ If the underlying IB device supports CQ event moderation, one can
65
+ use ethtool to set interrupt mitigation parameters and thus reduce
66
+ the overhead incurred by handling interrupts. The main code path of
67
+ IPoIB doesn't use events for TX completion signaling so only RX
68
+ moderation is supported.
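
A hedged sketch of setting RX moderation from a script follows. It uses the ethtool -C option with the rx-usecs and rx-frames parameters; the helper name coalesce_command is hypothetical, and the restriction to those two RX parameters is an assumption made here to mirror the RX-only statement above.

```python
import subprocess

# Assumption: only these RX coalescing parameters are of interest here.
RX_PARAMS = ("rx-usecs", "rx-frames")


def coalesce_command(ifname, **params):
    """Build an ethtool -C argv, accepting only RX moderation parameters."""
    argv = ["ethtool", "-C", ifname]
    for key, value in params.items():
        name = key.replace("_", "-")
        if name not in RX_PARAMS:
            raise ValueError(f"unsupported moderation parameter: {name}")
        argv += [name, str(int(value))]
    return argv


def set_rx_moderation(ifname, **params):
    """Run ethtool; needs root privileges and the ethtool binary installed."""
    subprocess.run(coalesce_command(ifname, **params), check=True)
```

For example, set_rx_moderation("ib0", rx_usecs=16, rx_frames=8) would run "ethtool -C ib0 rx-usecs 16 rx-frames 8"; the numeric values are placeholders, not tuning advice.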
+
Debugging Information

By compiling the IPoIB driver with CONFIG_INFINIBAND_IPOIB_DEBUG set
@@ -55,3 +98,5 @@ References
http://ietf.org/rfc/rfc4391.txt
IP over InfiniBand (IPoIB) Architecture (RFC 4392)
http://ietf.org/rfc/rfc4392.txt
+ IP over InfiniBand: Connected Mode (RFC 4755)
+ http://ietf.org/rfc/rfc4755.txt