MTU size issues, fragmentation, and jumbo frames
The maximum transmission unit (MTU) is the largest number of bytes an individual datagram can have without either being fragmented into smaller datagrams or being dropped along the path between its source and its destination.
For Ethernet frames—and many other types of packets—that number is 1500 bytes, and it generally meets the requirements of traffic that can cross the public internet intact.
So, if 2000-byte Ethernet packets arrive at a router, it will split their payloads in two and repackage them into two packets that are each smaller than 1500 bytes and so meet the MTU.
An alternative is that the router drops the packet but sends the source device an Internet Control Message Protocol (ICMP) packet-too-big message. The intent is for the source device to resend the payload in smaller packets, but it might not be configured to support this.
MTU size also comes into play when, for a frame to get from its source to its destination, it has to cross a network that uses a different protocol than the one used by the source and destination networks. For instance, a device on an Ethernet LAN might want to send a payload to a device on an Ethernet LAN in another city and have to cross an MPLS connection on the way.
In that case the size of the Ethernet frames must be taken into consideration. If encapsulation of Ethernet in MPLS pushes the size of the MPLS frame past the MTU of the MPLS edge switches, the switches will drop it.
MTU size
The size of an MTU is governed by the physical properties of the communications media. Historically, network media were slower and more prone to error, so MTU sizes were set to be relatively small. For most Ethernet networks this is 1500 bytes, and this size is used almost universally on access networks. Ethernet II networks have a standard frame size of 1518 bytes, which includes a 14-byte Ethernet II header and a four-byte frame-check sequence (FCS). Other communications media have different MTU sizes.
Encapsulation overhead
When one protocol’s packets or frames are encapsulated within another protocol, the overall frame size increases. Encapsulation adds a protocol header, so any packets that are created at 1500 bytes and are then encapsulated will exceed the MTU the network can handle. The number of bytes encapsulation adds varies by type of protocol:
- GRE (IP Protocol 47) (RFC 2784) adds 24 bytes (20 byte IPv4 header, 4 byte GRE header)
- 6in4 encapsulation (IP Protocol 41, RFC 4213) adds 20 bytes
- 4in6 encapsulation (e.g. DS-Lite RFC 6333) adds 40 bytes
- Each additional outer IPv4 header adds 20 bytes
- IPsec encryption performed by the DMVPN adds 73 bytes for ESP-AES-256 and ESP-SHA-HMAC overhead (overhead depends on transport or tunnel mode and the encryption/authentication algorithm and HMAC)
- MPLS adds 4 bytes for each label in the stack
- IEEE 802.1Q tag adds 4 bytes (Q-in-Q would add 8 bytes)
- VXLAN adds 50 bytes
- OTV adds 42 bytes
- LISP adds 36 bytes for IPv4 and 56 bytes for IPv6 encapsulation
- NVGRE adds 42 bytes
- STT adds 54 bytes
There are many other situations where protocol encapsulation occurs, so you must be aware when this happens and take steps to accommodate it. A packet may originate as a standard IPv4 packet with a designated MTU of 1500 bytes, but depending on its destination it may pass through encapsulation that pushes its size over the MTU.
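To make the arithmetic concrete, here is a minimal Python sketch that subtracts encapsulation overhead from a link MTU. The overhead values are copied from the list above and treated as simple subtraction, which is a simplification; the names ENCAP_OVERHEAD, inner_mtu, and fits are illustrative, not part of any real tool.

```python
# Per-layer overhead in bytes, taken from the list above (assumed, not measured).
ENCAP_OVERHEAD = {
    "gre": 24,
    "6in4": 20,
    "4in6": 40,
    "ipv4_outer": 20,
    "mpls_label": 4,
    "dot1q": 4,
    "vxlan": 50,
    "otv": 42,
    "nvgre": 42,
    "stt": 54,
}


def inner_mtu(link_mtu, layers):
    """Largest inner packet that still fits after adding every listed layer."""
    overhead = sum(ENCAP_OVERHEAD[name] for name in layers)
    return link_mtu - overhead


def fits(packet_size, link_mtu, layers):
    """True if a packet of packet_size bytes survives encapsulation unfragmented."""
    return packet_size <= inner_mtu(link_mtu, layers)


if __name__ == "__main__":
    # A GRE tunnel over a 1500-byte link leaves 1476 bytes for the inner packet.
    print(inner_mtu(1500, ["gre"]))
    # A full-size 1500-byte packet no longer fits once GRE is added.
    print(fits(1500, 1500, ["gre"]))
    # Repeat a key to stack layers, for example two MPLS labels.
    print(inner_mtu(1500, ["mpls_label", "mpls_label"]))
```

Listing a layer more than once models stacking, such as multiple MPLS labels or Q-in-Q tags.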
Path MTU Discovery (PMTUD)
Routers can fragment packets to cut them down to fit smaller MTUs, but this is not optimal. A packet incoming to a network device may be smaller than the MTU, but if it gets encapsulated by the device and the new total packet size exceeds the MTU of the outgoing interface, the device may fragment the packet into two smaller packets before forwarding the data.
For example, an IPv4 router will fragment and forward a packet that exceeds the MTU as long as its DF bit is clear; if the DF bit is set, the router drops the packet and sends back an ICMP “fragmentation needed” (packet-too-big) error message to tell the source device that it should use a smaller MTU. On the other hand, IPv6 routers do not fragment oversized packets on behalf of the source; they just drop them and send back an ICMPv6 packet-too-big error message.
The main problem with MTU size being reduced across the network is that some applications may not work well in this environment.
To complicate matters, some sending devices ignore packet-too-big messages and keep sending packets that exceed the MTU. They are not following a standardized technique called path MTU discovery that can avoid fragmentation across a network.
Some nodes that send 1500-byte packets into the DMVPN and subsequently receive an ICMPv4 packet-too-big message from the router may choose to ignore this. These nodes are not performing Path MTU Discovery (PMTUD) as prescribed by IETF RFC 1191 or RFC 1981, and are therefore relying on the IPv4 routers to perform this fragmentation on behalf of the source host. RFC 2923 also covers the topic of “TCP Problems with Path MTU Discovery.” If the application cannot function properly in this environment, there could be end-user impacts. Also, if there is a firewall in the middle of the communication path somewhere that is blocking the ICMP error messages, then that would definitely prevent PMTUD from operating properly.
One method to test and detect a reduced MTU size is to use a ping with a large packet size. Here are some examples of how to do this.
C:\Users\ScottHogg> ping -l 1500 192.168.10.1
On a Windows host you can also set the Do Not Fragment (DF) bit to 1 with the -f ping parameter.
C:\Users\ScottHogg> ping 192.168.10.1 -l 1500 -f
On Linux the command would be:
RedHat# ping -s 1500 -M do 192.168.10.1
On a Cisco IOS device the command would be:
Router1# ping 192.168.10.1 size 1500 df-bit
On a Cisco NX-OS device the command would be:
Switch7K# ping 192.168.10.1 packet-size 9216 c 10
On a Cisco IOS XR device the command would be:
RP/0/RP0/CPU0:Router1#ping 192.168.10.1 size 1500 donnotfrag
On a JUNOS device the command would look like:
root@J4350-1# run ping 192.168.10.1 size 1500 do-not-fragment rapid
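When choosing the size for these tests, keep in mind that on Windows and Linux the size argument counts only the ICMP payload, so the IP and ICMP headers come on top of it. The Python sketch below works out the payload that exactly fills a given MTU and shows how a binary search could home in on the path MTU. The probe callable is hypothetical: it stands in for whatever you use to send a DF-set ping of a given total packet size and report whether it got through.

```python
ICMP_HEADER = 8                 # ICMP echo header, bytes
IP_HEADER = {4: 20, 6: 40}      # base header sizes, no options or extension headers


def ping_payload_for_mtu(mtu, ip_version=4):
    """ICMP payload size that produces a packet of exactly mtu bytes."""
    return mtu - IP_HEADER[ip_version] - ICMP_HEADER


def find_path_mtu(probe, low=68, high=1500):
    """Binary-search the largest packet size for which probe(size) is True.

    probe is a hypothetical callable supplied by the caller; it should send
    one DF-set packet of the given total size and return True on success.
    """
    if not probe(low):
        raise ValueError("even the smallest probe size failed")
    while low < high:
        mid = (low + high + 1) // 2   # round up so the loop always makes progress
        if probe(mid):
            low = mid
        else:
            high = mid - 1
    return low


if __name__ == "__main__":
    # A 1500-byte IPv4 MTU leaves room for a 1472-byte ping payload.
    print(ping_payload_for_mtu(1500))
    # Simulated path that only passes packets of 1400 bytes or less.
    print(find_path_mtu(lambda size: size <= 1400))
```

In practice the probe would wrap one of the ping commands above; the simulated lambda is only there to show the search converging.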
Fragmentation
IPv4 routers fragment on behalf of the source node that is sending an oversized packet. Routers can fragment IPv4 packets unless the Do-Not-Fragment (DF) bit is set to 1 in the IPv4 header. If the DF bit is set to 0 (the default), the router splits a packet that is too large to fit into the outgoing interface and sends two packets toward the destination. When the destination receives the two fragments, the destination’s protocol stack must reassemble the fragments before processing the protocol data unit (PDU). But there’s a danger when an application sends its packets with DF set to 1, does not pay attention to the ICMP “packet too big” messages, and does not perform PMTUD.
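As an illustration of how that split works out, here is a small Python sketch of IPv4 fragmentation arithmetic. It assumes a 20-byte IPv4 header with no options and a packet that has not already been fragmented; fragment_ipv4 is an illustrative name, not a real library function. Every fragment except the last must carry a payload that is a multiple of 8 bytes, because the fragment offset field counts in 8-byte units.

```python
def fragment_ipv4(total_length, mtu, header_len=20):
    """Return a list of (offset_in_8_byte_units, fragment_length, more_fragments).

    Assumes no IP options and an unfragmented original packet.
    """
    if total_length <= mtu:
        return [(0, total_length, False)]

    payload = total_length - header_len
    # Payload per fragment, rounded down to a multiple of 8 bytes.
    max_chunk = (mtu - header_len) // 8 * 8
    if max_chunk <= 0:
        raise ValueError("MTU too small to carry any payload")

    fragments = []
    offset = 0
    while offset < payload:
        chunk = min(max_chunk, payload - offset)
        more = offset + chunk < payload
        fragments.append((offset // 8, header_len + chunk, more))
        offset += chunk
    return fragments


if __name__ == "__main__":
    # The 2000-byte packet from the earlier example, crossing a 1500-byte MTU.
    for frag in fragment_ipv4(2000, 1500):
        print(frag)
```

For the 2000-byte packet this yields two fragments of 1500 and 520 bytes: each carries its own 20-byte header, so the total on the wire is 20 bytes more than the original.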
All IPv6 networks must support an MTU size of 1,280 bytes or greater (RFC 2460). This is because IPv6 routers do not fragment IPv6 packets on behalf of the source. IPv6 routers drop the packet and send back an ICMPv6 Type 2 (Packet Too Big) message to the source indicating the proper MTU size. It then falls on the shoulders of the source to perform the fragmentation itself and cache the new reduced MTU size for that destination so future packets use the correct MTU size.
When routers perform fragmentation on behalf of the source, that adds CPU processing overhead on the router. If IPsec is being used, then the routers on both ends of the tunnel will need to handle the fragmentation and reassembly of the packets. If the routers are performing fragmentation on behalf of the source node, it may be desirable to have the fragmentation performed prior to encryption, so the destination tunnel router doesn’t have to reassemble the fragments and then perform the decryption.
The following two Cisco IOS global configuration commands can control this behavior.
Router(config)# crypto ipsec fragmentation before-encryption
Router(config)# crypto ipsec fragmentation after-encryption
There is a good document from Cisco on the 7600 switches and how to resolve these issues, entitled “Configuring IPSec VPN Fragmentation and MTU”.
MTU and MSS
Another method to handle the increase in packet size due to encapsulation and the resulting fragmentation is to use the TCP Maximum Segment Size (MSS) parameter. The MSS is the largest number of bytes of payload that can be sent in a single TCP segment. Each end announces its MSS in the SYN packets during the TCP 3-way handshake. The MSS is defined in RFC 879 for IPv4 and in RFC 2460 for IPv6. The MSS does not include the TCP header (20 bytes) or the IPv4 header (20 bytes; IPv6 header is 40 bytes).
When IPsec is being used, it is customary to set the MTU size on the tunnel interfaces to 1,400 bytes and to set the TCP-MSS-adjust to 1,360 bytes. This can be configured in a Cisco IOS device using these commands.
Router(config)# interface tunnel 4
Router(config-if)# ip tcp adjust-mss 1360
Router(config-if)# ip mtu 1400
For IPv6-enabled interfaces we can use the same type of functions, but the IPv6 header is 40 bytes instead of IPv4’s ~20-byte header. We must also consider the 20-byte TCP header, which is the same size for IPv4 and IPv6.
Router(config)# interface tunnel 6
Router(config-if)# ipv6 tcp adjust-mss 1340
Router(config-if)# ipv6 mtu 1400
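The MSS values in both configurations come from subtracting the IP and TCP headers from the tunnel MTU. A minimal Python sketch of that calculation, assuming base headers with no IP options, IPv6 extension headers, or TCP options (mss_for_mtu is an illustrative name):

```python
TCP_HEADER = 20                 # bytes, without TCP options
IP_HEADER = {4: 20, 6: 40}      # bytes, without options or extension headers


def mss_for_mtu(mtu, ip_version=4):
    """TCP payload that fits in one packet of the given MTU."""
    return mtu - IP_HEADER[ip_version] - TCP_HEADER


if __name__ == "__main__":
    print(mss_for_mtu(1400, ip_version=4))   # value used with "ip tcp adjust-mss"
    print(mss_for_mtu(1400, ip_version=6))   # value used with "ipv6 tcp adjust-mss"
```

With a 1,400-byte tunnel MTU this gives 1,360 for IPv4 and 1,340 for IPv6, matching the commands above.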
This MSS option does not work for UDP applications: UDP is a connectionless protocol, so there’s no way to negotiate this during the handshake. For UDP applications that do not perform PMTUD and set the DF bit to 1, one option may be to configure a policy that sets the DF bit back to zero.
For more on this topic, read “Resolve IP Fragmentation, MTU, MSS, and PMTUD Issues with GRE and IPSEC” from Cisco.
Compensate by increasing the MTU size
As we’ve seen, the primary issue with MTU size arises when encapsulation takes place while the links between sites only support a 1,500-byte MTU. This is frequently the case for links between enterprise routers and the upstream ISP routers, or between CE routers and PE routers.
It would be highly desirable to be able to increase the MTU size over the WAN. If the MTU size could be increased throughout the path across the WAN, then the added encapsulation overhead could be compensated for by the WAN interface of the routers. This would eliminate the need to reduce the MTU size on the tunnel interfaces or adjust the MSS, and it would relieve the routers of performing any fragmentation. That’s where jumbo frames come in.
Jumbo frames
Jumbo frames are Ethernet frames whose payload is much larger than the typical 1,500-byte Ethernet MTU. They can be used if the networking hardware is capable of this configuration. Most modern routers and switches, as well as most datacenter networking hardware, can support jumbo frames.
Larger frames can also boost speed. With larger frame sizes, and thus larger payload sizes, each frame carries proportionally less protocol overhead, so you achieve higher protocol efficiency. In other words, your “goodput” improves with larger frame sizes. Sending fewer, larger frames also reduces the bandwidth spent on headers and the CPU cycles that network hardware spends processing them.
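A rough Python sketch of the efficiency gain, under stated assumptions: untagged Ethernet with 38 bytes of per-frame wire overhead (8-byte preamble and start delimiter, 14-byte header, 4-byte FCS, 12-byte inter-frame gap) and 20-byte IPv4 and TCP headers with no options. The function name tcp_efficiency is illustrative.

```python
# Per-frame Ethernet cost on the wire: preamble/SFD + header + FCS + inter-frame gap.
ETHERNET_WIRE_OVERHEAD = 8 + 14 + 4 + 12


def tcp_efficiency(mtu, ip_header=20, tcp_header=20):
    """Fraction of wire bytes that carry TCP payload for full-size packets."""
    payload = mtu - ip_header - tcp_header
    wire_bytes = mtu + ETHERNET_WIRE_OVERHEAD
    return payload / wire_bytes


if __name__ == "__main__":
    for mtu in (1500, 9216):
        print(f"MTU {mtu}: {tcp_efficiency(mtu):.2%} of wire bytes are TCP payload")
```

Under these assumptions, efficiency rises from roughly 95% at a 1,500-byte MTU to a little over 99% at 9,216 bytes. The gain in header bandwidth is modest; the larger saving is usually in the number of frames each device has to process.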
To configure the jumbo frame MTU size on a Cisco IOS device, just enter the MTU command on the interface configuration like this:
Router(config)# interface GigabitEthernet 4/1
Router(config-if)# mtu 9216
The show interface command will verify the interface’s new MTU size.
For other manufacturers’ equipment, you just have to look for a configuration command within the physical or virtual interface that allows you to set the MTU size greater than 1,500 bytes.
The key concept to keep in mind is that all the network devices along the communication path must support jumbo frames. Jumbo frames need to be configured to work on the ingress and egress interface of each device along the end-to-end transmission path. Furthermore, all devices in the topology must also agree on the maximum jumbo frame size. If there are devices along the transmission path that have varying frame sizes, then you can end up with fragmentation problems. Also, if a device along the path does not support jumbo frames and it receives one, it will drop it.
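One way to think about this is that the usable MTU of a path is the smallest MTU of any hop on it. The Python sketch below checks a list of hops for mismatches; the hop names and MTU values are made up for illustration, and check_path is a hypothetical helper, not a command on any device.

```python
def check_path(hops):
    """Given (name, mtu) pairs for each hop, return (path_mtu, limiting_hops).

    path_mtu is the smallest MTU on the path; limiting_hops lists the hops
    that impose it when the path is not uniform (empty if all hops agree).
    """
    path_mtu = min(mtu for _, mtu in hops)
    largest = max(mtu for _, mtu in hops)
    if path_mtu == largest:
        return path_mtu, []
    return path_mtu, [name for name, mtu in hops if mtu == path_mtu]


if __name__ == "__main__":
    # Hypothetical path: one core link was left at the default MTU.
    hops = [("ce1", 9216), ("pe1", 9216), ("core-link", 1500), ("pe2", 9216)]
    mtu, limiting = check_path(hops)
    print(mtu, limiting)
```

A single hop left at 1,500 bytes brings the whole path down to 1,500 bytes, no matter how the other devices are configured.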
Jumbograms
Jumbo frames should not be confused with jumbograms. When discussing communications protocols, frames are the PDU used at Layer 2 (the data link layer) of the OSI model, and packets are the PDU used at Layer 3 (the network layer). A jumbogram is a larger Layer 3 packet that exceeds the link MTU size. An IPv4 packet can be up to 65,535 bytes long, while IPv6 supports a 32-bit “Jumbo Payload Length” field within a hop-by-hop option header. Therefore, IPv6 could support a ridiculous payload of nearly 4.3GB (2^32 - 1 bytes). Clearly, that packet could not be transported on any type of common networking interface; just imagine the repercussions of a retransmission.
Jumbo frame support
Most network devices support a jumbo frame size of 9,216 bytes. This isn’t standardized like Ethernet’s 1,500 byte MTU, though, so you want to check with your particular manufacturer on the largest frame size their devices support and how to configure the changes. Even within a single manufacturer’s line of network products, the MTU capabilities may vary greatly, so it is important to do a thorough investigation of all your devices in the communication paths and validate their settings. For instance, some Intel Gigabit adapters support jumbo frames but many do not.
Recommendations
Problems with MTU size reduction due to tunnels, IPsec encryption, and overlay protocols can degrade network performance. If you are using encapsulation technologies, then you should consider increasing the MTU size, particularly in the core of the network or WAN to avoid fragmentation and PMTUD issues. Ask your service provider if they support larger frame sizes within their network and on the link between their PE and your CE router.
Jumbo frames may improve your network’s performance. However, it is important to explore if and how your network devices support jumbo frames before you turn this feature on. Some of the biggest gains of using jumbo frames can be realized within and between data centers. But you should be cognizant of the fragmentation that may occur if those large frames try to cross a link that has a smaller MTU size.
Copyright © 2021 IDG Communications, Inc.