What is MTU and How to use it to locate network issues

 

TCP retransmissions are a common sight in packet captures, and they can be a symptom of many different network issues, such as packet drops. Today let’s have a short discussion about packet drops caused by an improperly sized MTU.

 

Definition

PDU – Protocol Data Unit (describes a message at some protocol layer; sometimes used interchangeably and informally with packet, frame, datagram, segment, or message)

MTU – Maximum Transmission Unit. In many link-layer networks, such as Ethernet, there is a limit on the size of the frame available for carrying the PDUs of higher-layer protocols. Ethernet usually limits the payload to 1500 bytes.

 

 

Datagram fragmentation – If IP has a datagram to send, and the datagram is larger than the link layer’s MTU, IP performs fragmentation, breaking the datagram up into smaller pieces (fragments), so that each fragment is no larger than the MTU.

When two hosts in the same subnet communicate with each other, the MTU of the local link interconnecting them has a direct effect on the size of the datagrams used during the conversation.

When two hosts communicate across multiple networks, each link can have a different MTU. The minimum MTU across all the links of the network path is called the path MTU (it is the bottleneck of the path, much like the bottleneck in a maximum-flow algorithm). The path MTU may vary over time because packets can be routed differently each time, and paths are often not symmetric, so the two directions can have different path MTUs.
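Since the path MTU is simply the smallest link MTU along the path, it can be shown in a couple of lines. This is only a sketch: the helper name path_mtu and the link MTU values are made up for illustration, and the two lists stand for the two directions of an asymmetric path.

```python
def path_mtu(link_mtus):
    """Return the path MTU: the smallest MTU among all links on the path."""
    if not link_mtus:
        raise ValueError("a path needs at least one link")
    return min(link_mtus)


# Hypothetical link MTUs for the two directions of one conversation.
forward_path = [9000, 1500, 1500, 9000]
reverse_path = [9000, 4470, 9000]

print(path_mtu(forward_path))  # 1500
print(path_mtu(reverse_path))  # 4470
```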

IP Fragmentation – Link-layer framing normally imposes an upper limit on the maximum size of a frame that can be transmitted. To keep the IP datagram abstraction consistent and isolated from link layer details, IP employs fragmentation and reassembly. Datagram fragments can themselves be fragmented for IPv4.

Fragmentation can happen anywhere on the way from the original host to the target host. With IPv4, each intermediate router compares the MTU of its forwarding interface with the datagram size and performs the fragmentation if the datagram is too large and fragmentation is allowed. Reassembly only happens at the target host.
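To make the fragmentation arithmetic concrete, here is a minimal Python sketch of how an IPv4 datagram could be split for a given MTU. The function name fragment_ipv4 and the fixed 20-byte header (no IP options) are assumptions for illustration, not code from a real network stack. IPv4 expresses fragment offsets in units of 8 bytes, so every fragment except the last must carry a payload that is a multiple of 8.

```python
def fragment_ipv4(total_length, mtu, header_len=20):
    """Split an IPv4 datagram into fragments that fit the given MTU.

    Returns a list of (payload_offset, payload_len, more_fragments) tuples.
    Assumes every fragment carries a header of header_len bytes.
    """
    payload = total_length - header_len
    if total_length <= mtu:
        return [(0, payload, False)]  # fits as-is, no fragmentation needed

    # Largest payload per fragment, rounded down to a multiple of 8 bytes.
    max_payload = (mtu - header_len) // 8 * 8
    if max_payload <= 0:
        raise ValueError("MTU too small to carry any payload")

    fragments = []
    offset = 0
    while offset < payload:
        size = min(max_payload, payload - offset)
        more_fragments = offset + size < payload
        fragments.append((offset, size, more_fragments))
        offset += size
    return fragments


# A 4000-byte datagram crossing a link with MTU 1500.
print(fragment_ipv4(4000, 1500))
# [(0, 1480, True), (1480, 1480, True), (2960, 1020, False)]
```

Each fragment on the wire is its payload plus the 20-byte header, so the two full fragments are exactly 1500 bytes long.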

 

Wireshark is the tool we can use for network traffic capture and analysis.

Before we go through an example, there is something we need to know: on the sending host, Wireshark captures packets before they are handed to the network adapter. That means that even if a captured packet shows a length larger than the NIC's MTU, it will still be split up by the network adapter before it is sent out.

In order to reduce CPU load from the system, modern network adapters have offloading features which move some network processing load onto the network interface card. For example, the kernel can submit large (up to 64k) TCP segments to the NIC, which the NIC will then break down into MTU-sized segments. This particular feature is called TCP Segmentation Offload (TSO).
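The splitting that TSO hands over to the NIC is simple arithmetic. Here is a rough Python sketch, assuming 20-byte IPv4 and 20-byte TCP headers with no options (real connections often carry TCP options such as timestamps, which leaves less room for payload in each segment). The name tso_segments is a made-up helper, not a kernel or driver API.

```python
def tso_segments(payload_len, mtu=1500, ip_header=20, tcp_header=20):
    """Return the TCP payload sizes a large send would be cut into.

    Each on-wire packet must fit the MTU, so the payload per segment
    is at most mtu - ip_header - tcp_header bytes (the MSS).
    """
    mss = mtu - ip_header - tcp_header
    if mss <= 0:
        raise ValueError("MTU too small for the given headers")
    full_segments, remainder = divmod(payload_len, mss)
    sizes = [mss] * full_segments
    if remainder:
        sizes.append(remainder)
    return sizes


# A 64000-byte send submitted to a NIC with MTU 1500.
segments = tso_segments(64000)
print(len(segments), segments[0], segments[-1])  # 44 1460 1220
```

A capture taken on the sending host would show the single large segment, while the wire carries the MTU-sized ones.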

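On a Linux machine, TSO can be checked with ethtool, for example ethtool -k eth0, then look for the tcp-segmentation-offload line (eth0 is a placeholder interface name).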

 

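On FreeBSD, TSO can be checked with ifconfig, for example ifconfig em0: the options line lists TSO4 (and TSO6) when the feature is enabled (em0 is a placeholder interface name).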

 

Real-life Scenario:

Issue Description:
======================

The storage admin deployed a new file server, and they could ping it from their Windows Server 2012 machine.

When they used Windows Explorer to map the SMB share, they were able to see its 4 directories.

The SMB share on the file server has 4 directories:

  1. A
  2. B
  3. C
  4. D

They were only able to access directory D. For the other 3 directories, Windows Explorer stopped responding when they tried to open them.

Directory D only had 2 files, so they tested by increasing the number of files. Once the directory held more than 7 files, they started hitting the same access issue.

The symptom: Windows Explorer hung for about 1 minute and finally reported that the handle is invalid.

 

Packets captured at file server side:
======================

 

Issue analysis:
======================

At the SMB protocol level, the client sent a Find request to list the files and folders within folder B. The file server responded with the list, but it never got a reply from the client side, so it retransmitted the file list over TCP 9 times over 30 seconds.

The client then dropped the unresponsive SMB session and started a new one, in which it sent the file ID from the dropped session. The file server responded that the previous session had already been closed.

 

Let’s focus on the root cause analysis (RCA):
======================

So this is a network issue, but how does it come about?

This is where MTU comes in. The packets that failed to get through are larger than 1500 bytes, and they have the Don't Fragment flag set. It makes sense, then, that packets larger than 1500 bytes were dropped somewhere along the path. The storage admin found that the 10G interfaces of the file server were set to MTU 9000, and that caused the issue. It would also explain the symptom: the listing of a directory with a couple of files fits into a 1500-byte packet, while a longer listing does not.
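The drop itself can be sketched in a few lines of Python. The function name forward_decision and the packet sizes below are hypothetical, chosen only to mirror the scenario: a sender configured with MTU 9000, a link with MTU 1500 somewhere on the path, and the Don't Fragment flag set on every packet.

```python
def forward_decision(packet_len, dont_fragment, egress_mtu):
    """Decide what an IPv4 router does with a packet on a given egress link."""
    if packet_len <= egress_mtu:
        return "forward"
    if dont_fragment:
        # Too big and not allowed to fragment: the packet is dropped.
        # A router normally reports this to the sender with an
        # ICMP "fragmentation needed" message.
        return "drop"
    return "fragment"


EGRESS_MTU = 1500  # the smaller link on the path

# Hypothetical response sizes, both sent with Don't Fragment set.
for label, size in [("short listing", 600), ("long listing", 8000)]:
    print(label, size, forward_decision(size, True, EGRESS_MTU))
```

With these assumed sizes the short listing is forwarded and the long one is dropped, so the sender keeps retransmitting a packet that can never arrive.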

 

 

 

Let’s talk more:

Look at the frightening numbers below, especially frame 13126: the IP packet size is 15484 bytes, which is even larger than a jumbo frame, and fragmentation is not allowed. How is it possible that the receiver acknowledged that packet successfully? This is a trick of Wireshark. Remember that Wireshark captures packets before they reach the physical network card, so at capture time the NIC has not yet had the chance to split them.

 

 

 

Another example shows this more clearly: before a large packet is sent out, it is split into pieces that fit the network adapter's MTU.

 

 

Referenced Links and documents:

Edmonds–Karp algorithm (maximum-flow algorithm):
https://en.wikipedia.org/wiki/Edmonds%E2%80%93Karp_algorithm
http://www-bcf.usc.edu/~dkempe/teaching/edmonds-karp.pdf

TSO:
https://www.kernel.org/doc/Documentation/networking/ip-sysctl.txt
https://access.redhat.com/sites/default/files/attachments/20150325_network_performance_tuning.pdf
TCP/IP Illustrated, Volume 1, 2nd Edition (Addison-Wesley)
