The blaring ring of the phone startled me. It was only 7:47 a.m.—really too early to start hearing complaints about network performance. Most users wouldn't even be in and accessing the network for at least another 13 minutes. As the network administrator, I counted on these moments—my quiet time—to prepare myself for the day ahead. <ring...> This was my time to sit in my office, drink my coffee and begin downloading e-mails sent overnight from our international offices. I shouldn't be hearing complaints from the users already! Oh, it was going to be a rough day. <ring...>
With a feeling of dread, I reached over and hit the speaker button. "Hello. This is Laura." After a short pause, my skin began to crawl as I heard Fred loud and clear—his voice shattering any peace and happiness that the day had promised. "Laura! Fred here."
"Fred? Fred who?" I always asked him this—just to irk him. He was the only "Fred" who called my office at the drop of a packet. What was it going to be this time? His system not running games as quickly as he wished? He couldn't figure out how to deal a new hand in solitaire? Bejeweled not giving him the credit he deserved? Gummi bears in the keyboard...again?
"Fred Erskine! I thought you'd recognize my voice by now!" <slight snicker at his pathetic attempt at a joke or a jab>
"Yes, Fred. What can I do for you?" Uh oh...here it comes. I quickly hit the mute button lest Fred hear my typical first response to his moronic plight.
"I just wanted to let you know that the network seems to be running... well...er...really well today. Thanks." <click>
What? What? Fred is calling to say the network is running well? What's up? In my 10 years of running this network and fielding Fred's complaints on a daily basis, I'd never heard him say this. What kind of cruel, sick joke is this?
<ring...><ring...> I opened my eyes...my alarm screaming at me to get up, put myself together and get into the office by 7:45—before the users arrive. Oh, just a dream. I knew it was too good to be true.
Users never call to say the network is running fine. Sigh. I crawl out of bed.
What are the most common reasons network performance lags behind expectations? In my almost 20 years of analyzing network traffic, these issues come to mind first:
- High latency, client issues
- Server issues or link issues
- Packet loss
- TCP window congestion
- Low throughput
- Dependency problems
- Application faults
At BrainShare 2007, two of the Bring Your Own Laptop (BYOL) sessions focused on performance issues. In this article we'll examine high latency, packet loss and TCP window congestion labs to demonstrate the process of locating the cause of poor performance.
Note: These trace files can be found on Laura's Lab Kit which is available online. If you are just interested in grabbing these traces, download nc05traces.zip from the Lab Kit.
> High Wire Latency
The trace file in Figure 1 depicts a common situation: sitting in a hotel trying to get decent Internet access. I just wanted to get to our home page, packet-level.com. The clock ticks. Figure 1 shows the beginning of this trace file. Use the recommended settings to view the trace files:
- Colorization on
- Time Display Format > Seconds Since Previous Packet
The first packet in this trace is my system looking for a Windows update. No response is received, but that isn't slowing down my system. When I launch my browser, it immediately attempts to connect to our home page.
The second packet is my DNS query for packet-level.com, and the wait begins. Approximately one second later my system generates another DNS query to the same DNS server. One second is a long time to wait for a DNS reply.
A DNS response arrives approximately 23 milliseconds after my second request. This triggers my lightning-fast system to send the TCP handshake packet out immediately (just over 4 milliseconds after the DNS response).
Try this: Can you locate any other DNS queries in this trace file and verify that the DNS server response is slow to that query as well? Phew! Maybe the DNS server is just slow and the rest of the browsing session will go well.
The server's SYN ACK arrives over one half of a second later. Ouch! This is terrible response time. Now consider that the TCP handshake process does not require any application-level processing. If the performance is shoddy at this point, then I'd look at a wire latency issue.
Packets 9 and 10 are interesting. Packet 9 is the DNS server's second response to the duplicate DNS query. My system responds with an ICMP Type 3/Code 3 – Destination Unreachable/Port Unreachable response. In other words, it's plugging its ears and singing, "La la la la, I can't hear you!" My system's moved on; once it received the DNS response, it shut down the port it used for the DNS queries.
Packets 8, 11 and 12 are duplicates. Note that all three packets carry data and use the same TCP sequence number. This is the definitive sign of TCP retransmissions.
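The "same sequence number, with data" test can be sketched in a few lines of Python. This is a simplified illustration, not Wireshark's actual detection logic, which weighs more conditions. The `Segment` record, the `find_retransmissions` helper and every field value in the sample are made up; only the packet numbers follow the trace.

```python
from collections import namedtuple

# Hypothetical, simplified packet record; a real capture carries many more fields.
Segment = namedtuple("Segment", "number src dst sport dport seq payload_len")

def find_retransmissions(segments):
    """Return packet numbers that repeat a data-bearing sequence number
    already seen in the same direction of the same connection."""
    seen = set()
    retransmitted = []
    for seg in segments:
        if seg.payload_len == 0:
            continue  # pure ACKs may legitimately repeat a sequence number
        key = (seg.src, seg.dst, seg.sport, seg.dport, seg.seq)
        if key in seen:
            retransmitted.append(seg.number)
        else:
            seen.add(key)
    return retransmitted

# Illustrative values only: packets 8, 11 and 12 carry the same data segment.
sample = [
    Segment(8, "client", "server", 1032, 80, 1, 420),
    Segment(11, "client", "server", 1032, 80, 1, 420),
    Segment(12, "client", "server", 1032, 80, 1, 420),
    Segment(13, "server", "client", 80, 1032, 1, 0),  # the ACK, no data
]

retransmissions = find_retransmissions(sample)
print(retransmissions)
```

The first copy of a segment is treated as the original and only the later copies are reported, which is why packet 8 itself is not in the result.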
Do we have packet loss somewhere along the path as well? Or is the latency just so high that my system makes duplicate requests because the responses are too slow?
The response ACK (packet 13) arrives 9.842289 seconds after the original request as shown in Figure 2. (Right click on packet 8 and Set Time Reference. Don't forget to toggle this off and reset your Time Display Format to Seconds Since Previous Packet after you have measured the time from packet 8 to packet 13.)
Now let's see how many retransmissions are in this trace file. We can do this several ways:
- Sort the Info column to group the "[TCP Retransmission]" notes
- Select Analyze > Expert Info
- Select Analyze > Expert Info Composite > Notes
- Apply a display filter for tcp.analysis.retransmission
Hint: Toggle off the Time Reference before applying a filter. Wireshark might keep your Time Reference packet in the display even though it does not match your filter value.
I chose to apply the filter and the status line indicates that there are 32 packets that matched my filter. Pathetic!
Clear out your filter and consider applying a filter for all packets that contain the SYN bit set to 1 (tcp.flags.syn == 1). This filter displays all SYN and SYN ACK packets and lets us see if the latency times on all the handshake processes are slow. You may need to reset your Time Display Format to Seconds Since Previous Packet again.
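Why does this filter show both SYN and SYN ACK packets? The filter only tests one bit in the TCP flags byte, and that bit is set in both packet types. Here is a minimal Python sketch of the same bit test. The flag bit values come from the TCP header definition; the `has_syn` helper name is just for illustration.

```python
# TCP header flag bits (low six bits of the flags byte)
TCP_FIN = 0x01
TCP_SYN = 0x02
TCP_RST = 0x04
TCP_PSH = 0x08
TCP_ACK = 0x10
TCP_URG = 0x20

def has_syn(flags):
    """True when the SYN bit is set, regardless of the other flag bits."""
    return bool(flags & TCP_SYN)

syn_only = TCP_SYN            # first packet of the handshake
syn_ack = TCP_SYN | TCP_ACK   # second packet of the handshake
ack_only = TCP_ACK            # third packet, and most packets after it

for name, flags in (("SYN", syn_only), ("SYN ACK", syn_ack), ("ACK", ack_only)):
    print(name, has_syn(flags))
```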
Figure 3 indicates that we consistently have serious delays during the handshake processes:
- 1032 > http .503667 seconds response time
- 1033 > http .822640 seconds response time
- 1034 > http .352788 seconds response time
Because the server does not need to do application processing to establish these connections, we can assume that wire latency is an issue.
In addition, scroll through the trace file with this filter applied; notice the client making connections to another server. The latency time when connecting to the other server is high as well. This might indicate the latency issue is local.
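If you would rather compute the handshake response times than read them off the screen, the pairing of each SYN with its SYN ACK can be sketched as follows. This is a rough Python illustration under stated assumptions: the packet dictionaries, the `handshake_latencies` helper and the absolute timestamps are invented, and only the three response times match the values listed above.

```python
def handshake_latencies(packets):
    """Pair each SYN with its SYN ACK; return {client_port: seconds}."""
    pending = {}    # (client, client_port, server, server_port) -> SYN time
    latencies = {}
    for pkt in packets:
        if pkt["syn"] and not pkt["ack"]:
            key = (pkt["src"], pkt["sport"], pkt["dst"], pkt["dport"])
            # Keep the first SYN so a retransmitted SYN does not hide the delay.
            pending.setdefault(key, pkt["time"])
        elif pkt["syn"] and pkt["ack"]:
            # The SYN ACK travels in the reverse direction, so swap the endpoints.
            key = (pkt["dst"], pkt["dport"], pkt["src"], pkt["sport"])
            if key in pending:
                latencies[key[1]] = round(pkt["time"] - pending.pop(key), 6)
    return latencies

def syn(time, port):
    return {"time": time, "src": "client", "sport": port,
            "dst": "server", "dport": 80, "syn": True, "ack": False}

def syn_ack(time, port):
    return {"time": time, "src": "server", "sport": 80,
            "dst": "client", "dport": port, "syn": True, "ack": True}

# Invented absolute times; the gaps match the three response times listed above.
trace = [
    syn(0.0, 1032), syn_ack(0.503667, 1032),
    syn(1.0, 1033), syn_ack(1.822640, 1033),
    syn(2.0, 1034), syn_ack(2.352788, 1034),
]

latencies = handshake_latencies(trace)
average = sum(latencies.values()) / len(latencies)
print(latencies)
print(round(average, 6))
```

Averaged over the three connections, the handshake delay is roughly 0.56 seconds, all of it spent before any application processing takes place.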
> Packet Loss and TCP Window
This HTTP file download process is experiencing two problems: packet loss and TCP window congestion. The packet loss is easy to spot if you scroll through the trace with colorization turned on. (See Figure 4.)
Not only do we have packet loss, but we also appear to have a latency issue. The client sends 40 duplicate ACKs before receiving the retransmission. A server using TCP fast retransmit resends the segment after receiving three duplicate ACKs. This client had over 400 milliseconds to send duplicate ACKs before receiving the retransmission (packet 215).
Note: Packet loss with a high number of duplicate ACKs indicates high latency in addition to the packet loss. The receiver is able to get numerous duplicate ACKs onto the cabling system before the retransmission is received.
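Counting duplicate ACKs is simple enough to sketch in Python. The version below is a simplification: it only compares acknowledgment numbers from pure ACK packets traveling in one direction, whereas a real analyzer checks more fields. The `longest_duplicate_ack_run` helper and the acknowledgment numbers are invented for illustration; the run length of 40 matches the trace.

```python
def longest_duplicate_ack_run(ack_numbers):
    """Given acknowledgment numbers from pure ACKs in capture order,
    return (ack_number, duplicate_count) for the longest run of repeats."""
    best = (None, 0)
    previous, duplicates = None, 0
    for ack in ack_numbers:
        if ack == previous:
            duplicates += 1             # same ACK number again: a duplicate
        else:
            previous, duplicates = ack, 0
        if duplicates > best[1]:
            best = (previous, duplicates)
    return best

# Invented numbers: one normal ACK, then the original ACK for the missing
# segment followed by 40 duplicates, then an ACK after the retransmission.
acks = [1000] + [2460] * 41 + [3920]

print(longest_duplicate_ack_run(acks))
```

The original ACK is not counted as a duplicate, so 41 identical acknowledgment numbers in a row produce a count of 40.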
Certainly packet loss affects performance. Although our server continues sending data to the client, the client cannot process the requested file until all the data segments have been received. In addition, this client sent 40 extra packets (duplicate ACKs) to get a single segment of data.
Now select Analyze > Expert Info Composite > Notes and sort the Summary column in descending alphabetical order ("Z" on top). Expand the Zero Window line as shown in Figure 5.
Now this is looking really ugly. We can see seven "Zero window" events in the trace file beginning with packet 364. Each one means a system is advertising that it has no TCP receive buffer space available during a file transfer. In effect, the system is saying, "Shut up! I'm not listening."
Let's look at this point in the trace file. The client, 10.0.52.164, has requested a file from the server, 18.104.22.168, as seen in packet 4. We have already seen packet loss several times in this trace file.
As shown in Figure 6, packet 363 is tagged as "TCP Window Full." This packet comes from the server to the client. Wireshark tracks the advertised window size (defined in the TCP header Window field) and notes that this packet will overload the client's available buffer space. In packet 364 the client advertises a full buffer (TCP ZeroWindow event).
This situation triggers the server to begin the TCP Keep-Alive process. Focus on the time column (and have it set to show you Seconds Since Previous Packet). You can see how the server backs off and becomes more patient with each TCP Keep-Alive packet.
As you can see in this trace file, more than 30 seconds transpire before the client advertises a Window Update (packet 377). The server cannot send more data packets until the client's window size value increases.
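The Window Full, Zero Window and Window Update sequence can be modeled with a little bookkeeping. The Python sketch below is a simplified model, not Wireshark's implementation: it ignores window scaling and tracks a single sender and receiver. The `classify_window_events` helper, the event tuples and all byte counts are hypothetical.

```python
def classify_window_events(events):
    """Label window-related events in a one-way data transfer.

    events is a list of tuples in capture order:
      ("data", seq, length)   sent by the data sender
      ("ack", ack, window)    sent by the receiver
    Returns a list of (index, label).
    """
    last_ack, window = 0, None   # receiver's latest ACK number and window
    labels = []
    for index, event in enumerate(events):
        if event[0] == "data":
            _, seq, length = event
            # The segment reaches the right edge of the advertised window.
            if window is not None and length > 0 and seq + length >= last_ack + window:
                labels.append((index, "window full"))
        else:
            _, ack, advertised = event
            if advertised == 0:
                labels.append((index, "zero window"))
            elif window == 0:
                labels.append((index, "window update"))
            last_ack, window = ack, advertised
    return labels

# Invented byte counts: the receiver offers 2920 bytes, the sender fills it,
# the receiver advertises zero, then finally opens the window again.
events = [
    ("ack", 1000, 2920),
    ("data", 1000, 1460),
    ("data", 2460, 1460),
    ("ack", 3920, 0),
    ("ack", 3920, 8760),
]

print(classify_window_events(events))
```

In this model the sender has nothing it is allowed to transmit between the zero window advertisement and the window update, which is the stall visible in the time column.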
What causes this full window condition? Most likely, the application (browser) is not pulling the data out of the TCP receive buffer in a timely manner; however, other applications may be causing this problem because multiple connections share a common TCP receive buffer. In addition, processor-intensive applications may be affecting the browser's ability to process received data.
In this case, we know we have latency and packet loss issues. These problems may be out of our control if they are injected on a portion of the network path that we do not control. The client issue is something we can fix. Now we would examine the client system to see what other applications are running.
Note: In this case, our client was playing a video during the download process. The video playback was significantly dragging down system performance. To the user, however, the network just appears to be slow.
> Troubleshooting Blind
How can you identify the cause of poor network performance if you don't look at the communications? Take the guesswork out of troubleshooting by listening in on the traffic. The packets may not be able to tell you why the problem is occurring, but they will be able to tell you where the problem is occurring.
At BrainShare, Laura Chappell announced Wireshark University, an educational organization focused on training network analysts to troubleshoot and secure networks faster and more accurately using Wireshark (formerly Ethereal). Learn more network troubleshooting skills in the WSU03: Troubleshooting Network Performance course, available in self-paced and instructor-led format. Visit wiresharkU.com for course outlines, recommended prerequisites, self-paced-course ordering and instructor-led-course schedules.