Analyzing Large Capture Files Part 2 – Protocol Hierarchy

In addition to the packet colorization technique, the first article in this series discussed the importance of question-driven analysis. Those who can ask the right question are most likely to arrive at the correct answer the fastest. That raises the issue of how you ask the right question in the first place. One technique is to explore your surroundings in a way that suggests the next logical question. I usually refer to this technique as assessing the “lay of the land”. In part two of this series, I’ll discuss how I use protocol hierarchies to get the lay of the land and create valuable questions.

Protocol Hierarchy

When presented with an array of packets, you’re going to look for details that help you quickly assess their function. Few details are more fruitful than the protocols encapsulated within these packets. If you observe HTTP packets, you know that you’ve encountered an exchange of data between an HTTP client and server. If you run into DNS traffic, you know that a host is attempting to resolve one piece of information (such as a hostname) into another (such as an IP address), most likely as a precursor to direct communication. Knowing the protocols in use helps you determine the goal of the communication and lets you frame the questions used to further interpret those packets.

Most tools make it easy to identify the protocols used in a capture file, but visually interpreting this information becomes unwieldy with large captures, so summarization is needed. My favorite method for understanding what protocols may be present in a capture is to generate a protocol hierarchy chart.

You can generate a protocol hierarchy chart in Wireshark by selecting the Protocol Hierarchy option from the Statistics drop-down menu.

Figure 1: Protocol Hierarchy in Wireshark

This chart provides a list of protocols observed in the capture. It’s called a hierarchy because the data is arranged by layers of communication, since most packets contain several encapsulated protocols. That’s why HTTP is listed beneath TCP, which is in turn listed beneath IP, and so on.

Wireshark lets you right-click anything in this chart and use a context menu to directly filter or colorize packets. I use this to quickly pivot to areas of interest and answer questions I’ve formulated based on the protocols listed. You can also use a deductive strategy by filtering out protocols that are of no interest to you directly from the protocol hierarchy screen. Once you have filtered out the things that aren’t interesting, you can save the results to a separate capture file for additional analysis.
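The same deductive approach can be sketched on the command line. This is a minimal example rather than a definitive recipe: the function name, the file names, and the excluded protocols (ARP and STP) are placeholders chosen for illustration, so substitute whatever you decided was uninteresting.

```shell
# Write every packet that survives a display filter to a new capture file.
# The function name and its arguments are illustrative, not part of TShark.
exclude_protocols() {
    in_file=$1          # capture to read, e.g. packets.pcap
    out_file=$2         # capture to write, e.g. filtered.pcapng
    display_filter=$3   # display filter describing what to keep
    tshark -r "$in_file" -Y "$display_filter" -w "$out_file"
}

# Example (assumed file names):
#   exclude_protocols packets.pcap filtered.pcapng 'not (arp or stp)'
```

The resulting file can be opened in Wireshark or run back through the protocol hierarchy to confirm that only the traffic of interest remains.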

You can generate a similar chart on the command line with TShark:

tshark -r packets.pcap -qz io,phs

Figure 2: Protocol Hierarchy in TShark

The beauty of the protocol hierarchy is that it can help you build a snapshot of roles and functions. This is particularly powerful when your capture is focused on a single friendly host. For example, take a look at Figure 3.

Figure 3: Two Protocol Hierarchies

The top hierarchy is SMTP and IMAP, indicating the sending or receiving of mail. The bottom is mostly HTTP and DNS data, which tells me this is probably some type of web browsing. It’s the presence of these protocols alone that will help formulate more interesting questions.
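If your capture contains several hosts, one way to approximate this single-host view is to append a display filter to the io,phs statistic, which TShark accepts after a comma. The sketch below assumes IPv4 traffic; the function name and the example address are hypothetical.

```shell
# Print a protocol hierarchy limited to packets to or from one host.
# phs_for_host is an illustrative helper name, not a TShark feature.
phs_for_host() {
    capture=$1   # capture file to read
    host_ip=$2   # IPv4 address of the host you care about
    tshark -r "$capture" -q -z "io,phs,ip.addr==$host_ip"
}

# Example (assumed file name and address):
#   phs_for_host packets.pcap 192.0.2.10
```

Running this once per host of interest gives you one hierarchy per host to compare side by side.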

For example, if I see HTTP data I’ll ask questions like:

  • Is this browsing from a user or an application?
  • What led the user to the sites visited?
  • Were any suspicious file types or names downloaded?
  • Do any of the domains look suspicious?
  • Was any data uploaded?

Answering each of these questions involves using different techniques and pivoting away from the protocol hierarchy.
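As one example of such a pivot, the sketch below uses TShark’s field output to make a start on the domain and user-versus-application questions. The function names are my own, and it assumes the HTTP traffic is unencrypted so that TShark can dissect it.

```shell
# List each HTTP request with the client, method, host, URI, user agent,
# and referer. Helper names here are illustrative only.
http_requests() {
    tshark -r "$1" -Y 'http.request' -T fields \
        -e ip.src -e http.request.method -e http.host \
        -e http.request.uri -e http.user_agent -e http.referer
}

# Count requests per HTTP host, most requested first.
http_top_hosts() {
    tshark -r "$1" -Y 'http.request' -T fields -e http.host |
        sort | uniq -c | sort -rn
}

# Example (assumed file name):
#   http_top_hosts packets.pcap | head
```

An unfamiliar user agent hints at an application rather than a person, the referer column helps explain what led to a site, and the host counts make odd domains easier to spot.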

This screen remains useful even in very large capture files because a large number of packets does not inherently mean a great amount of diversity in the protocols used. A ten-second sample from a group of hosts will often have the same distribution of protocols as a ten-hour sample. While other analysis techniques might not scale well, the protocol hierarchy chart does (although it will take longer to generate as your PCAP grows larger).

When is protocol hierarchy useful?

This is one of the first techniques I use any time I approach a capture without a very specific question already in mind. It often happens in security analysis when I’m investigating a single host and I have a suspicion that it might be compromised. I’ll take a communication sample over a span of several minutes and use the protocol hierarchy to help form more specific questions whose answers I can pursue.

I also use this technique when I’m handed PCAPs from other people. I typically won’t look at a PCAP unless the person handing it over relays their specific question, but even so, I use the protocol hierarchy to get the lay of the land, refine that question, and ask others.

What are interesting things I should look for in a protocol hierarchy?

The protocol hierarchy is rarely a terminal destination for me. It rarely answers questions but is instrumental in helping define them so that I can use other techniques to find answers. The things that are most likely to generate those questions are…

…specific protocols that lend themselves to follow-on questions. The nature of certain protocols inherently evokes specific questions. If there is SMTP, what mail was sent? If there is DNS, what was resolved? If there is ICMP, what types and codes were seen? If you know the function of a protocol, you should be able to follow the path that knowledge presents.

…unexpected protocols. If you understand the role of a device, then you should have a sense of what protocols are required to serve that purpose. When you see an unexpected protocol, you should ask why it’s present. Why is a domain controller browsing via HTTP? Why is there ICMP generated from a user workstation?

…unexpected ratios of data for individual protocols. It’s often not the presence of a protocol that draws attention, but the amount of it. For example, an examination of browsing activity should yield some DNS traffic, but not a significant amount compared to the HTTP traffic present. If the expected ratios are reversed, that is worth exploring. It often pays to focus on unusually large amounts of protocols that are normally less voluminous, like DHCP, ICMP, and DNS. These are generally protocols responsible for transmitting very small amounts of information or control commands, rather than significant amounts of data.
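To make ratios like these easier to eyeball, the TShark output can be piped through a small awk script. This is a rough sketch with a hypothetical function name. It assumes the io,phs text layout of one protocol per line followed by frames:N and bytes:N, and it treats the unindented top-level lines as the total.

```shell
# Read "tshark -qz io,phs" output on stdin and print, for each protocol:
#   name  frames  share-of-frames  share-of-bytes
phs_percentages() {
    awk '
        # Protocol lines look like: "    dns    frames:80 bytes:9000"
        /frames:[0-9]+ bytes:[0-9]+/ {
            split($2, f, ":")        # f[2] holds the frame count
            split($3, b, ":")        # b[2] holds the byte count
            n++
            name[n] = $1; frames[n] = f[2]; bytes[n] = b[2]
            if ($0 !~ /^ /) {        # unindented line = top-level protocol
                total_frames += f[2]
                total_bytes += b[2]
            }
        }
        END {
            if (!total_frames || !total_bytes) exit
            for (i = 1; i <= n; i++)
                printf "%s %s %.1f%% %.1f%%\n", name[i], frames[i],
                    100 * frames[i] / total_frames,
                    100 * bytes[i] / total_bytes
        }
    '
}

# Example (assumed file name):
#   tshark -r packets.pcap -qz io,phs | phs_percentages
```

Because each line counts every packet that contains that protocol, the percentages of nested protocols overlap with their parents and will not sum to 100.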

Can I export the protocol hierarchy?

The protocol hierarchy chart is well designed for visual analysis, but if you ever want to perform more advanced analysis or transformation of that data, then you’ll be left wanting if you’re relegated only to Wireshark or TShark. As many a security analyst has found, there isn’t much a CSV can’t help overcome. From the protocol hierarchy in Wireshark, click the Copy button and choose the “as CSV” option. You can paste the contents of your clipboard into a separate file and analyze it in Excel, with a custom script, or using any tool that will read CSV formatted data. This isn’t something I’ve had too much occasion or need to do in my work.
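If you’re working from TShark rather than Wireshark, a similar CSV can be approximated with awk. The sketch below is my own illustration, not a TShark feature. It assumes the io,phs layout of two spaces of indentation per layer and frames:N bytes:N on each line, and the column names are arbitrary.

```shell
# Read "tshark -qz io,phs" output on stdin and emit CSV with the full
# protocol path (for example eth:ip:udp:dns), depth, frames, and bytes.
phs_to_csv() {
    awk '
        BEGIN { print "protocol_path,depth,frames,bytes" }
        /frames:[0-9]+ bytes:[0-9]+/ {
            lead = $0
            sub(/[^ ].*$/, "", lead)     # keep only the leading spaces
            depth = length(lead) / 2     # assumed: two spaces per layer
            stack[depth] = $1            # remember the protocol at this depth
            path = stack[0]
            for (i = 1; i <= depth; i++)
                path = path ":" stack[i]
            split($2, f, ":")            # f[2] holds the frame count
            split($3, b, ":")            # b[2] holds the byte count
            printf "%s,%d,%s,%s\n", path, depth, f[2], b[2]
        }
    '
}

# Example (assumed file names):
#   tshark -r packets.pcap -qz io,phs | phs_to_csv > hierarchy.csv
```

Keeping the full path in one column means the layering survives sorting or filtering in a spreadsheet.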

Conclusion

The ideal scenario provides data that you’re pursuing with a specific question in mind. Of course, investigations are rarely ideal. The second article in this series about analyzing large capture files focuses on getting the lay of the land to help you ask better questions. Generating a protocol hierarchy provides useful information about the contents of a packet capture based on the protocols in use. If you have adequate protocol knowledge this should naturally lead to questions that help you zero in on the answer you seek, or at a minimum, filter out things you don’t need.
