Analyzing Large Capture Files 4: Whittling with Filters

Whittling is a lost art, but it’s a beautiful process. A craftsman chooses a lifeless piece of scrap wood and slowly carves slivers off of it until it takes an impressive form. It might wind up as a toy for a child or a game call for a hunting trip. In either case, the transformation is striking. I think about whittling often when I need to use a lot of filters to find the data I want in a packet capture. Yes, I know that’s a weird transition, but it’s true. While not quite as slow and painstaking as whittling, the process of slowly peeling back packets is also reductive. By using the filtering capabilities of PCAP analysis tools, you can slowly tune out the things you don’t care about until you’re left with the important stuff, ultimately transforming the PCAP.

In part four of this series, I’ll describe some different packet analysis tool filtering capabilities, some of the filters I use when whittling down PCAPs, and some tricks for applying them effectively. This isn’t meant to be a complete guide on filtering, but if you’re looking for something like that then be sure to check out my Practical Packet Analysis book or online course where I have an entire section dedicated to filtering.

You can find the first three parts of this series here:

Filtering Techniques

There are several mechanisms available for filtering packet capture files down to something meaningful, including those that are built for that task and other tools that can be adapted for it. What you use will depend on the tools you have available and level of granularity you need.

Berkeley Packet Filters (BPFs)

The most widely used and universally available standard for filtering packets is the BPF syntax. Supported by nearly every major packet capture and analysis tool (including tcpdump, Wireshark, and tshark), BPFs take a simple form that relies on keywords and values to build filtering expressions based on common layer 2 through layer 4 attributes of communication, such as MAC addresses, IP addresses, and ports. BPF syntax excels at simple filtering on these lower layers, but it can’t filter on layer 7 protocol fields nearly as easily. BPFs are also fast, so you won’t have to wait long to apply a filter and produce output.

Figure 1: Reducing a PCAP with BPFs in tcpdump
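As a rough sketch of the kind of command that figure illustrates, here is a BPF reduction with tcpdump wrapped in a small shell function. The function name bpf_reduce, the output file name, and the example address are my own placeholders, not something from the original capture.

```shell
# Hypothetical helper: reduce a capture with a BPF expression and save the result.
# Usage: bpf_reduce <input.pcap> <output.pcap> <bpf expression>
bpf_reduce() {
  local src="$1" dst="$2"
  shift 2
  # -n: skip name resolution, -r: read from a file, -w: write matching packets
  tcpdump -n -r "$src" -w "$dst" "$@"
}

# Example (illustrative address): keep DNS and HTTP traffic involving one host
# bpf_reduce lotsopackets.pcap web-dns.pcap 'host 192.0.2.10 and (port 53 or port 80)'
```

Because the result is written with -w, the output is still a PCAP that you can keep filtering or open in Wireshark.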


Wireshark Display Filters

The Wireshark tool suite relies on blocks of code called dissectors to interpret packets and break protocols down into individual fields. The detailed interpretation of the protocols means that each field is also available for filtering, which provides a great deal of flexibility. Wireshark display filters use a hierarchical structure (protocol.field.subfield) to allow for deep introspection using simple keywords. This provides flexibility beyond BPFs, particularly if you need to filter on layer 7 protocol fields. If you’re working with a large capture file it might not be feasible to load it all into Wireshark to apply a display filter, but fortunately, you can also apply display filters with tshark. It uses the same set of dissectors as Wireshark.

Figure 2: Reducing a PCAP with display filters in tshark
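A minimal sketch of that approach, assuming you want the matching packets saved to a new file rather than printed. The function name df_reduce and the output file name are placeholders I made up for illustration.

```shell
# Hypothetical helper: apply a Wireshark display filter with tshark and save matches.
# Usage: df_reduce <input.pcap> <output.pcap> <display filter>
df_reduce() {
  local src="$1" dst="$2" filter="$3"
  # -n: no name resolution, -r: read file, -Y: display filter, -w: write packets
  tshark -n -r "$src" -Y "$filter" -w "$dst"
}

# Example: keep only HTTP POST requests, a layer 7 field that BPFs can't easily reach
# df_reduce lotsopackets.pcap posts.pcap 'http.request.method == "POST"'
```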


NGrep

If you’re familiar with Unix command line tools then you’re certainly familiar with the power and flexibility of grep for performing regular expression-based searches. Network grep (ngrep) takes a similar approach to grep, but adds the ability to read and parse network data. This allows you to apply regular expressions to packets along with BPFs. The advantage of this approach over plain grep is that it allows you to write the matching packets to a PCAP, rather than just doing simple text matching. This is ideal if you want to reduce a PCAP but still open it in a packet analysis tool like Wireshark.

Figure 3: Reducing a PCAP with NGrep
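Here is one way that might look, again as a sketch. The function name ngrep_reduce, the pattern, and the file names are illustrative assumptions. The options used are ngrep’s -I (read from a PCAP), -O (write matched packets to a PCAP), -q (quiet), and -i (case-insensitive match).

```shell
# Hypothetical helper: keep packets whose payload matches a regular expression.
# Usage: ngrep_reduce <input.pcap> <output.pcap> <regex> <bpf expression>
ngrep_reduce() {
  local src="$1" dst="$2" pattern="$3" bpf="$4"
  # The regex is matched against packet payloads; the BPF narrows which packets are tested
  ngrep -q -i -I "$src" -O "$dst" "$pattern" "$bpf"
}

# Example: packets on DNS or HTTP ports whose payload contains "bnc"
# ngrep_reduce lotsopackets.pcap bnc.pcap 'bnc' 'port 53 or port 80'
```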


Command Line Filtering Tools

Most PCAP whittling occurs on the command line. This is because command line tools are often a bit more flexible, and it just isn’t typically feasible to load really large PCAPs into graphical tools without exhausting the available memory on a system. While working on the command line, you have access to all the other great command line tools that you might also rely on for parsing logs and other evidence sources. Keep in mind that to answer most network related questions you don’t need the entire packet. You just need values from a few fields, and those are often simple text strings you can manipulate like any other text string. So, you may start by using tshark or tcpdump to produce a text output of PCAPs, and then pipe that data to a traditional text analysis tool.

A few of the more common CLI tools used to filter and interact with text output include:

  • grep: Allows searching data with regular expressions
  • awk: A pattern scanning and processing language used to filter text
  • cut: Slices out parts of lines
  • head/tail: Outputs the beginning or end of a file or output
  • sort: Arranges data in ascending or descending order
  • uniq: Finds or omits repeated lines or values

 

Figure 4: Reducing a PCAP with tshark + grep + cut
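To make the idea concrete, here is a small sketch that chains several of the tools listed above. It assumes tshark is asked for two tab-separated fields (source IP and HTTP host) with -T fields, and the function name top_hosts is a placeholder of mine.

```shell
# Hypothetical helper: read "source IP<TAB>HTTP host" lines on stdin and
# print the ten most frequently requested hosts with their counts.
top_hosts() {
  cut -f2 |          # keep only the second tab-separated column (the host)
    grep -v '^$' |   # drop lines where the host field was empty
    sort |           # group identical hosts together
    uniq -c |        # count each distinct host
    sort -rn |       # highest counts first
    head -n 10       # keep the top ten
}

# Example: feed it from tshark's field output
# tshark -n -r lotsopackets.pcap -Y http.request -T fields -e ip.src -e http.host | top_hosts
```

Once the fields are plain text, any of the usual text tools can take over, which is exactly why only a few field values are needed to answer most questions.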


A Strategy for Whittling PCAPs

Now that you know what tools are available, you should begin to think about the mechanics and process of whittling your PCAP down. For me, that process generally looks like this:

  1. Use summary statistics to get a lay of the land regarding the PCAP
  2. Apply a filter to remove something I don’t need. This is usually based on:
    1. Hosts
    2. Ports
    3. Protocols
    4. Protocol Features/Values
  3. Repeat
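For step 1, the summary statistics can come from tshark’s -z options. The sketch below is one possible way to gather them in a single pass; the function name pcap_summary is my own placeholder, and which statistics are worth pulling will depend on the capture.

```shell
# Hypothetical helper: print summary statistics for a capture in one pass.
# Usage: pcap_summary <input.pcap>
pcap_summary() {
  local src="$1"
  # -q suppresses per-packet output so only the -z statistics are printed:
  #   io,phs        protocol hierarchy
  #   conv,ip       IP conversations
  #   endpoints,ip  IP endpoints
  tshark -n -r "$src" -q -z io,phs -z conv,ip -z endpoints,ip
}

# pcap_summary lotsopackets.pcap | less
```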

I may go through this process several times before the PCAP gets down to a manageable size for me to start seeking out answers to specific questions. Let’s look at an example.

The file lotsopackets.pcap is really large, but I need to pick my way through it to answer questions about the malicious activity.

swamp:~ sanders$ tshark -nnr lotsopackets.pcap | wc -l
  14204191

I’m piping the tshark output to wc -l, which counts the number of lines (packets in this case) that are output. In a real scenario I’d likely write that output to another file, but I’m using wc here to show you the reduction that is taking place. It tells me that there are 14,204,191 packets in this capture.

I’ll start assessing the PCAP by using Wireshark/tshark’s protocol hierarchy feature to determine the protocols in use (not shown). Really quickly I see that there is a lot of extraneous data and I’m only really concerned about HTTP and DNS traffic. That’s my first filter.

swamp:~ sanders$ tshark -nnr lotsopackets.pcap -Y 'http || dns' | wc -l
  58123

Next, I look at a list of conversations or endpoints (not shown) to determine what hosts may be involved. I quickly pick out the heavy hitters and do some research to figure out which ones are legitimate. Those can be excluded too.

swamp:~ sanders$ tshark -nnr lotsopackets.pcap -Y '(http || dns) && !(ip.addr == 12.0.0.0/8) && !(ip.addr == 4.0.0.0/8)' | wc -l
  8221

Now I start to examine the content of the HTTP data and figure out that I’m mostly concerned with just the GET and POST request methods, an attribute of that protocol. I can filter on just those request methods instead of matching all the other HTTP traffic.

swamp:~ sanders$ tshark -nnr lotsopackets.pcap -Y '(http.request.method == "GET" || http.request.method == "POST" || dns) && !(ip.addr == 12.0.0.0/8) && !(ip.addr == 4.0.0.0/8)' | wc -l
  212

Finally, I’ve decided I’m particularly interested in a specific domain that contains the three characters “bnc”. I’ll use grep to filter the packets based on that string.

swamp:~ sanders$ tshark -nnr lotsopackets.pcap -Y '(http.request.method == "GET" || http.request.method == "POST" || dns) && !(ip.addr == 12.0.0.0/8) && !(ip.addr == 4.0.0.0/8)' | grep bnc | wc -l
  95

The transformation is dramatic. We’ve gone from 14,204,191 packets to 95 lines of output, a manageable size where we can start to answer specific questions about the events that have transpired.

This was a simplified version of a process that can take many shapes and forms. We only went through a few steps, but at times I’ve gone through dozens of steps to get the PCAP where I need it to be. This can result in some pretty frightening filter strings, particularly as you begin excluding/including large lists of IPs or protocol features. There are a few tips that will help keep things reasonable and make you more efficient:

  • When possible, use filter files. That separates your filter string from your command line invocation and you can leverage a text editor like Sublime or Atom to edit them a bit more cleanly.
  • Always use parentheses and quotes where possible to separate logical segments of your filters. These are optional in some cases, but I get into the habit of using them even when they aren’t required because it keeps things organized and they’re already there should my filters expand to the point where they’re needed.
  • Create checkpoints as you work. Filter an input file and create a separate output file. Then, apply new filters to that output file instead of the original source file. These little snapshots provide a baseline for you to go back to should your analysis findings dictate you need to broaden your filters.
  • Save frequently used filter strings. I do this for common protocols or noisy hosts on the network that I need to filter out often.
  • Remember that piping to command line tools might limit you. For example, when I used grep in the example I went from analyzing a PCAP to analyzing only the text displayed in the bash terminal. If you still need access to the source packets and not just the text output you’ll need to use a different tool, like ngrep, to perform your filtering on the actual packets and not just their text representation.
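The first and third tips combine nicely. The sketch below is one way to do it, not the only one: it reads a display filter from a text file and writes the matching packets to a new checkpoint file. The function name checkpoint and the file names are placeholders I chose. tcpdump can read a BPF from a file directly with -F; for tshark display filters, this sketch simply substitutes the file’s contents into -Y.

```shell
# Hypothetical helper: apply a display filter stored in a file and save a checkpoint.
# Usage: checkpoint <input.pcap> <output.pcap> <filter file>
checkpoint() {
  local src="$1" dst="$2" filter_file="$3"
  # Join the file's lines with spaces so a long filter can be split across lines
  tshark -n -r "$src" -Y "$(tr '\n' ' ' < "$filter_file")" -w "$dst"
}

# Each step reads the previous checkpoint instead of the original capture:
# checkpoint lotsopackets.pcap step1.pcap step1.filter
# checkpoint step1.pcap step2.pcap step2.filter
```

If a later finding means a filter was too aggressive, you can go back to the earlier checkpoint rather than reprocessing the full capture.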

Conclusion

In this article, I described how to use filters to whittle down packet captures to a manageable size. While filtering is the most common technique for reducing PCAP size, most analysts never grasp the array of tools that can be used to achieve efficient filtering or a full process and method of approaching the task. My hope is that exposing you to these tools and processes will help bring some structure to how you approach and reduce PCAPs with filters.

If you like this article, you’ll really like my online packet analysis course. It’s packed with over 40 hours of training. You’ll learn how to decipher common protocols at the packet level, normal/abnormal stimulus and response, and more techniques for investigating anomalies. You’ll do this while going through hands-on exercises using Wireshark and command-line based packet analysis tools. You can learn more about the class here.

 
