Why is someone who has been investigating security incidents for ten years so much better than someone who has only been doing it for a year?
That’s a simple question, and the simple answer is experience. As an analyst learns the fundamentals, develops a larger tool chest, and encounters more diverse scenarios, they will naturally become better at their craft.
That’s straightforward, but consider these alternate scenarios. There are analysts who have been involved with security investigations for three years who are better than analysts who have been involved for ten years. Why is that? Furthermore, if there are two analysts with the same amount of experience, why would one analyst be better at investigating things than the other?
While we like to measure experience in units of time, that is rarely an effective way to explain why an analyst is good at their job. Experience is related to expertise, but the two don’t always correlate directly.
Today, I want to focus on two elements particularly relevant to how expertise can be quantified between novice and expert analysts. These are rule-based reasoning and investigation heuristics.
I recently conducted a series of case studies where I brought in several security analysts of varying experience levels and asked them to describe a case they had worked. Through a technique known as the stimulated recall interview, I had them describe the process from beginning to end, focusing on why they took certain actions as the investigation progressed.
Once I collected a reasonable sample of these case studies, I reviewed each of them and performed a key phrase mapping exercise. I identified a list of categories based on a dual process theory model and mapped relevant statements made by the analyst to those categories. I was left with a distribution of how many responses existed in each category that I could divide based on various analyst demographics, like experience.
One category where there was a significant difference between the number of responses given by novice and expert analysts was rule-based reasoning. The expert analysts had nearly three times as many instances where rule-based reasoning was responsible for their actions.
Rule-based reasoning can be best thought of as an if-then-else statement. It’s a way that many believe humans store, retrieve, and manipulate knowledge, often leading to an action. Of course, as with several matters of the mind, there are other theories too.
Regardless, it should come as no surprise that computers were designed to work using if-then-else statements, because computers are in some ways mankind’s attempt to recreate itself. It represents some of our most fundamental understanding of how we think and process information, and it can be demonstrated in all walks of life. Investigations are no different.
Consider the domain google.com. When you see that domain appear in an alert you immediately assume the alert is a false positive. This is because you’ve applied a rule like this:
- If: Domain belongs to a well-known public company
- Then: It’s probably not hosting malicious content
- Else: It might have been the victim of a strategic web compromise
Now consider the domain jr2jk2ndiskd.oje2kje.ru. When you see this domain in an alert you immediately assume it’s evil. This could be the result of a rule like this:
- If: Domain appears to be mostly random alphanumeric characters
- Then: It might be generated by a domain name generation algorithm and/or owned by an attacker
- Else: It could be a coincidence, and should be documented in case I run across it again
These are simple rules that can be articulated easily. Of course, not all rules are that cut and dried.
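As a rough illustration, the two domain rules above could be written as a short Python sketch. The allowlist, the `Finding` type, the function names, and the "looks random" test (long label, contains digits, few vowels) are all assumptions made for this example. A real check for algorithmically generated names would need something far more robust. The "Else" branch is kept as a competing explanation rather than a strict logical else, which is closer to how the rules read.

```python
from typing import NamedTuple, Optional

# Hypothetical allowlist; a real one would be far larger and maintained.
WELL_KNOWN_DOMAINS = {"google.com", "microsoft.com"}

VOWELS = set("aeiou")


class Finding(NamedTuple):
    then: str       # primary conclusion
    otherwise: str  # competing explanation to keep in mind


def label_looks_random(label: str) -> bool:
    """Crude stand-in for 'mostly random': long, has digits, few vowels."""
    letters = [c for c in label if c.isalpha()]
    if len(label) < 8 or not letters:
        return False
    has_digit = any(c.isdigit() for c in label)
    vowel_ratio = sum(c in VOWELS for c in letters) / len(letters)
    return has_digit and vowel_ratio < 0.2


def assess_domain(domain: str) -> Optional[Finding]:
    """Apply the two rules in order; return None when neither matches."""
    domain = domain.lower()
    if domain in WELL_KNOWN_DOMAINS:
        return Finding(
            then="probably not hosting malicious content",
            otherwise="might have been the victim of a strategic web compromise",
        )
    # Ignore the last label (the TLD) when looking for random-looking text.
    if any(label_looks_random(label) for label in domain.split(".")[:-1]):
        return Finding(
            then="might be generated by a DGA and/or owned by an attacker",
            otherwise="could be a coincidence; document it for later",
        )
    return None


if __name__ == "__main__":
    for name in ("google.com", "jr2jk2ndiskd.oje2kje.ru", "example.org"):
        print(name, "->", assess_domain(name))
```

Returning `None` when no rule matches is a deliberate choice here: the absence of a matching rule is itself something an analyst would notice and possibly turn into a new rule.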
Even if you don’t realize it, any time you review evidence in an investigation you’re evaluating a set of rules to make decisions. Some of these are very deliberate (reflective thinking) and some of them are very automatic (intuitive thinking). These two types of thinking and how they relate define dual process theory.
With that said, a rule-based system is a simplification of something that is insanely more complex. We aren’t just dealing with a linear approach to information processing, but more likely with the activation of millions of neurons in a semantic network or some other form of connectionist model. That goes well beyond the scope of this article and some levels of the current state of human understanding. Although a simplification, a rule-based system is a reasonable one for how humans might take inputs, compare them against existing knowledge (see: top-down processing), and produce outputs.
Given this perspective on rule-based reasoning, it should come as no surprise that expert analysts have a much larger library of rules than novice analysts. These rules can be gained through experience, but as I stated earlier, experience doesn’t correlate perfectly with expertise. Gaining expertise is more about optimizing the analyst’s ability to build mental rules than arbitrarily waiting for the passage of time.
Certainly experience provides more of an opportunity to learn things, but if we can identify those things then there is little reason they can’t be taught in a more direct manner. Practically, this means that it’s possible to accelerate the rate at which an analyst gains experience by subjecting them to an environment that is more suitable for the development of rules.
That’s one reason we get analysts with the same amount of experience but varying levels of expertise (ignoring natural disposition towards the work). One environment might support the development of rules better than another. Experience is accelerated in these environments.
A simple way to help analysts develop a bigger library of rules is to write them down. The infosec industry has done a poor job of this, as it’s not something you’ll find publicly available. Some organizations have invested in the creation of investigation playbooks, which are a step in the right direction.
To document investigation-focused mental rules, the same if-then-else framework discussed earlier can be applied. If it ain’t broke, don’t fix it. These are more appropriately called heuristics, which are rules used to make decisions, solve problems, or draw conclusions. Better said, heuristics are mental shortcuts to finding answers to questions.
A more formalized heuristic format looks like this:
Input: $evidence_type
- If: $evidence is/has/contains $observation
- Then: $conclusion
- Else: $conclusion
Each heuristic is given a name for quick reference. It also includes an input evidence type, because in general any investigative conclusion is drawn from some type of observation or analysis on evidence. In many cases, a heuristic could be relevant for input of multiple types of evidence, or may require multiple types.
From there, the if-then-else statement makes up the meat of the heuristic. Similar to normal if-then-else statements, these scenarios can be made infinitely more complex. Of course, the simpler they can be made the better. Humans are processing these, so they don’t have to be perfect or follow the strict guidelines we’d need if a computer had to interpret them. Here are a few examples.
Domain Fast Flux Heuristic
- If: Domain resolves to a large number of IP addresses with diverse registration ownership or geography in a very short period
- Then: It is likely that the domain is attacker owned and exhibiting fast-flux characteristics.
- Else: The domain could be owned by a hosting company.
- Else: The observation could be a coincidence.
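For readers who want to see the fast-flux condition spelled out, here is a minimal Python sketch. The `Resolution` record shape, the one-hour window, and both thresholds are assumptions chosen for illustration. The heuristic itself only says "a large number" and "a very short period", so these numbers would need tuning against your own data.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable


@dataclass(frozen=True)
class Resolution:
    """One observed DNS answer for a domain (hypothetical record shape)."""
    seen: datetime
    ip: str
    owner: str    # registration owner or network name
    country: str


def looks_like_fast_flux(
    resolutions: Iterable[Resolution],
    window: timedelta = timedelta(hours=1),  # assumed "very short period"
    min_ips: int = 10,                       # assumed "large number"
    min_diverse: int = 5,                    # assumed diversity threshold
) -> bool:
    """True if any span of `window` holds many IPs with diverse owners or countries."""
    ordered = sorted(resolutions, key=lambda r: r.seen)
    for i, first in enumerate(ordered):
        in_window = [r for r in ordered[i:] if r.seen - first.seen <= window]
        ips = {r.ip for r in in_window}
        owners = {r.owner for r in in_window}
        countries = {r.country for r in in_window}
        diverse = len(owners) >= min_diverse or len(countries) >= min_diverse
        if len(ips) >= min_ips and diverse:
            return True
    return False
```

A `True` result only supports the "Then" conclusion. Hosting providers and content delivery networks can produce the same pattern, which is exactly what the "Else" branches are there to capture.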
File Type Mismatch Heuristic
- If: A file received in an e-mail is identified as a specific type based on its extension, but static analysis identifies a different file type.
- Then: It is probable that the file is malicious in nature.
- Else: The observation could be a coincidence.
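The file type mismatch condition can be sketched by comparing a file's extension against its leading "magic" bytes. This is only a small illustration: the signature table covers four common formats, the extension map is a hypothetical sample, and real static analysis would rely on a much fuller identification tool.

```python
from pathlib import Path
from typing import Optional

# A few well-known file signatures (leading bytes) and the type they indicate.
MAGIC_SIGNATURES = {
    b"%PDF": "pdf",
    b"\x89PNG\r\n\x1a\n": "png",
    b"PK\x03\x04": "zip",
    b"MZ": "exe",
}

# Sample mapping from extension to expected type. Office formats such as
# .docx are ZIP containers, so they are expected to carry a ZIP signature.
EXTENSION_TYPES = {
    "pdf": "pdf",
    "png": "png",
    "zip": "zip",
    "docx": "zip",
    "xlsx": "zip",
    "exe": "exe",
    "dll": "exe",
}


def detect_type(header: bytes) -> Optional[str]:
    """Return the type implied by the leading bytes, or None if unknown."""
    for magic, file_type in MAGIC_SIGNATURES.items():
        if header.startswith(magic):
            return file_type
    return None


def type_mismatch(filename: str, header: bytes) -> bool:
    """True when the extension and the leading bytes are both known and disagree."""
    extension = Path(filename).suffix.lower().lstrip(".")
    claimed = EXTENSION_TYPES.get(extension)
    actual = detect_type(header)
    return claimed is not None and actual is not None and claimed != actual
```

For example, `type_mismatch("invoice.pdf", b"MZ\x90\x00")` flags an executable carrying a PDF extension. Unknown extensions or signatures are deliberately not flagged, so the sketch stays quiet rather than guessing.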
Isolated POST Heuristic
Input: IP, URL
- If: An external IP sends an HTTP POST to one of your web servers, but doesn’t send any HTTP GET requests during the same period.
- Then: There is a possibility that the internal host has become infected with a web shell, and the communication represents malicious traffic.
- Else: This could be normal behavior for the system.
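The isolated POST condition is simple enough to express in a few lines of Python. The sketch below assumes the web server logs for the period of interest have already been reduced to `(source_ip, http_method)` pairs. That input shape and the function name are hypothetical.

```python
from collections import defaultdict
from typing import Iterable, List, Tuple


def isolated_posters(requests: Iterable[Tuple[str, str]]) -> List[str]:
    """Return source IPs that sent at least one POST and no GET in the period."""
    methods_by_ip = defaultdict(set)
    for source_ip, method in requests:
        methods_by_ip[source_ip].add(method.upper())
    return sorted(
        ip
        for ip, methods in methods_by_ip.items()
        if "POST" in methods and "GET" not in methods
    )


if __name__ == "__main__":
    period = [
        ("198.51.100.7", "POST"),
        ("203.0.113.5", "GET"),
        ("203.0.113.5", "POST"),
        ("192.0.2.9", "GET"),
    ]
    print(isolated_posters(period))
```

In the sample data only `198.51.100.7` posts without ever fetching a page first, so it is the one address worth a closer look.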
These heuristics all share the fact that they probably aren’t strong enough indicators on their own to warrant detection alerts; at least, not as scale grows beyond the small business. They do make useful investigation heuristics given the appropriate input in another investigation, whether alert-driven or human-driven (as in hunting).
This is a simplified example of a structured heuristic, but there is room to add a lot of interesting metadata to this format. For example, you could add reference points to specific techniques used to retrieve evidence. Another example would be adding confidence ratings to the conclusions. This is a great place to make use of words of estimative probability so analysts can approach the heuristic with the appropriate weight and scrutiny.
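One possible way to carry that metadata is a small data structure, sketched here in Python. Everything in it is an assumption made for illustration: the class and field names, the particular set of estimative phrases, and especially the confidence ratings attached to the example, which are placeholders rather than assessed values.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class Estimate(Enum):
    """A sample scale of words of estimative probability."""
    ALMOST_CERTAIN = "almost certain"
    PROBABLE = "probable"
    EVEN = "chances about even"
    PROBABLY_NOT = "probably not"
    ALMOST_CERTAINLY_NOT = "almost certainly not"


@dataclass
class Conclusion:
    text: str
    confidence: Estimate


@dataclass
class Heuristic:
    name: str
    inputs: List[str]
    condition: str
    then: Conclusion
    otherwise: List[Conclusion] = field(default_factory=list)
    techniques: List[str] = field(default_factory=list)  # how to retrieve the evidence

    def render(self) -> str:
        """Render the heuristic in the narrative if-then-else format."""
        lines = [
            self.name,
            "Input: " + ", ".join(self.inputs),
            f"- If: {self.condition}",
            f"- Then: {self.then.text} ({self.then.confidence.value})",
        ]
        lines += [f"- Else: {c.text} ({c.confidence.value})" for c in self.otherwise]
        return "\n".join(lines)


# Placeholder ratings and technique text, for illustration only.
isolated_post = Heuristic(
    name="Isolated POST Heuristic",
    inputs=["IP", "URL"],
    condition="An external IP sends an HTTP POST to a web server but no HTTP GET in the same period.",
    then=Conclusion("The host may be infected with a web shell.", Estimate.EVEN),
    otherwise=[Conclusion("This could be normal behavior for the system.", Estimate.PROBABLE)],
    techniques=["Search web server logs for the source IP over the period"],
)

if __name__ == "__main__":
    print(isolated_post.render())
```

Keeping a `render` method alongside the structured fields means the same record can be read as prose in a wiki and queried by a program, without committing to either form.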
Ultimately, the format doesn’t matter too much as long as it fits into the investigative workflow seamlessly. If you are embracing the investigation method, this should fit well with the question-hypothesis-answer format. These heuristics help develop new questions as well as hypotheses for existing questions. They can also be used to drive initial observations when the investigation takes the form of hunting.
As a Teaching Tool
In an ideal world, the industry rallies around a format for investigation heuristics that can be explained in both a narrative and programmatic form, a standard is developed, and large common bodies of knowledge could exist that teach people how to investigate things.
In reality, the information security industry isn’t great at standards, so it’s probably a bit of a pipe dream; but it’s okay to have goals. In the interim, just maintaining a simple wiki with these types of investigation shortcuts can provide a tremendous benefit to analysts in your environment attempting to gain expertise. Even in environments where you might be a one-man-army network administrator and security analyst, having the reference available and reviewing it within the context of an active investigation is helpful. It’s a worthwhile up-front time investment.
The goal of this article isn’t to give you a format for creating and storing investigation heuristics. Instead, it’s to introduce rule-based reasoning and how the familiar construct of the if-then-else statement can be used to represent investigation shortcuts. It’s up to you to find the best way you can capture and represent this information for your own development, and the nurturing of analysts on your team.