As a lot of folks who know me are aware, one of the areas of security that I spend the majority of my time researching is the analytic process and how the human component of an investigation works. I’ve written and spoken on this topic quite a bit, and I’ve dedicated myself to it enough that I’ve actually elected to go back to school to work on a second master’s degree focused on cognitive psychology. My hope is that I can learn more about cognitive functions of the brain and psychological research so that I can work towards taking a lot of the tacit knowledge that is security investigation (NSM, IR, Malware RE, etc.), and turning it into codified information that can help shape how we as an industry look at the analysis of security compromises. This article (and hopefully many more to come) is related to that study.
————- Post Starts Here ————-
I’ve never been a fan of declaring concepts, theories, or ideas to be “dead”. We all know how that went when Gartner declared IDS to be dead several years ago. Last I checked, intrusion detection is still widely used and relatively successful at catching intruders when done right. Even more, the numbers don’t lie as Cisco bought Sourcefire, makers of the world’s most popular IDS technology Snort, for 2.7 BILLION dollars last year. However, I do think it’s worth closely examining ideas that may have never really had a lot of life from inception. The concept I want to discuss here is the notion of “real time detection” as it relates to detecting the activity of structured threat actors.
I’m not going to get into the semantics of what constitutes “real time” versus “near real time” as that isn’t really the point of this article. Suffice it to say that when we talk about real time detection we are referring to the act of investigating alerts (typically generated from some type of IDS or other detection mechanism), and making a decision whether or not something bad has occurred and if escalation to incident response is necessary. This concept relies on event-driven inputs from detection mechanisms to mark the start of the analysis process, and quick decision-making immediately following the receipt of that input and a brief investigation.
With a real time detection and analysis approach there is a tremendous amount of pressure to make decisions rapidly near the beginning of the analysis process. Many security operation centers (SOCs) even track metrics related to the duration of time between an alert being generated and that alert being closed or escalated (often called “dwell time”). It isn’t uncommon for these SOCs to judge the performance of analysts or groups of analysts as a whole based on these types of metrics. The problem with this approach is that there are fundamental psychological barriers working against the analyst in this model. In order to understand these barriers, we need to examine how the mind works and is influenced.
Limitations of the Mind
The investigation or analysis process is based around cognition, which is the term used to refer to the rate at which humans can bridge the gap between perception and reality. In this sense, perception is a situation as we individually interpret it, and reality is a situation as it actually exists. Cognition can be nearly instant, such as looking at a shirt and recognizing that it is blue. In other situations, like security analysis, cognition can take quite some time. Furthermore, even after a lengthy cognition process an analyst may never fully arrive at a complete version of reality.
The thing that makes everyone different in terms of cognitive psychology is the mindset. A mindset is, essentially, the lens we view the world through. A mindset is something we are born with, and something that is constantly evolving. Any time we perceive something new, it is assimilated into our mindset and affects the next thing we will perceive. While we do have control over certain aspects of our mindset, it is impossible to be aware or in control of every portion of it. This is especially true of the subconscious portions of that mindset that are formed early on in our development. Ultimately, the mindset is a combination of nature and nurture. It takes into account where we were born, where we grew up, the values of our parents, the influence of our friends, life experiences, work experiences, relationships, and even our health.
Figure 1: Our Mindset and Perception are Related
A mindset is a good thing because it is that mindset that allows us to all think differently and be creative in unique ways. In information security, this is what allows some analysts to excel in very unique areas of our craft. However, the limitation imposed on us because of our mindset results in a few scenarios that negatively affect our perception and ability for rapid cognition.
Humans Perceive What they Expect to Perceive
The expectancy theory states that an individual will decide to behave or act in a certain way because they are motivated to select a specific behavior over other behaviors. While we often think of motivation as something overt and identifiable, that isn’t the case in most situations. Instead, these expectations and patterns of expectations are a product of our mindset, both the conscious and subconscious part of it. As an example, read the text in Figure 2.
Figure 2: An Example of Expected Perception
Now, read the text in Figure 2 again. Did you notice that the article in each of the triangles was repeated? In this example, you probably didn’t because these phrases are common vernacular that you’ve come to expect to be presented a specific way. Beyond that, additional ambiguity was introduced by forming the words in a triangle such that they are interpreted in a manner that is not conducive to spotting the anomaly (but more on that later). The key takeaway here is that we are rarely in control of how we initially perceive something.
Mindsets are Quick to Form but Resistant to Change
If we are not in full control of our mindset, then it is fair to say that we cannot be in full control of our perception. This becomes a problem because aspects of our mindset are quick to form, but resistant to change. Cognitive psychology tells us that we don’t learn by creating a large number of independent concepts, but rather, we form a few base concepts or images, and assimilate new information to those things.
This is why we rarely notice gradual changes such as weight gain in friends and family that we see often. You are very unlikely to notice if a coworker you see every day gains twenty pounds over six months. However, if you see a friend you haven’t seen in six months, you are much more likely to notice the added weight. Even if it isn’t very obvious, you are likely to think “Something looks different about Frank.”
Highly Ambiguous Scenarios are Difficult to Overcome
A scenario that is highly ambiguous is one that is open to multiple interpretations. This is the product of a large number of potential outcomes, but a limited amount of data from which to form a hypothesis of which outcome is most likely. A common experiment that is referenced to prove this relationship is related to interference in visual recognition, or something occasionally referred to as the “blur test.” In this experiment, a subject is exposed to a blurry image that slowly comes into focus until the image becomes clear enough to be identified. The independent variable in this experiment was the initial amount of blur in the image, and the dependent variable was the amount of blur remaining in the image when the subject was able to determine what was being visually represented. The psychologists conducting the experiment presented a set of images to subjects with varying degrees of initial blur, and measured the amount of blur remaining when the subject was able to identify what the image was.
The results were really interesting, because they showed with statistical significance that when an image with a higher initial amount of blur was presented to a subject, the image had to get much clearer in order for them to identify what it actually was. Conversely, when an image was presented with a lower initial amount of blur, subjects could identify what the image represented much sooner and well before the image had come fully into focus.
The amount of initial blur in this experiment represents a simple example of varying the level of ambiguity in a scenario, which can lead us to infer that higher initial ambiguity can lengthen the amount of time required to bridge the gap between perception and reality.
Applications to Security Investigation
When we consider the nature of the investigative process, we know that it is based on being presented with a varying amount of initial data in a scenario where there are hundreds or thousands of possible outcomes. This yields a situation where a dozen analysts could come up with a dozen initial hypotheses regarding what is actually happening on a network given a single input. Each analyst forms their initial perception, and as they continue to collect more data they assimilate that data to the initial perception.
In a real world scenario where analysts often handle investigations “cradle to grave” or “alert to escalation”, this presents a scenario where the evidence that has been gathered over time is never viewed from a clear perspective that is free from initial perception (and bias). Given that network traffic, malicious binaries, and log data can be incredibly deceiving, this is a limiting factor in the success of an investigation as our initial perception in an investigation is increasingly likely to be wrong. This speaks to the limitation I previously discussed regarding how mindsets are quick to form but resistant to change, and how a large initial amount of ambiguity, which is very common in security investigations, can lead to flawed investigations.
Ultimately, the scenarios in which security investigations are conducted contain many of the characteristics in which the mind is limited in its ability to bridge the gap between perception and reality.
Identifying problems with the approaches we often take with security analysis is quite a bit easier than figuring out how to overcome those challenges. While I don’t think that there is a fix-all nor do I offer to present any panacea-like solutions, I think that we are entering an era where analysis is becoming an incredibly important part of the security landscape that justifies rethinking some of the ways we approach how we perform security investigations. Focusing on cognitive problems of analysis, I think there are three things that we, as an industry, can do to improve how we get from alert to resolution. While I don’t think these three things encompass a complete paradigm shift in how alerts are investigated, I do believe that they will be part of it.
Separation of Triage vs. Investigation
While multiple definitions may exist, triage in terms of event-driven security analysis typically refers to the process of initially reviewing an alert to determine if more investigation is required, and what priority that investigation should have relative to other investigations. In the past, the triage process has been treated as a part of the broader investigative process; however, they are fundamentally different activities requiring different skill sets. They are also subject to varying types and degrees of the biases and cognitive limitations I discussed earlier. Those biases and limitations are often a lesser concern during an initial triage, and of much more concern to investigations that require a larger amount of time to complete.
Faster and less ambiguous analysis scenarios are still subject to bias and other limitations of the human mind to some extent, but real world application tells us that there are many scenarios where a quick triage of an event to determine if more investigation is required can often be done on an individual basis. This holds as long as that individual has adequate experience and is using a structured and repeatable technique. That means that it is acceptable for a single human to handle the investigation of things like unstructured threats and relatively simple malware infections. After all, these things are often very clear-cut, and can usually be validated or invalidated by simply comparing network or host data to the signature which generated the alert.
On the other hand, investigations associated with structured threats, complex malware, or that are generally more initially ambiguous require a different and more lengthy approach, which is the scenario I will focus on exclusively in the next two items I will discuss. The key takeaway here is that we should treat triage and investigation as two separate but related processes.
Graduated Analysis
Although a single person can often perform triage-based analysis, this is not the case for more involved investigations. As evidence suggests, the analyst who performs the initial triage of an event is at a disadvantage when it comes to forming an accurate perception of what has really occurred once new data becomes available. Just as the subjects in the “blur test” were less successful in identifying an image when a larger amount of initial blur was present, analysts who are investigating a security event are less likely to identify the full chain of events if they start at a point where only minimal context details are available.
Because cognitive limitations prevent us from efficiently reforming our perceptions when new data becomes available, it makes a case to perform hand-offs to other analysts at specific points during the investigation. Thus, we are shifting the primary investigator of the investigation such that the investigation gradually receives more clarity and narrows the cognition gap that may be present. Determining when these hand-offs should occur is hard to predict since organizational structures can vary. However, at baseline it is reasonable to estimate that a handoff should occur at least after the initial triage. Beyond this, it may make sense for hand-offs to occur at points in time when there is a dramatic influx of new and relevant information, or when the scope of the investigation broadens widely.
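To make the hand-off points above a little more concrete, here is a minimal sketch of how they might be written down as a rule. Everything in it is an assumption made for illustration: the `InvestigationState` fields, the `should_hand_off` function, and the default thresholds are hypothetical, and a real SOC would need to tune or replace them to fit its own structure.

```python
from dataclasses import dataclass


@dataclass
class InvestigationState:
    """Hypothetical snapshot of an investigation at a point in time."""
    triage_complete: bool
    handed_off_since_triage: bool
    new_evidence_items: int       # relevant items gathered since the last hand-off
    hosts_in_scope: int           # hosts currently believed to be involved
    hosts_at_last_handoff: int    # hosts in scope when the case last changed hands


def should_hand_off(state, evidence_limit=20, scope_growth=2.0):
    """Return True when one of the three hand-off conditions is met.

    The thresholds are illustrative assumptions, not recommended values.
    """
    # 1. Baseline: always hand off once the initial triage is done.
    if state.triage_complete and not state.handed_off_since_triage:
        return True
    # 2. A dramatic influx of new and relevant information.
    if state.new_evidence_items >= evidence_limit:
        return True
    # 3. The scope of the investigation has broadened widely.
    if (state.hosts_at_last_handoff > 0
            and state.hosts_in_scope / state.hosts_at_last_handoff >= scope_growth):
        return True
    return False


fresh_case = InvestigationState(True, False, 3, 1, 1)
quiet_case = InvestigationState(True, True, 3, 1, 1)
growing_case = InvestigationState(True, True, 3, 6, 2)

print(should_hand_off(fresh_case), should_hand_off(quiet_case), should_hand_off(growing_case))
```

The point of writing the rule down is not the specific numbers, but that the decision to change investigators becomes explicit and repeatable rather than left to the analyst who already holds the initial perception.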
This approach creates an interesting byproduct. If all significant investigations are handed off after triage, this essentially creates an analyst who is exclusively focused on alert triage. Considering this is its own workflow requiring a unique set of skills, this can be looked on as a benefit to a graduated approach. While a graduated approach doesn’t necessarily require a graduated skill level in analysts (such as level 1, 2, and 3 analysts), logic would suggest that this might be beneficial from a resource staffing perspective. In this model, only more skilled analysts are examining “higher tier” investigations encompassing a great deal more data. On the other hand, some might suggest that the triage analyst should be one of the more skilled analysts, as they will be defining additional data points for collection over time that will shape the course of the investigation. There does not yet exist enough data to determine which approach yields the greatest benefit.
“Realistic Time Detection” in Favor of “Real Time Detection”
The nature of traditional analysis dictates that an analyst is presented with some input data and is asked to make a rapid decision whether or not a breach has occurred, and to what extent. I believe that the immense pressure to make a quick and final decision is not based on the needs of the situation at hand, but rather, the unrealistic expectation we have placed on the role of the analyst. It is logically unreasonable to expect to detect and ascertain all of the pertinent details of a potential compromise in any kind of manner that resembles real time or near real time. Even if you are able to determine that malware has been installed and C2 communication is present, you still don’t know how the attacker got in, what other machines they are interacting with, the nature of the attacker (structured or unstructured), or if an attack is ongoing.
Research has shown that the average attacker spends 244 days on a network. With that large time range working against us, it is not entirely reasonable to shoot for detecting and explaining the presence of an attacker in anything resembling real time. Most individuals who have researched structured attackers or have pretended to be them will tell you that these information campaigns are focused on objectives that require quite a bit of effort to find the data that is desired and also require a long-term persistent campaign in order to continually collect data and achieve the campaign goals. Thus, detecting, containing, and extricating an attacker from your network at day 15 isn’t horribly ineffective. It isn’t as ideal, but we are dealing with circumstances that sometimes call for less than ideal solutions. Ultimately, I would rather focus on strategic “realistic time” detection and catch an adversary on day +12 rather than focus on “real time” detection and miss an adversary on day 0 due to a flawed investigative approach, only to be notified by a third party that the attacker has been in my network for quite some time on day +200.
Focusing on a slower, more methodical approach to analysis isn’t easy, and to be honest, I don’t claim to know what that whole picture looks like. I can deduce that it does contain some of the following characteristics, in addition to the notions of segregated triage and graduated analysis mentioned above:
- Case Emphasis – The investigative process should be treated not unlike a medical case. First, symptoms are evaluated. Next, data is gathered and tests are performed. Finally, observations are made over time that are driven by desired data points. These things build until a conclusion can be made, and may take quite some time. A lack of data doesn’t warrant ignoring symptoms, but rather, a deeper focus on collecting new data.
- Analytic Technique – Analysts should be able to identify with multiple types of analytic techniques that are well suited to their strengths and weaknesses, and also to different scenarios. There has been little study into this area, but we have a lot to learn from other fields here with techniques like relation investigation and differential diagnosis.
- Analysis as a Study of Change – While traditional investigations focus almost exclusively on attempting to correlate multiple static data points, this needs to also include a focus on changes in anticipated behavior. This involves taking a baseline followed by multiple additional measurements at later points in time, and then comparing those measurements. This is a foundational approach that is practiced frequently in many types of scientific analysis. While some may confuse this with “anomaly-based detection”, this is a different concept more closely associated with “anomaly-based analysis”. Currently, the industry has a lack of technology that supports this and other aspects of friendly intelligence collection.
- Post-Exploitation Focus – The industry tends to focus dramatically on the initial exploitation and command and control function of an attack life cycle. We do this because it supports a real time detection model, but if we are truly to focus on realistic time detection and the study of change, we must focus on things that can more easily be measured when compared to normal behavior and communication sequences. This lends itself more towards focusing on post-exploitation activities more closely tied to attackers’ actions on objectives.
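The “study of change” idea in the list above, taking a baseline and then comparing later measurements against it, can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `change_from_baseline`, the sample traffic figures, and the choice of a simple standard-deviation threshold are all hypothetical, and they are not a description of any particular tool or of how such analysis must be done.

```python
from statistics import mean, stdev


def change_from_baseline(baseline, later, threshold=3.0):
    """Compare later measurements against a baseline.

    Returns a list of (index, value) pairs for measurements that sit at
    least `threshold` sample standard deviations away from the baseline
    mean. The threshold is an illustrative assumption.
    """
    mu = mean(baseline)
    sigma = stdev(baseline)  # needs at least two baseline points
    flagged = []
    for index, value in enumerate(later):
        if sigma == 0:
            # A perfectly flat baseline: any difference counts as a change.
            changed = value != mu
        else:
            changed = abs(value - mu) / sigma >= threshold
        if changed:
            flagged.append((index, value))
    return flagged


# Hypothetical daily outbound traffic (in MB) for a single host.
baseline_mb = [120, 115, 130, 125, 118, 122, 127]
later_mb = [124, 119, 410, 128]

print(change_from_baseline(baseline_mb, later_mb))
```

With these made-up numbers, only the third later measurement stands out from the baseline. A flagged change is not a verdict; it is a prompt to go collect more data, in keeping with the case emphasis described above.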
The thoughts presented here are hardly conclusive, but they are founded in scientific study that I think warrants some type of change. While I’ve suggested some major shifts that I think need to take place in order to shore up some of the deficiencies in cognition, these are merely some broad ideas that I’ll be the first to admit haven’t been fully and completely thought out or tested. My hope is that this article will serve to raise more questions, as these are concepts I’ll continue to be pursuing in my own research of investigative techniques and cognitive psychology.
- Study: Interference in Visual Recognition: http://cvcl.mit.edu/SUNSeminar/brunerpotter1964.pdf
- Study: M-Trends 2014: https://www.mandiant.com/blog/mtrends-2014-threat-report-revealed/
- Study: On the Perception of Incongruity: A Paradigm http://beyond-belief.org.uk/sites/beyond-belief.org.uk/files/On%20the%20Perception%20of%20Incongruity.pdf
- Article: Cognitive Factors in Deception and Counterdeception (Daniel and Herbig, 1982)
- Book: Psychology (David Myers, 2006)
- Book: Cognitive Psychology: Connecting Mind, Research and Everyday Experience (E. Bruce Goldstein, 2010)
- Book: Psychology of Intelligence Analysis (Richards J. Heuer, 1999)
- Book: Managing Strategic Surprise (Bracken, et al., 2008)
Excellent insight, Chris! Many years after your post, this still makes total sense. Dwell time has consistently dropped from 2014 to now, but we still have dozens of days to detect important things that we may miss by doing a flustered analysis based on unrealistic SLAs; the separation of triage and investigation is crucial for that. Eight years later, do you believe this still holds true based on your latest research?
Thanks! The thing we’re dealing with now that we weren’t dealing with so much when I wrote this article was ransomware. In many cases, we see where the initial compromise to data encryption occurs within a few days or even a few hours as some of these criminal groups are really efficient and don’t care to stick around longer than they have to. That makes time more of a concern than with other sorts of breaches that are more persistent and focused on data theft. However, when I talk about realistic time detection and presenting more useful data to analysts a bit later even if it delays it… those delays are still usually only matters of minutes. Basically, I’m saying that I would rather analysts get an alert that’s 10 minutes delayed because some automated processes went out and got other necessary and related data for them to give them a more complete picture of what’s happening. There’s a notion amongst some that analysts need alerts the SECOND they are detected, and I don’t think that’s the case. Analysts will perform better with a clearer initial picture of the situation, and we can still trade minutes for that (but probably not days).