One of the problems we face while trying to detect and respond to adversaries is in the sheer amount of data we have to collect and parse. Twenty years ago it wasn’t as difficult to place multiple sensors in a network, collect packet and log data, and store that data for quite some time. In modern networks, that is becoming less and less feasible. Many others have written about this at length, but I want to highlight two main points.
Attackers play the long game. The average time from breach to discovery is over two hundred days. Despite media jargon about “millions of attacks a day” or attacks happening “at the speed of light”, the true nature of breaches is that they are not speedy endeavors from the attackers side. Gaining a foothold in a network, moving laterally within that network, and strategically locating and retrieving target data can take weeks or months. Structured attackers don’t win when they gain access to a network. They win once they accomplish their objective, which typically comes much later.
Long term storage isn’t economical. While some organizations are able to store PCAP or verbose log data in terms of months, that is typically reserved for incredibly well funded organizations or the gov/mil, and is becoming less common. Even on smaller networks, most can only store this data in terms of hours, or at most a few days. I typically only see long term storage for aggregate data (like flow data) or statistical data. The amount of data we generate has dramatically outgrown our capability to store and parse through that data, and this issue it only going to worsen for security purposes.
Medicine and Prospective Collection
The problem of having far too much data to collect and analyze is not unique to our domain. As I often do, let’s look towards the medical field. While the mechanics are a lot different, medical practitioners rely on a lot of the same cognitive skills to investigate afflictions to the human condition that we do to investigate afflictions to our networks. These are things like fluid ability, working memory, and source monitoring accuracy all work in the same ways to help practitioners get from a disparate set of symptoms to an underlying diagnosis, and hopefully, remediation.
Consider a doctor treating a patient experiencing undesirable symptoms. Most of the time a doctor can’t look back at the evolution of a persons health over time. They can’t take a CAT scan on a brain as it was six months ago. They can’t do an ultrasound on a pancreas as it was two weeks ago. For the most part, they have to take what they have in front of them now or what tests can tell them from very recent history.
If what is available in the short term isn’t enough to make a diagnosis, the physician can determine criteria for what data they want to observe and collect next. They can’t perform constant CAT scans, ultrasounds, or blood tests that look for everything. So, they apply their skills and define the data points they need to make decisions regarding the symptoms and the underlying condition they believe they are dealing with. This might include something like a blood test every day looking at white blood cell counts, continual EKG readings looking for cardiac anomalies, or twice daily neurological response tests. Medical tests are expensive and the amount of data can easily be overwhelming for the diagnostic process. Thus, selectively collecting data needed to support a hypothesis is employed. Physicians call this a clinical test-based approach, but I like to conceptualize it as prospective data collection. While retrospective data looks at things that have previously been collected up until a point in time, prospective data collections rely on specific criteria for what data should be collected moving forward from a fixed point in time, for a set duration. Physicians use a clinical strategy with a predominate lean towards effective use of prospective data collection because they can’t feasibly collect enough retrospective data to meet their needs. Sound familiar?
Investigating Security Incidents Clinically
As security investigators, we typically use a model based solely on past observations and retrospective data analysis. The prospective collection model is rarely leveraged, which is surprising since our field shares many similarities with medicine. We all have the same data problems, and we can all use the same clinical approach.
The symptoms our patients report are alerts. We can’t go back and look at snapshots of a devices health over the retrospective long-term because we can’t feasibly store that data. We can look back in the near term and find certain data points based on those observations, but that is severely time limited. We can also generate a potential diagnosis and observe more symptoms to find and treat the underlying cause of what is happening on our networks.
Let’s look at a scenario using this approach.
An alert is generated for a host (System A). The symptom is that multiple failed login attempts where made on the devices administrator account from another internal system (System B).
The examining analyst performs an initial triage and comes up with a list of potential diagnoses. He attempts to validate or invalidate each diagnosis by examining the retrospective data that is on hand, but is unable to find any concrete evidence that a compromise has occurred. The analyst determines that System B was never able to successfully login to System A, and finds no other indication of malicious activity in the logs. More analysis is warranted, but no other data exists yet. In other scenarios, the investigation might stop here barring any other alerting.
The analyst adds his notes to the investigation and prunes his list of diagnoses to a few plausible candidates. Using these hypothesis diagnoses as a guide, the analyst generates a list of prospective collection criteria. These might include:
- System A: All successful logins, newly created user accounts, flow data to/from System B.
- System B: File downloads, attempted logins to other internal machines, websites visited, flow data to/from System A.
This is all immensely useful data in the context of the investigation, but it doesn’t break the bank in terms of storage or processing costs if the organization needs to store the data for a while in relation to this small scope. The analyst tasks these collections to the appropriate sensors or log collection devices.
The prospective collections record the identified data points and deliver them exclusively to the investigation container they are assigned to. The analyst collects these data points for several days, and perhaps refines them or adds new collections as data is analyzed.
The analyst revisits and reviews the details of the investigation and the returned data, and either defines additional or refined collections, or makes a decision regarding a final diagnosis. This could be one of the following:
- System B appears to be compromised and lateral movement to System A was being attempted.
- No other signs of malicious activity were detected, and it was likely an anomaly resulting from a user who lost their password.
In a purely retrospective model the later steps of this investigation might be skipped, and may lead the analyst to miss the ground truth of what is actually occurring. In this case, the analyst plays the long game and is rewarded for it.
Additional Benefits of Prospective Collection
In addition to the benefits of making better use of storage resources, a model that leverages prospective collection has a few other immediate benefits to the investigative process. These include:
Realistic-Time Detection. As I’ve written previously, when the average time from breach to detection is greater than two hundred days, attempting to discover attackers on your network the second they gain access is overly ambitious. For that matter, it doesn’t acknowledge the fact that attackers may already be inside your network. Detection can often its hardest at the time of initial compromise because attackers are typically more stealthy at this point, and because less data exists to indicate they are present on the network. This difficulty can decrease over time as attackers get sloppier and generate more data that can indicate their presence. Catching an attacker +10 days from initial compromise isn’t as sexy as “real time detection”, but it is a lot more realistic. The goal here is to stop them from completing their mission. Prospective collection supports the notion of realistic-time detection.
Cognitive Front-Loading. Research shows us that people are able to solve problems a lot more efficiently when they are aware of concepts surrounding metacognition (thinking about thinking) and are capable of applying that knowledge. This boils down to have an investigative philosophy and a strategy for generating hypotheses and having multiple approaches towards working towards a final conclusion. Using a prospective collection approach forces analysts to form hypotheses early on in the process, promoting the development of metacognition and investigation strategy.
Repeatability and Identified Assumptions. One of the biggest challenges we face is that investigative knowledge is often tacit and great investigators can’t tell others why they are so good at what they do. Defining prospective collection criteria provides insight towards what great investigators are thinking, and that can be codified and shared with less experienced analysts to increase their abilities. This also allows for more clear identification of assumptions so those can be challenged using structured analytic techniques common in both medicine and intelligence analysis. I wrote about this some here, and spoke about it last year here.
The purpose of this post isn’t to go out and tell everyone that they should stop storing data and refocus their entire SOC towards a model of prospective collection. Certainly, more research is needed there. As always, I believe there is value in examining the successes and failures of other fields that require the same level of critical thinking that security investigations also require. In this case, I think we have a lot to learn from how medical practitioners manage to get from symptoms to diagnosis while experiencing data collection problems similar to what we deal with. I’m looking forward to more research in this area.