
Investigations and Prospective Data Collection

One of the problems we face while trying to detect and respond to adversaries is the sheer amount of data we have to collect and parse. Twenty years ago it wasn’t as difficult to place multiple sensors in a network, collect packet and log data, and store that data for quite some time. In modern networks, that is becoming less and less feasible. Many others have written about this at length, but I want to highlight two main points.

Attackers play the long game. The average time from breach to discovery is over two hundred days. Despite media jargon about “millions of attacks a day” or attacks happening “at the speed of light”, the true nature of breaches is that they are not speedy endeavors from the attacker’s side. Gaining a foothold in a network, moving laterally within that network, and strategically locating and retrieving target data can take weeks or months. Structured attackers don’t win when they gain access to a network. They win once they accomplish their objective, which typically comes much later.

Long term storage isn’t economical. While some organizations are able to store PCAP or verbose log data for months, that is typically reserved for incredibly well-funded organizations or the gov/mil, and is becoming less common. Even on smaller networks, most can only store this data for hours, or at most a few days. I typically only see long term storage for aggregate data (like flow data) or statistical data. The amount of data we generate has dramatically outgrown our capability to store and parse through that data, and this issue is only going to worsen for security purposes.

Medicine and Prospective Collection

The problem of having far too much data to collect and analyze is not unique to our domain. As I often do, let’s look towards the medical field. While the mechanics are a lot different, medical practitioners rely on many of the same cognitive skills to investigate afflictions of the human body that we use to investigate afflictions of our networks. Skills like fluid ability, working memory, and source monitoring accuracy work in the same ways to help practitioners get from a disparate set of symptoms to an underlying diagnosis and, hopefully, remediation.

Consider a doctor treating a patient experiencing undesirable symptoms. Most of the time a doctor can’t look back at the evolution of a person’s health over time. They can’t take a CAT scan on a brain as it was six months ago. They can’t do an ultrasound on a pancreas as it was two weeks ago. For the most part, they have to take what they have in front of them now or what tests can tell them from very recent history.

If what is available in the short term isn’t enough to make a diagnosis, the physician can determine criteria for what data they want to observe and collect next. They can’t perform constant CAT scans, ultrasounds, or blood tests that look for everything. So, they apply their skills and define the data points they need to make decisions regarding the symptoms and the underlying condition they believe they are dealing with. This might include something like a blood test every day looking at white blood cell counts, continual EKG readings looking for cardiac anomalies, or twice-daily neurological response tests. Medical tests are expensive and the amount of data can easily be overwhelming for the diagnostic process. Thus, physicians selectively collect the data needed to support a hypothesis. Physicians call this a clinical test-based approach, but I like to conceptualize it as prospective data collection. While retrospective data collection looks at things that have previously been collected up until a point in time, prospective data collection relies on specific criteria for what data should be collected moving forward from a fixed point in time, for a set duration. Physicians use a clinical strategy with a predominant lean towards effective use of prospective data collection because they can’t feasibly collect enough retrospective data to meet their needs. Sound familiar?

Investigating Security Incidents Clinically

As security investigators, we typically use a model based solely on past observations and retrospective data analysis. The prospective collection model is rarely leveraged, which is surprising since our field shares many similarities with medicine. We all have the same data problems, and we can all use the same clinical approach.

The symptoms our patients report are alerts. We can’t go back and look at snapshots of a device’s health over the long term because we can’t feasibly store that data. We can look back in the near term and find certain data points based on those observations, but that window is severely time-limited. We can, however, generate a potential diagnosis and observe more symptoms to find and treat the underlying cause of what is happening on our networks.

Let’s look at a scenario using this approach.

Step 1

An alert is generated for a host (System A). The symptom is that multiple failed login attempts were made on the device’s administrator account from another internal system (System B).

Step 2

The examining analyst performs an initial triage and comes up with a list of potential diagnoses. He attempts to validate or invalidate each diagnosis by examining the retrospective data that is on hand, but is unable to find any concrete evidence that a compromise has occurred. The analyst determines that System B was never able to successfully login to System A, and finds no other indication of malicious activity in the logs. More analysis is warranted, but no other data exists yet. In other scenarios, the investigation might stop here barring any other alerting. 

Step 3

The analyst adds his notes to the investigation and prunes his list of diagnoses to a few plausible candidates. Using these hypothesized diagnoses as a guide, the analyst generates a list of prospective collection criteria. These might include:

  • System A: All successful logins, newly created user accounts, flow data to/from System B.
  • System B: File downloads, attempted logins to other internal machines, websites visited, flow data to/from System A.

This is all immensely useful data in the context of the investigation, and because its scope is so small, storing it for a while won’t break the bank in terms of storage or processing costs. The analyst tasks these collections to the appropriate sensors or log collection devices.
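To make Step 3 a little more concrete, here is a minimal Python sketch of how the collection criteria above might be expressed as filters with a fixed start time and a set duration. The `CollectionCriterion` class, the event fields (`time`, `host`, `event_type`), the event type labels, and the seven-day window are all hypothetical; they stand in for whatever your sensors and log collectors actually support.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class CollectionCriterion:
    """A hypothetical prospective collection criterion: what to keep, and for how long."""
    host: str
    event_types: frozenset
    start: datetime       # fixed point in time when collection begins
    duration: timedelta   # set duration of the collection

    def matches(self, event: dict) -> bool:
        # Only events inside [start, start + duration) are collected.
        in_window = self.start <= event["time"] < self.start + self.duration
        return (
            in_window
            and event["host"] == self.host
            and event["event_type"] in self.event_types
        )


start = datetime(2015, 6, 1)
window = timedelta(days=7)
criteria = [
    CollectionCriterion(
        "System A",
        frozenset({"successful_login", "account_created", "flow_with_system_b"}),
        start, window,
    ),
    CollectionCriterion(
        "System B",
        frozenset({"file_download", "internal_login_attempt", "web_visit", "flow_with_system_a"}),
        start, window,
    ),
]


def should_collect(event: dict) -> bool:
    return any(c.matches(event) for c in criteria)


print(should_collect({"time": datetime(2015, 6, 3, 14, 0), "host": "System A", "event_type": "account_created"}))  # True
print(should_collect({"time": datetime(2015, 5, 30), "host": "System A", "event_type": "account_created"}))      # False
```

Events outside the window, or for hosts and event types nobody asked about, simply aren’t kept, which is what keeps the storage cost small.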

Step 4

The prospective collections record the identified data points and deliver them exclusively to the investigation container they are assigned to. The analyst collects these data points for several days, and perhaps refines them or adds new collections as data is analyzed.
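Delivering collected data only to the investigation that requested it can be sketched as a simple routing step. In this hypothetical Python example, each investigation ID maps to a predicate describing the data it wants; the `INV-001` ID, the predicate, and the event fields are assumptions for illustration, not features of any specific product.

```python
from collections import defaultdict

# Hypothetical routing table: investigation ID -> predicate for the data it asked for.
ROUTES = {
    "INV-001": lambda e: (
        e["host"] in {"System A", "System B"}
        and e["event_type"] in {"successful_login", "account_created", "file_download"}
    ),
}


def route_events(events, routes):
    """Deliver each event only to the investigation container(s) whose criteria it matches.

    Events that match no investigation are not retained at all.
    """
    containers = defaultdict(list)
    for event in events:
        for inv_id, wants in routes.items():
            if wants(event):
                containers[inv_id].append(event)
    return dict(containers)


events = [
    {"host": "System A", "event_type": "successful_login"},
    {"host": "System C", "event_type": "successful_login"},  # host not in scope
    {"host": "System B", "event_type": "dns_query"},         # event type not requested
]
print(route_events(events, ROUTES))
# {'INV-001': [{'host': 'System A', 'event_type': 'successful_login'}]}
```

As the analyst refines or adds collections, the routing table changes, but each investigation container still only sees the data it asked for.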

Step 5

The analyst revisits and reviews the details of the investigation and the returned data, and either defines additional or refined collections, or makes a decision regarding a final diagnosis. This could be one of the following:

  • System B appears to be compromised and lateral movement to System A was being attempted.
  • No other signs of malicious activity were detected, and it was likely an anomaly resulting from a user who lost their password. 

In a purely retrospective model the later steps of this investigation might be skipped, and may lead the analyst to miss the ground truth of what is actually occurring. In this case, the analyst plays the long game and is rewarded for it.

Additional Benefits of Prospective Collection

In addition to the benefits of making better use of storage resources, a model that leverages prospective collection has a few other immediate benefits to the investigative process. These include:

Realistic-Time Detection. As I’ve written previously, when the average time from breach to detection is greater than two hundred days, attempting to discover attackers on your network the second they gain access is overly ambitious. For that matter, it doesn’t acknowledge the fact that attackers may already be inside your network. Detection is often hardest at the time of initial compromise because attackers are typically stealthier at this point, and because less data exists to indicate they are present on the network. This difficulty can decrease over time as attackers get sloppier and generate more data that can indicate their presence. Catching an attacker +10 days from initial compromise isn’t as sexy as “real time detection”, but it is a lot more realistic. The goal here is to stop them from completing their mission. Prospective collection supports the notion of realistic-time detection.

Cognitive Front-Loading. Research shows us that people are able to solve problems a lot more efficiently when they are aware of concepts surrounding metacognition (thinking about thinking) and are capable of applying that knowledge. This boils down to having an investigative philosophy, a strategy for generating hypotheses, and multiple approaches for working towards a final conclusion. Using a prospective collection approach forces analysts to form hypotheses early on in the process, promoting the development of metacognition and investigation strategy.

Repeatability and Identified Assumptions. One of the biggest challenges we face is that investigative knowledge is often tacit, and great investigators can’t tell others why they are so good at what they do. Defining prospective collection criteria provides insight into what great investigators are thinking, and that can be codified and shared with less experienced analysts to increase their abilities. This also allows for clearer identification of assumptions so those can be challenged using structured analytic techniques common in both medicine and intelligence analysis. I wrote about this some here, and spoke about it last year here.

Conclusion

The purpose of this post isn’t to go out and tell everyone that they should stop storing data and refocus their entire SOC towards a model of prospective collection. Certainly, more research is needed there. As always, I believe there is value in examining the successes and failures of other fields that require the same level of critical thinking that security investigations also require. In this case, I think we have a lot to learn from how medical practitioners manage to get from symptoms to diagnosis while experiencing data collection problems similar to what we deal with. I’m looking forward to more research in this area.

On the Importance of Questions in an Investigation

I spend a large part of my day studying cognition related to security investigations, which can ultimately be boiled down to thinking about how we learn and process information during and around our investigative processes. As part of my research, one of my professors recently pointed me towards a TEDx video by Dan Rothstein entitled “Did Socrates Get it Wrong?”. In this fourteen-minute talk Rothstein questions whether Socrates’ approach of expert-led questioning, commonly referred to as the Socratic method, was wrong. He brings up quite a few fascinating points, but ultimately concludes that Socrates was both right and wrong: strategic questioning is of the utmost importance, but it can also be an entirely student-led exercise. The key here is that asking the right question is critical for exploration and, of course, for getting to the right answer.

This has quite a few implications for security investigations. Strategic questioning as a means towards finding and eliminating bias is something that immediately comes to mind, but that is not what I want to talk about here.

At a more fundamental level is questioning as the essence of the investigation process. I tend to believe that an investigation itself is simply a question. Usually something like this:

  • What happened here?
  • Did we get compromised?
  • Did APT[x] access any of our information assets?

Going one step further, I would also hypothesize that every action we take during the course of an investigation can be distilled down into a question, like these:

  • Does the activity identified in this alert match what the signature was trying to detect?
  • Did internal Host A communicate with external Host B?
  • Did the device download and execute the stage two payload of this malware family?
  • Is there a log indicating that a specific file was accessed?
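Articulating a question explicitly also makes it easier to translate into a precise query. As a hypothetical sketch, the second question above could be written as a Python function over flow records, with the question kept right next to its answer. The record fields (`src`, `dst`, `time`) and the IP addresses (taken from private and documentation ranges) are assumptions for illustration.

```python
from datetime import datetime


def communicated(flows, host_a, host_b, start, end):
    """Answer: did host_a communicate with host_b (in either direction) between start and end?"""
    pair = {host_a, host_b}
    return any(
        {f["src"], f["dst"]} == pair and start <= f["time"] < end
        for f in flows
    )


# Hypothetical flow records.
flows = [
    {"src": "10.0.0.5", "dst": "203.0.113.7", "time": datetime(2015, 6, 2, 9, 30)},
    {"src": "10.0.0.9", "dst": "198.51.100.20", "time": datetime(2015, 6, 2, 11, 5)},
]

question = "Did internal Host A (10.0.0.5) communicate with external Host B (203.0.113.7) on June 2?"
answer = communicated(flows, "10.0.0.5", "203.0.113.7", datetime(2015, 6, 2), datetime(2015, 6, 3))
print(question, answer)
```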

Most of the time these questions don’t materialize in this form. Typically, they develop in our subconscious and analysts go forth looking for answers before they’ve articulated the question fully. I may not actually ask myself “Does the data in this PCAP match what the signature was looking for in the appropriate context?” before I go look at the signature to see what it was attempting to detect, but subconsciously that is exactly what I’m doing. Research suggests that a lot of this can be attributed to the formulation of habits or intuition (potentially in a brain structure known as the precuneus) that help us be more cognitively efficient. While this type of intuition can help us get things done faster, there is immense value in ripping these things from our subconscious into our conscious thought so that they can be articulated.

A couple of things come to mind immediately when assessing the value of articulating questions consciously. First, if all of an investigation can be based on questions, we must ensure we are asking the right questions. This requires us to be consciously aware of those questions before we seek to solve them. Second, if we hope to successfully train the next generation of analysts then we have to teach them to ask the right questions, again requiring us to be consciously aware of what they are.

If you are a security investigator or are responsible for training them, consider creating a culture of articulated questions in your SOC. Before acting, attempt to determine what question you are trying to answer and share that information with your peers. I would bet that you will find this type of strategic questioning will help you ask better questions and more effectively guide your investigation towards an appropriate goal.

References:

Dan Rothstein, “Did Socrates Get it Wrong?”, TEDx Somerville – https://www.youtube.com/watch?v=_JdczdsYBNA

Working Memory and the Visual Investigative Hypothesis

Late last year I wrote a blog post focused on what I have perceived as a coming evolution of focus for security investigations. This evolution will push us into an era where the human analyst takes center stage in a security investigation, and where tools and processes will shift to augment human cognitive ability. In this article, I want to expand on some of those thoughts and describe my research on how human analysts solve investigations. This is summarized as a concept I refer to as the visual investigative hypothesis.

I want to begin by revisiting the KSU ethnographic study I called out in a previous article. When several KSU sociologists spent time performing an ethnographic study of a security operations center they had some very interesting findings. Based on those findings, I drew the following conclusions:

Investigative process knowledge is tacit. While experienced analysts have the ability to quickly solve investigations, they almost never have the ability to accurately describe what makes them so effective.

Fundamental skills and domains aren’t well established. We have an inability to identify the fundamental cognitive (not platform or technology specific) skills that are required to successfully detect and respond to compromises. Further, we have not clearly identified subdomains of the broader security investigation domain, and differentiated the cognitive skills necessary to define and excel at each of them.

Knowledge transfer is limited. Without identified skills and domains, or adequate explicit process knowledge, our ability to train less experienced analysts is hampered. Most SOCs rely exclusively on “over the shoulder” training where less experienced investigators simply watch experienced investigators work. While this has its place, a training program founded exclusively in this type of instruction is fundamentally flawed and lacks proper fundamental building blocks.

Investigations rely on intuition. The aforementioned findings lead to the conclusion that the investigative process relies heavily on intuition. Beyond tool and technology specific processes, investigators rely almost exclusively on what they might refer to as “gut feeling” to determine which steps they should take to connect the dots and solve the investigation at hand.

Examining Intuition

Intuition typically refers to the ability to understand something immediately without the need for conscious reasoning. The concept of intuition isn’t new, but its acceptance in the world of psychological research is. Psychology itself is a fairly young field, having only existed since the late 1800s and becoming exponentially more popular around the mid 1900s. Most founding fathers of psychology dismissed intuition. Even Sigmund Freud was famous for saying that “it is an illusion to expect anything from intuition.” However, that has changed in recent years with the development of more sophisticated brain imaging techniques.

If you’ve ever had a head injury where you’ve scrambled your eggs a bit, then there is a chance that you’ve been the beneficiary of an MRI scan. A newer and more advanced form of this is something called an fMRI scan, which allows doctors and researchers to measure the response level from certain areas of the brain when specific stimuli are introduced.

A group of researchers recently wanted to better understand the science behind intuition. To do this, they utilized fMRI technology to measure the response of different areas of the brain while presenting chess players of varying degrees of expertise with match scenarios designed to draw upon their sense of intuition. While chess is very different from investigating security incidents, participants in each of these tasks claim to be successful thanks in part to unexplainable, tacit intuition.

In this scenario, researchers selected two groups of chess players. The first group consisted of journeyman chess players who were familiar with the game, but would not be considered professionals or experts. The second group consisted of professional chess players with high global rankings. Both groups were presented with an image of a chessboard showing a game in progress for a short period of time. They were then asked questions relating to what moves they thought would be best next, while their neural response was measured using fMRI technology.

The results of this experiment were exciting because they identified a specific area of the brain where the chess experts showed significantly more activity than the inexperienced players. This area, called the precuneus, showed 2.1x more activity in the chess experts. This indicates that there is a biological basis for the unconscious thought that we’ve previously only been able to refer to as intuition. Because of this, many psychologists have begun to shift their beliefs such that they recognize the existence of intuition.


Figure 1: The precuneus is related to what we think of as intuition

This gets really interesting when you consider that the precuneus is also known to be responsible for portions of our working memory, and our capacity to form and manipulate mental images. Before we dive into that, let’s have a quick primer on how human memory works.

Modeling Memory

There are multiple theories and models related to how memory is organized, but the most widely accepted model breaks it down into three distinct categories.

Sensory Information Store (SIS) is the most volatile form of memory, and is associated with the lingering sensations that follow a stimulus. For instance, if you are staring at an object and close your eyes, you may still “see” the object for a brief period as though it’s printed onto the inside of your eyelids. This is an example of SIS.

Short-term Memory (STM) is volatile memory that exists in conscious thought. When you are actively thinking about something, you are using STM to do so. This is why STM is often referred to as working memory (WM). Things that we perceive and only contemplate for a short period of time that aren’t worthy of storing permanently are processed by STM. In computing terms, STM is akin to RAM.

Long-term Memory (LTM) is our most resilient form of memory. Once something gets encoded into LTM it is stored for a very long time. For input into LTM, some theorize that we only encode certain things into LTM while others propose that we encode most everything. For output from LTM, some propose that we store everything but simply can’t recall it all, while others propose that some things that are encoded eventually decay out over time. In computer terms, LTM is similar to the concept of disk storage.

For the purposes of this article, we are most concerned with short term / working memory. As with memory in general, there are multiple models for how STM is organized, but one of the most widely accepted is Baddeley’s Model of Working Memory.


Figure 2: Baddeley’s model of working memory

In Baddeley’s model, there are three components of WM that are all controlled by a central executive.

The Phonological Loop stores audible information and prevents it from decaying by continuously repeating its contents. For example, it allows you to use working memory to remember a phone number by repeating it over and over again in your head.

The Episodic Buffer holds representations that integrate multiple types of information to form a single unified representation of memory. It was a later addition to the model.

The Visuospatial Sketchpad (VSSP) allows us to mentally picture and manipulate visual information about objects. For example, if you picture a multi-colored cube rotating so that different colors face you as time advances, you are using the VSSP. It is this portion of working memory we are most concerned about for the purpose of this discussion.

Visual Investigative Hypothesis

We can apply what we just learned about working memory to the earlier discussion about intuition. As we discovered, intuition is strongly related to the precuneus. Examination of other psychology and neurology research tells us that the precuneus is involved in several different things, including (surprise!) working memory and visuospatial processing. While not definitive, this does lead us to believe that the visuospatial sketchpad and the mental visualization and manipulation of objects may be related to intuition and how humans solve complex problems.

Of course, I’m not a neuroscientist and there is still quite a bit of ongoing research here. However, I think there are many cases when this theory makes sense. For example, prolific and prodigious musicians have been known to say that they can literally “see” the music as they are composing or playing it. Individuals who practice stock trading will also speak about how they can see trends forming before they actually happen, allowing them to execute smart orders and make a sizable profit. Even going back to our earlier discussion of chess, expert chess players will state that a reason they excel at competition is their ability to “see” the board and picture future situations better than their opponents.

It would truly appear that humans excel at processing information when it’s possible for them to visualize it, so why wouldn’t the same apply to security investigations? I’ve been an analyst myself for quite some time, and I’ve also had the pleasure of working with and speaking to a lot of other analysts, and I think this does apply. It’s important to realize that in a lot of cases, people may visualize things like this subconsciously without actually realizing that they are solving problems visually. I believe that individuals who excel at solving information security investigations also solve problems visually. In fact, I think that many subconsciously see data or attackers moving through a network as they assimilate various data points from system logs, packet captures, and IDS alerts. I’ve summed this theory up into something I call the visual investigative hypothesis.

In short, the visual investigative hypothesis states that security analysts are more efficient, and more likely to arrive at a conclusion based on an accurate representation of events that occurred when they are able to visualize the relationships that represent a network compromise and build a mental picture of an attacker moving through a network.

In psychology, most principles exist as either hypotheses or theories because our understanding of the brain, while advancing, is still very limited. Many highly probable concepts and others considerably less probable will likely never advance to being considered confirmed truths, so while I do expect to mold my research into a more sound theory, I certainly don’t expect to ever definitively and quantifiably prove it as a ground truth. A great deal of my doctoral coursework will be geared towards developing the visual investigative hypothesis into a more formal theory, which will involve continued efforts interviewing security analysts and conducting case studies regarding their investigative habits, failures, and successes.

Maximizing Working Memory Effectiveness

While there is still much work to do, if you subscribe to the visual investigative hypothesis there are a few ways you can begin shifting your investigative technique towards something that is much more visual. When considering working memory, it’s important to understand that it is a finite resource. Humans only have so much capacity in working memory, just like computers have only so much RAM. Some people have a larger WM capacity while some have less. In addition, external factors like tiredness and stress can negatively affect the situational capacity of WM. Knowing WM is a finite resource can guide us towards ideas for optimizing our investigative habits and the tools we use to perform our work.

As an example, consider the magic number seven, a theory developed by Princeton psychologist George Miller. This theory states that an average person can hold seven objects in working memory, plus or minus two. This means that if I were to list twenty random objects, you are likely to only remember five to nine of them. This is the result of biology, and most likely something that can’t really be changed person to person.

This applies to the investigative process when you think about all of the various pieces of information that an analyst has to store in WM when attempting to describe an anomalous event or breach. At any given point an analyst might need to consider a pair of IP addresses, a port number, protocol, two system roles, a detection signature, a file name, a portion of a file hash, a system name, a start time, and an end time. No wonder investigations push the limits of WM capacity.
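A quick tally shows why. Treating each item in the example above as a single chunk (a simplifying assumption, since people can sometimes group related items together) gives twelve items, which is more than even the upper end of seven plus or minus two:

```python
# Items from the example above, counting each as one chunk in working memory.
items = [
    "source IP", "destination IP", "port number", "protocol",
    "system role 1", "system role 2", "detection signature", "file name",
    "partial file hash", "system name", "start time", "end time",
]

# Miller: seven, plus or minus two.
low, high = 7 - 2, 7 + 2

print(len(items), (low, high))   # 12 (5, 9)
print(len(items) > high)         # True
```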

Overcoming magic number seven and limitations of working memory is all about making the right information available at the right time, and in the right way. Some ways that we can do this during an investigation include:

Data Scoping: Analysts should only retrieve the information they need for the time duration required. Having too little data is a bad thing, but having too much data can be just as bad. This can be achieved by formulating concise questions before seeking data, and making sure your data sources can be queried flexibly.
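As a minimal sketch of data scoping, assuming events are Python dictionaries with a `time` field (the field names and sample values here are hypothetical), a query can be limited both to a time window and to only the fields needed to answer the question at hand:

```python
from datetime import datetime


def scope(events, start, end, fields, where=lambda e: True):
    """Return only events in [start, end) that satisfy `where`, trimmed to the requested fields."""
    return [
        {k: e[k] for k in fields if k in e}
        for e in events
        if start <= e["time"] < end and where(e)
    ]


# Hypothetical flow-like events.
events = [
    {"time": datetime(2015, 6, 2, 9, 0), "src": "10.0.0.5", "dst": "203.0.113.7", "bytes": 5120},
    {"time": datetime(2015, 6, 2, 13, 0), "src": "10.0.0.5", "dst": "198.51.100.1", "bytes": 300},
    {"time": datetime(2015, 6, 2, 10, 0), "src": "10.0.0.9", "dst": "203.0.113.7", "bytes": 42},
]

# Question: "Where did 10.0.0.5 connect to between 08:00 and 12:00 on June 2?"
morning = scope(
    events,
    datetime(2015, 6, 2, 8), datetime(2015, 6, 2, 12),
    fields=["time", "dst"],
    where=lambda e: e["src"] == "10.0.0.5",
)
print(morning)
```

Writing the question first, as in the comment, is what determines the window, the filter, and the two fields worth keeping.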

Focusing on Relationships: Humans remember things better if they can associate them with existing schemas in long-term memory. If I were to tell you ten random objects and ask you to recall them an hour later, you would have trouble doing so. If I repeated the same experiment with related items like breakfast foods, your recall would be much better. We can fit the objects in an investigation into similar schemas by describing entities as nouns and their interactions as verbs, and by building graph/link representations that help us conceptualize a potential attacker’s movement through a network. One of the bigger gaps between network attackers and defenders is that attackers often think in this type of relationship-centric manner, and defenders don’t.
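One way to sketch this noun/verb approach, assuming observations have already been reduced to (subject, verb, object) triples, is a small adjacency list in Python. The entities and verbs below are hypothetical and loosely follow the System A / System B scenario from the prospective collection post:

```python
from collections import defaultdict

# Hypothetical observations, each reduced to (noun, verb, noun).
observations = [
    ("System B", "attempted login to", "System A"),
    ("System B", "downloaded", "tool.exe"),
    ("user jsmith", "logged in to", "System B"),
]


def build_graph(triples):
    """Adjacency list mapping each entity to its outgoing (verb, entity) relationships."""
    graph = defaultdict(list)
    for subject, verb, obj in triples:
        graph[subject].append((verb, obj))
    return graph


graph = build_graph(observations)
for verb, obj in graph["System B"]:
    print(f"System B --{verb}--> {obj}")
# System B --attempted login to--> System A
# System B --downloaded--> tool.exe
```

Reading the output back almost produces sentences, which is the point: the relationships carry the story rather than a pile of disconnected log lines.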

Rethinking Search: Searching through data itself should be less of an iterative process of querying a data source, viewing a response, and repeating. It should be more of an exploration where the analyst anchors themselves to a point in the data and they explore outward from there. This supports a relationship-centric view of security.
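Anchoring to a point and exploring outward maps naturally onto a breadth-first traversal of the relationships in the data. The Python sketch below is one hypothetical way to do it, with a hop limit to keep the result small enough to reason about; the link list and entity names are invented for illustration.

```python
from collections import deque

# Hypothetical relationships observed in the data, treated as undirected links.
links = [
    ("10.0.0.5", "203.0.113.7"),
    ("10.0.0.5", "10.0.0.9"),
    ("10.0.0.9", "fileserver01"),
    ("fileserver01", "backup01"),
]


def explore(anchor, links, max_hops):
    """Breadth-first exploration from an anchor; returns {entity: hops from anchor}."""
    neighbors = {}
    for a, b in links:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    hops = {anchor: 0}
    queue = deque([anchor])
    while queue:
        node = queue.popleft()
        if hops[node] == max_hops:
            continue  # reached the hop limit; don't expand further
        for nxt in neighbors.get(node, ()):
            if nxt not in hops:
                hops[nxt] = hops[node] + 1
                queue.append(nxt)
    return hops


print(explore("10.0.0.5", links, max_hops=2))
```

With a two-hop limit, backup01 (three hops away) stays out of view until the analyst decides to widen the search.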

Visualizing Events over Time: The activities of a suspected adversary typically lend themselves well to groupings of major and minor events occurring at specific times. Using timelines to represent these groupings of events with pointers back to the source data can provide a visual construct that is useful for easing pressure on WM.
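A timeline with pointers back to the source data can be as simple as sorting events and grouping them. This Python sketch groups hypothetical events by day; the grouping interval, the event fields, and the `source` references (file name plus line number) are assumptions, and real tooling would more likely link to records in a log store.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical events; "source" points back to where the raw data lives.
events = [
    {"time": datetime(2015, 6, 2, 9, 14), "what": "failed admin logins on System A", "source": "auth.log:1182"},
    {"time": datetime(2015, 6, 2, 9, 2), "what": "System B downloads tool.exe", "source": "proxy.log:5531"},
    {"time": datetime(2015, 6, 3, 22, 40), "what": "new account created on System A", "source": "winsec.log:207"},
]


def timeline(events):
    """Group events by day, sorted by time, keeping each event's pointer to its source data."""
    days = defaultdict(list)
    for e in sorted(events, key=lambda e: e["time"]):
        days[e["time"].date()].append(e)
    return dict(days)


for day, day_events in timeline(events).items():
    print(day)
    for e in day_events:
        print(f"  {e['time']:%H:%M}  {e['what']}  [{e['source']}]")
```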

Easy to Remember Names: Long strings of characters like MD5/SHA1 hashes or even IP addresses take up valuable space in WM. Often, analysts will try to remember sections of these objects, such as the last octet of an IP address or the last few characters of a file hash. Another strategy here is to assign common names to various unique hosts and files for quick reference during the investigation. I’ve done this with animals or food in the past. Thus, f527fe6879ae8bf31cbb1e5c32d0fc33 becomes Fennel, and 123.1.2.3 becomes Puma. This is made easier when the tools used facilitate it. Of course, protocols like DNS can make this easier too, but it’s important to remember that a DNS name simply represents a pointer to a host, and not a host in itself.
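Tooling support for this can be modest. The following Python sketch assigns a name from a word list to each unique indicator the first time it is seen and keeps a reverse lookup, so an alias always leads back to the real value. The `AliasBook` class and the word list are hypothetical; the list is ordered so the example reproduces Fennel and Puma from the text.

```python
import itertools


class AliasBook:
    """Assign a memorable alias to each unique indicator (hash, IP, hostname) on first sight."""

    def __init__(self, names):
        # Use each name once, then cycle again with numeric suffixes (Fennel-2, ...).
        # Assumes `names` is non-empty.
        self._pool = (
            name if round_ == 0 else f"{name}-{round_ + 1}"
            for round_ in itertools.count()
            for name in names
        )
        self._by_indicator = {}
        self._by_alias = {}

    def alias(self, indicator):
        if indicator not in self._by_indicator:
            name = next(self._pool)
            self._by_indicator[indicator] = name
            self._by_alias[name] = indicator
        return self._by_indicator[indicator]

    def lookup(self, name):
        """Reverse lookup, so an alias always leads back to the real value."""
        return self._by_alias[name]


book = AliasBook(["Fennel", "Puma", "Otter", "Basil"])
print(book.alias("f527fe6879ae8bf31cbb1e5c32d0fc33"))  # Fennel
print(book.alias("123.1.2.3"))                         # Puma
print(book.lookup("Puma"))                             # 123.1.2.3
```

Assigning names in first-seen order keeps them stable for the life of the investigation, which matters more than which particular animal or food a given hash ends up with.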

Conclusion

The concepts surrounding the visual investigative hypothesis aren’t new. Most of us know that the right visualizations can help us find evil better, but beyond that we don’t collectively have a lot of solid science that we can use to apply it to security investigations or how we train analysts. While I think there are some practical takeaways we can draw from this immediately, there is still much work to be done. I’m looking forward to continuing my research here and applying cognitive psychology concepts to the security investigation process.

Investigating Like a Chef

Whenever I get the chance I like to try and extract lessons from practitioners in other fields. This is important because the discipline of information security is so new, while more established professions have been around, in some cases, for hundreds of years. I’ve always had a keen interest in culinary studies, mostly because I come from an area of the country where people show that they love each other by preparing meals. I’m also a bit of a BBQ connoisseur myself, as those of you who know me can solemnly attest to. While trying to enhance my BBQ craft I’ve had the opportunity to speak with and read about a few professional chefs and study how they operate. In this post I want to talk a little bit about some key lessons I took away from my observations.

If you have ever worked in food service, or have even prepared a meal for a large number of people, you know that repetition is often the name of the game. It’s not trimming one rack of ribs, it’s trimming a dozen of them. It’s not cutting one sweet potato, it’s cutting a sack of them. Good chefs strive to do these things in large quantities while still maintaining enough attention to detail so that the finished product comes out pristine. There are a lot of things that go into making this happen, but none more important than a chef mastering their environment. This isn’t too different from a security analyst who investigates hundreds of alerts per day while striving to pay an appropriate amount of attention to each individual investigation. Let’s talk about how chefs master their environment and how these concepts can be applied to information security.

Chefs minimize their body movement. If you are going to be standing in a kitchen all day performing a bunch of repetitive and time-sensitive tasks, then you want to make sure no step or movement is wasted. This prevents fatigue and increases efficiency.

As an example, take a look at Figure 1. In this image, you will see that everything the chef needs to prepare their dish is readily available without the chef having to take extra steps or turn around too often. Raw products can be moved from the grocery area, rinsed in the sink, sliced or cut on the cutting board, cooked on the stove, and plated without having to turn more than a few times or move more than a couple of feet.

Figure 1: A Chef’s Workspace is Optimized for Minimal Movement

Chefs learn the French phrase “mise en place” early on in their careers. This phrase literally means “put in place”, but it specifically refers to organizing and arranging all the ingredients and tools required to prepare menu items during food service. Many culinary instructors will state that proper mise en place, or simply “mise” in shorthand, is the most important characteristic that separates a professional chef from a home cook.

There is a lot of room for mise in security investigations as well. Most analysts already practice this to some degree by making sure that their operating system is configured to their liking. They have their terminal windows configured with a font and colors that make it easy to read, they have common OSINT research sites readily accessible as browser favorites, and they have shortcut icons to all of their commonly used tools. At a higher level, some analysts even have custom scripts and tools they’ve written to minimize repetitive tasks. These things are highly encouraged.

While analysts don’t have to worry about physical movement as much, they do have to worry about mental movement. In an ideal situation an analyst can get to the end of an investigation in as few steps as possible, and a strategic organization of their digital workspace can help facilitate that. I’ve seen some organizations that seek to limit the flexibility analysts have in their workspace by enforcing consistent desktop environments or limiting access to additional tools. While policies that enforce good security and analysis practices are great, every analyst learns and processes information in a different way. Giving analysts the flexibility to configure their own operating environments isn’t just encouraged; it’s critical to helping them succeed.

Beyond the individual analyst’s workstation, the organization can also help by providing easy access to tools and data, along with processes that support that access. If an analyst has to connect to five systems to retrieve the same data, that is too much mental movement that could be better spent formulating and answering questions about the investigation. Furthermore, if organizations limit access to raw data, they may force the analyst to make additional mental moves that slow down their progress.

Chefs make minimal trips to the fridge/pantry. When you are cooking dinner at home you likely make multiple trips to the fridge to get ingredients or to the pantry to retrieve spices during the course of your meal. That might look something like this:

“I think this soup needs a bit more tarragon; let me go get it.”

or…

“I forgot I need to add an egg to the carbonara at the end, I’ll go get it from the fridge.”

Building on the concept of mise en place, professional chefs gather what they need up front so that they make as few trips to the fridge and pantry as possible. This keeps them focused on their task and also minimizes prep and cleanup time. They also take an appropriate amount of each ingredient to save space and minimize cleanup and waste.

Figure 2: Chefs Gather and Lay Out Ingredients for Multiple Dishes – Mise en Place

One of the most common tasks an analyst will perform during an investigation is retrieval of data in an attempt to answer questions. This might include querying a NetFlow database, pulling full packet capture data from a sensor, or querying log data in a SIEM.

Inexperienced analysts often make one of two mistakes. The first is not retrieving enough data to answer their questions. This means that the analyst must continue to query the data source and retrieve more data until they get the answer they are looking for. This is equivalent to a chef not getting enough flour from the pantry when trying to make bread. The second, and an even bigger problem, is retrieving too much data. In these situations an analyst may not limit the time range of their query appropriately, or simply may not use enough filtering. The result is a mountain of data that takes a significant amount of time to wade through. This is equivalent to a chef walking back from the fridge with 100 eggs when they only intend to make a 3-egg omelet.

Learning how to efficiently query data sources during an investigation is a product of asking the right questions, understanding the data you have available, and having that data in a place that is easily accessible and reasonably consolidated. If you can do these things, you should be able to make fewer trips back to the pantry.
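To make the pantry analogy concrete, here is a minimal Python sketch of a scoped query. It assumes flow records have already been parsed into Python objects; the `FlowRecord` type, its field names, and the `scoped_query` helper are hypothetical and don’t reflect the schema of SiLK or any particular SIEM. The point is simply that the time window and the host filter are applied before results come back to the analyst, not whittled away afterward.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class FlowRecord:
    # Hypothetical flow record fields; real NetFlow/SiLK schemas differ.
    start: datetime
    src_ip: str
    dst_ip: str
    dst_port: int
    byte_count: int


def scoped_query(
    records: Iterable[FlowRecord],
    host: str,
    around: datetime,
    window: timedelta,
    dst_port: Optional[int] = None,
) -> List[FlowRecord]:
    """Return only flows that involve `host` within +/- `window` of `around`,
    optionally restricted to a single destination port."""
    lo, hi = around - window, around + window
    return [
        r
        for r in records
        if lo <= r.start <= hi
        and host in (r.src_ip, r.dst_ip)
        and (dst_port is None or r.dst_port == dst_port)
    ]
```

For an alert on a host at a given time, a narrow window and a port filter answer the first question directly. If the answer isn’t there, widening `window` is one deliberate extra trip rather than a walk back from the fridge with 100 eggs.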

Chefs carefully select, maintain, and master their tools. Most chefs spend a great deal of time and money purchasing and maintaining their knives. They sharpen their knives before every use, and have them professionally refinished frequently. They also spend a great deal of time practicing different types of cuts. A dull or improperly used knife can result in inconsistently cut food, which can lead to poor presentation and even cause under- or overcooked food if pieces of different sizes are cooked together. Of course, it could also lead to you accidentally cutting yourself. These concepts go well beyond knives; a bent whisk can result in clumped batter, and an unreliable broiler can burn food. Chefs have to select, maintain, and master a variety of tools to perform their job.

Figure 3: A Chef’s Travel Kit Provides Well-Cared For Essential Tools

In a security investigation, tools certainly aren’t everything, but they are critically important. In order to analyze network communication you have to understand the protocols involved at a fundamental level, but you also need tools to sort through them, generate statistics, and work toward decision points. Whether it is a packet analysis tool like Wireshark, a flow data analysis tool like SiLK, or an IDS like Snort, you have to understand how those tools work with your data. The more ambiguity placed between you and the raw data, the greater the chance for assumptions that could lead to poor decisions. This is why it is critical to understand how to use tools, and how they work.

Caring for tools goes well beyond purchasing hardware and ensuring you have enough servers to crunch data. At an organizational level, it requires hiring the right number of people in your SOC to help manage the infrastructure. Some organizations attempt to put that burden on the analysts, but this isn’t always scalable and often takes analysts away from their primary duties. This kind of “piling on” of responsibilities is also what leads analysts to get frustrated and leave a job.

Beyond this, proper tool selection is important as well. I won’t delve into this too much here, but careful consideration should be given to free and open source tools, as well as the potential for developing in-house tools. Enterprise solutions have their place, but they shouldn’t be the default go-to. The best work in information security is, in most cases, still done at the free and open source level. You should look for tools that support existing processes, and never let a tool alone dictate how you conduct an investigation.

Chefs can cook in any kitchen. When chefs master all of the previously mentioned concepts, it allows them to apply those concepts in any location. If you watch professional cooking competitions, you will see that most chefs come with only their knife kit and are able to master the environment of the kitchen they are cooking in. For example, try watching “Chopped” on Food Network sometime. These chefs are given tight time constraints and challenging, random ingredients. They organize their workspace, assess their tools, make very few trips to get ingredients, and are able to produce five-star quality meals.

Figure 4: Professional Chefs Competing in an Unfamiliar Kitchen on Food Network’s Chopped

In security investigations, this is all about understanding the fundamentals. Yes, tools are important, as I mentioned earlier, but you won’t always work in an environment that provides the same tools. If you only learn how to use ArcSight, then you will only ever be successful in environments that use ArcSight. This is why understanding higher-level investigative processes that are SIEM-independent is necessary. Even at a lower level, understanding a tool like Wireshark is great, but you also need to understand how to work with packets using more fundamental and universal tools like tcpdump, as you may not always have access to a graphical desktop. Taking that a step further, you should also understand TCP/IP and network protocols so that you can make sense of the network data you are analyzing without relying on protocol dissectors. A chef’s fundamental understanding of food and cooking methods allows them to cook successfully in any kitchen. An analyst’s fundamental understanding of systems and networking allows them to investigate in any SOC.
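As an illustration of working below the dissector, the Python sketch below decodes a handful of IPv4 and TCP header fields directly from raw bytes using the standard `struct` and `socket` modules, following the fixed header layouts defined in RFC 791 and RFC 793. It is a sketch under simplifying assumptions: the bytes are assumed to start at the IPv4 header (no Ethernet framing), checksums are not verified, and only the six classic TCP flags are decoded. The function name `parse_ipv4_tcp` is hypothetical.

```python
import socket
import struct
from typing import Dict


def parse_ipv4_tcp(packet: bytes) -> Dict[str, object]:
    """Decode key IPv4 and TCP header fields from raw bytes.

    Assumes `packet` begins at the IPv4 header (link-layer header already
    stripped). Checksums are not verified; ECE/CWR flags are ignored.
    """
    version = packet[0] >> 4
    ihl = (packet[0] & 0x0F) * 4  # IPv4 header length in bytes
    if version != 4:
        raise ValueError("not an IPv4 packet")
    if packet[9] != 6:  # IP protocol number 6 is TCP
        raise ValueError("not a TCP segment")
    src_ip = socket.inet_ntoa(packet[12:16])
    dst_ip = socket.inet_ntoa(packet[16:20])

    tcp = packet[ihl:]
    # Network byte order: src port, dst port, seq, ack, data-offset byte, flags byte
    src_port, dst_port, seq, ack, offset_byte, flags = struct.unpack("!HHIIBB", tcp[:14])
    data_offset = (offset_byte >> 4) * 4  # TCP header length in bytes

    # Low six bits of the flags byte, least significant first.
    flag_names = ["FIN", "SYN", "RST", "PSH", "ACK", "URG"]
    set_flags = [name for bit, name in enumerate(flag_names) if flags & (1 << bit)]

    return {
        "src": f"{src_ip}:{src_port}",
        "dst": f"{dst_ip}:{dst_port}",
        "seq": seq,
        "ack": ack,
        "tcp_header_len": data_offset,
        "flags": set_flags,
        "payload": tcp[data_offset:],
    }
```

A dissector does the same work with far more care and coverage, but being able to recognize that a flags byte of `0x12` means SYN and ACK are set is the kind of fundamental knowledge that travels with you to any SOC.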

Conclusion

Humans have been cooking food for thousands of years, and have been doing so professionally for much longer than computers have even existed. While the skills needed to be a chef are dramatically different from those needed to investigate network breaches, there are certainly lessons to be learned here. Now, if you’ll excuse me, writing this has made me hungry.

* Figures 1-3 are from “The Four-Hour Chef” by Tim Ferriss, one of my favorite books.

Teaching Good Investigation Habits Through Reinforcement

The biggest responsibility that leaders and senior analysts in a SOC have is to ensure that they are providing an appropriate level of training and mentoring to younger and inexperienced analysts. This is how we better our SOCs, our profession, and ourselves. One problem that I’ve written about previously relates to the prevalence of tacit knowledge in our industry. The analysts who are really good at performing investigations often can’t describe what makes them so good at it, or what processes they use to achieve their goals. This lack of clarity and repeatability makes it exceedingly difficult to use any teaching method other than having inexperienced analysts learn through direct observation of those who are more experienced. While observation is useful, a training program that relies on it too much is flawed.

In this blog post I want to share some thoughts related to recent research I’ve done on learning methods as part of my study of cognitive psychology. More specifically, I want to talk a bit about one way that humans learn and how we might frame our investigative processes to improve the investigation skills of our fellow analysts and ourselves.

Operant Conditioning

When most people think of conditioning, they think of Pavlov and how he trained his dogs to salivate at the sound of a tone. That is referred to as learning by classical conditioning, but that isn’t what I want to talk about here. In this post, I want to instead focus on a different form of learning called operant conditioning. While classical conditioning is learning focused on a stimulus that occurs prior to a response and is associated with involuntary responses, operant conditioning is learning that is related to voluntary responses and is achieved through reinforcement or punishment.

An easy example of operant conditioning would be to picture a rat in a box. This box contains a button the rat can push with its body weight, and doing so releases a treat. This is an example of positive reinforcement that allows the rat to learn the association that pressing the button results in a treat. The reinforcement is positive because a stimulus the rat wants (the treat) is added after the behavior.

Another type of operant conditioning reinforcement is negative reinforcement. Consider the same rat in a different box with a button. In this box, a mild electrical charge is passed to the rat through the floor of the box. When the rat presses the button, the electrical charge stops for several minutes. In this case, negative reinforcement is being used because it teaches the rat a behavior by removing a negative stimulus. The key takeaway here is that negative reinforcement is still reinforcing a behavior, but in a different way. Some people confuse negative reinforcement with punishment.

Punishment is the opposite of reinforcement because it reduces the probability of a behavior being expressed. Consider the previous scenario with the rat in the electrified box, but instead, the box is only electrified when the rat presses the button. This is an example of punishment, which decreases the likelihood of the rat pressing the button.
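Because negative reinforcement and punishment are so often confused, it can help to see the distinction spelled out. In the standard operant conditioning model, “positive” and “negative” describe whether a stimulus is added or removed, while “reinforcement” and “punishment” describe whether the behavior becomes more or less likely. The Python sketch below encodes that grid; the type and function names are my own, and the split of punishment into positive and negative forms comes from the textbook model rather than the rat examples above.

```python
from enum import Enum


class Change(Enum):
    ADDED = "added"      # a stimulus appears after the behavior
    REMOVED = "removed"  # a stimulus goes away after the behavior


class Valence(Enum):
    APPETITIVE = "appetitive"  # something the subject wants (a treat)
    AVERSIVE = "aversive"      # something the subject avoids (a shock)


def classify(change: Change, valence: Valence) -> str:
    """Name the operant consequence for a stimulus change following a behavior.

    Adding something wanted, or removing something aversive, makes the
    behavior more likely (reinforcement); the other two combinations make
    it less likely (punishment).
    """
    increases_behavior = (change is Change.ADDED) == (valence is Valence.APPETITIVE)
    kind = "reinforcement" if increases_behavior else "punishment"
    sign = "positive" if change is Change.ADDED else "negative"
    return f"{sign} {kind}"
```

Mapped onto the three boxes: the treat button is positive reinforcement, the button that stops the charge is negative reinforcement, and the button that triggers the charge is positive punishment.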

Application to Security Investigation

I promise that all of this talk about electrifying rats is going somewhere other than the BBQ pit (I live in the deep south, what did you expect?). Earlier I spoke about the challenge we have because of tacit knowledge. This is made worse in many environments where you have access to a mountain of data but an ambiguous workflow that can allow an input (an alert) to be taken down hundreds of potential paths. I believe that you can take advantage of a fundamental construct like operant conditioning to help your analysts improve. In order to make this happen, I believe there are three key tasks that must occur.

Identify Unique Investigative Domains

First, you must designate domains that lend themselves to specific cognitive functions and specializations. For instance, triage requires different skill sets and cognitive processes than hunting. Thus, those are two separate domains with different workflows. Furthermore, incident response requires yet another set of skills and cognitive processes, making it a third domain of investigation. Some organizations don’t really distinguish between these domains, but they certainly should. I think there is work to be done to fully establish investigative domains (I expect lots of continued research here on my part), and more importantly, criteria for defining these domains. But at a minimum you can easily pick out a few domains relevant to your SOC, like the ones I’ve mentioned above.

Define Key Workflow Characteristics and Approaches

Once you’ve established domains you can attempt to define their characteristics. This isn’t something you do in an afternoon, but there are a few clear wins. For instance, triage is heavily suited to divergent thinking and differential diagnosis techniques. On the other hand, hunting is equally reliant on convergent and divergent thinking and is well suited to relational (link) analysis. These are characteristics you can key on when designing your workflows in the next step.
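Relational (link) analysis, named above as a good fit for hunting, boils down to treating hosts as nodes and observed communication as edges, then asking what sits near a suspicious node. Here is a minimal Python sketch of that idea using a breadth-first search. The edge-list input and the function names `build_graph` and `within_hops` are hypothetical; a real hunt would draw edges from flow or log data rather than a hard-coded list.

```python
from collections import defaultdict, deque
from typing import Dict, Iterable, Set, Tuple


def build_graph(edges: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Build an undirected host graph; an edge means the two hosts communicated."""
    graph: Dict[str, Set[str]] = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def within_hops(graph: Dict[str, Set[str]], start: str, max_hops: int) -> Dict[str, int]:
    """Breadth-first search from `start`, returning each host reachable
    within `max_hops` along with its hop distance."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        host = queue.popleft()
        if dist[host] == max_hops:
            continue  # don't expand past the hop limit
        for neighbor in graph.get(host, ()):
            if neighbor not in dist:
                dist[neighbor] = dist[host] + 1
                queue.append(neighbor)
    return dist
```

Starting from a suspicious external address, a two-hop search returns the internal hosts that talked to it directly and the hosts those machines talked to in turn, which is a natural set of leads to pivot on.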

Apply Positive and Negative Reinforcement in Tools and Processes

Once you know what paths you want analysts to take, how do you reinforce their learning so that they are compelled to do so? While some of us would like to consider a mechanism that provides punishment via electrified keyboards, positive and negative reinforcement are a bit more appropriate. Of course, you can’t give an analyst a treat when they make good decisions, but you can provide reinforcement in other ways.

For an investigation, there is no better positive stimulus than providing easy and immediate access to relevant data. When training analysts, you want to ensure they are smart about what data they gather to support their questioning. Ideally, an analyst only gathers the amount of information they need to get the answer they want. More skilled analysts are able to do this quickly without spending too much time re-querying data sources for more data or whittling excess away from data sets that are too large. Whenever an analyst has a question and your tool or process helps them answer it in a timely manner, you are positively reinforcing the use of that tool or process. Furthermore, when the answer to that question helps them solve an investigation, you are reinforcing the questions the analyst is putting forth, which helps that analyst learn what questions are most likely to help them achieve results.

Negative reinforcement can be used advantageously here as well. In many cases analysts arrive at points in an investigation where they simply don’t know what questions to ask next. With no questions to ask, the investigation can stall or prematurely end. When chasing a hot lead, this can result in frustration, despair, and hopelessness. If the tools and processes used in your SOC can help facilitate the investigation by helping analysts determine their next logical set of questions, then that can serve as negative reinforcement by removing the negative stimuli of frustration, despair, and hopelessness. At this point you aren’t only helping the analyst further a single investigation; you are once again reinforcing questions that help them learn how to further every subsequent investigation they will conduct.

Other Thoughts

While the previous sections identified some structured approaches you can take toward bettering your analysts, I had a few less structured thoughts I wanted to share in bullet points. These are ways that I think SOCs can help achieve teaching goals in everyday decisions:

  • How can you continually provide positive reinforcement to help analysts learn to make good decisions?
  • If you are making a decision for analysts, let them know. Little things like data normalization and timestamp assumptions can make a difference. Knowing about these things further helps analysts understand their own data and how we manipulate it for their (hopeful) betterment. Less abstraction from data is critical to understanding the intricacies of complex systems.
  • You must be aware of when you punish your analysts. This occurs when a tool or process prevents the user from getting data they need, takes liberties with data, fails to produce consistent results, etc. If a process or tool is frustrating for a user, then that punishment decreases the likelihood that they will use it, even if it represents a good step in the investigation. At all costs, avoid tools and processes that steer your analysts away from good analytic practices.
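Timestamp handling is a good example of a decision worth surfacing to analysts. The Python sketch below, which assumes ISO 8601 input strings, converts timestamps to UTC and records in a `note` field whether the UTC offset came from the source or was assumed. The `NormalizedTimestamp` type and `normalize` function are hypothetical; the design choice being illustrated is that the assumption travels with the data instead of being applied silently.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass(frozen=True)
class NormalizedTimestamp:
    utc: datetime
    note: str  # human-readable record of how the offset was determined


def normalize(ts: str, assumed_offset: Optional[timedelta] = None) -> NormalizedTimestamp:
    """Parse an ISO 8601 timestamp and convert it to UTC.

    If the source string has no UTC offset, `assumed_offset` is applied
    (defaulting to UTC) and that assumption is written into `note`.
    Note: before Python 3.11, fromisoformat() does not accept a trailing "Z".
    """
    parsed = datetime.fromisoformat(ts)
    if parsed.tzinfo is None:
        tz = timezone(assumed_offset if assumed_offset is not None else timedelta(0))
        parsed = parsed.replace(tzinfo=tz)
        note = f"source had no UTC offset; assumed {tz.tzname(None)}"
    else:
        note = "UTC offset taken from source"
    return NormalizedTimestamp(parsed.astimezone(timezone.utc), note)
```

For example, `normalize("2024-03-10T09:00:00", assumed_offset=timedelta(hours=-5))` yields 14:00 UTC with the note "source had no UTC offset; assumed UTC-05:00", so an analyst reading the result knows exactly which decision was made on their behalf.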

Conclusion

This is another post that is pretty heavy in theory, but it isn’t so far from reality that it lacks the potential for real impact on the way you make decisions about the processes and tools used in your SOC, and how you train your analysts. As our industry continues to develop workflows and technologies, we have to think beyond what looks good and what feels right and grasp the underlying cognitive processes that are occurring and the mental challenges we want to help solve. One method for doing this is a thoughtful use of operant conditioning as a teaching tool.