Information Security Incident Morbidity and Mortality (M&M)

It may be a bit cliché, but encouraging the team dynamic within an information security group ensures mutual success over individual success. There are a lot of ways to do this, including items I’ve discussed before such as fostering the development of infosec superstars or encouraging servant leadership. Beyond these things, there is no better way to ensure team success within your group than to create a culture of learning. Creating this type of culture goes well beyond sending analysts to formalized courses or paying for certifications. It relies upon adopting the mindset that in every action an analyst takes, they should either be teaching or learning, with no exceptions. Once every analyst begins seeing every part of their daily job as an opportunity to learn something new or teach something new to their peers, then a culture of learning is flourishing.

A part of this type of organizational culture is learning from both successes and failures. The practices of Network Security Monitoring (NSM) and Incident Response (IR) are centered on technical investigations and cases, and when something bad eventually happens, incidents. This is not unlike medicine, which is also focused on medical investigations and patient cases, and when something bad eventually happens, death.

Medical M&M

When death occurs in medicine, it can usually be classified as something that was either avoidable or inevitable, both from a patient standpoint and as it related to the medical care that was provided. Whenever a death is seen as something that may have been prevented or delayed with modifications to that care, the treating physician will often be asked to participate in something called a Morbidity and Mortality Conference, or M&M as it is often casually called. In an M&M, the treating physician will present the case from the initial visit, including the presenting symptoms and the patient's initial history and physical assessment. This presentation will continue through the diagnostic and treatment steps that were taken, all the way through the patient's eventual death.

The M&M presentation is given to an audience of peers, to include any other physicians who may have participated in the care of the patient in question, as well as physicians who had nothing to do with the patient. The general premise is that these peers will question the treatment process in order to uncover any mistakes that may have been made or processes that could be improved upon.

The ultimate goal of the medical M&M as a team is to learn from any complications or errors, to modify behavior and judgment based upon experiences gained, and to prevent repetition of errors leading to complications. This is something that has occurred within medicine for over one hundred years and has proven to be wildly successful.

Information Security M&M

I’ve written about how information security can learn from the medical field on multiple occasions, including recently discussing the use of Differential Diagnosis for Network Security Monitoring. The concept of M&M is also something that I think transitions very well to information security.

As information security professionals, it is very easy to miss things. I’m a firm believer that prevention eventually fails, and as a result, we can’t be expected to live in a world free from compromise. Rather, we must be positioned so that when an incident does occur, it can be detected and responded to quickly. Once that is done, we can learn from whatever mistakes occurred that allowed the intrusion, and be better prepared the next time.

When an incident occurs, we want it to be because of something out of our hands, such as a very sophisticated attacker or an attacker using an unknown zero day. The truth of the matter is that not all incidents are that complex, and oftentimes there are ways in which detection, analysis, and response could have occurred faster. The information security M&M is a way to collect that information and put it to work. In order to understand how we can improve from mistakes, we have to understand why they are made. Uzi Arad summarizes this very well in the book “Managing Strategic Surprise”, a must-read for information security professionals. In it, he cites three problems that lead to failures in intelligence management, which also apply to information security:

  • The problem of misperception of the material, which stems from the difficulty of understanding the objective reality, or the reality as it is perceived by the opponent.
  • The problems stemming from the prevalence of pre-existing mindsets among the analysts that do not allow an objective professional interpretation of the reality that emerges from the intelligence material.
  • Group pressures, groupthink, or social-political considerations that bias professional assessment and analysis.

The information security M&M aims to provide a forum for overcoming these problems through strategic questioning of incidents that have occurred.

When to Convene an M&M

An information security M&M should be convened after an incident has occurred and been remediated. Selecting which incidents are appropriate for M&M is a task usually handled by a team lead or member of management who has the ability to recognize when an investigation could have been handled better. The conference should occur reasonably soon after the incident so important details are fresh in the minds of those involved, but far enough out that those involved have had time to analyze the incident as a whole, post-mortem. An acceptable time frame is usually about a week after the incident has occurred.

M&M Presenter(s)

The presentation of the investigation will often involve multiple individuals. In medicine, this may include an initial treating emergency room physician, an operating surgeon, and a primary care physician. In information security, this could include an NSM analyst who detected the incident, the incident responder who contained and remediated the incident, the forensic investigator who performed an analysis of a compromised machine, or the malware analyst who reverse engineered the malware associated with the incident.

M&M Peers

The peers involved with the M&M should include at least one counterpart from each specialty represented. This means that for every NSM analyst directly involved with the case, there should be at least one other NSM analyst who had nothing to do with it. This aims to get fresh outside views that aren't tainted by the need to support any actions that were taken in the specific investigation. In larger organizations and more ideal situations, it is nice to have at least two counterparts from each specialty, one with less experience than the presenting individual and one with more.

The Presentation

The presenting individual or group of individuals should be given at least a few days' notice before their presentation. Although the M&M isn't considered a formal affair, a reasonable presentation is expected to include a timeline overview of the incident, along with any supporting data. The presenter should go through the detection, investigation, and remediation of the incident chronologically and present new findings only as they were discovered during this progression. Once this chronological presentation is given, the incident can then be examined holistically.

During the presentation, participating peers should be expected to ask questions as they arise. Of course, this should be done respectfully, by raising a hand while the presenter is speaking, but questions should NOT be saved for after the presentation. This framing presents questions to the presenter in the same order a peer would arrive at them during the investigation process.

Strategic Questioning

Questions should be posed to presenters in a way that determines why something was handled in a particular manner, or why it wasn't handled in an alternative manner. As you may expect, it is very easy to offend someone with these types of questions; therefore, it is critical that participants enter the M&M with an open mind and that both presenters and peers ask and respond to questions professionally and with due respect.

Initially, it may be difficult for peers to develop questions that are entirely constructive and helpful in overcoming the three problems identified earlier. There are several methods that can be used to stimulate the appropriate type of questioning.

Devil's Advocate

One method that Uzi Arad mentions in his contribution to “Managing Strategic Surprise” is the Devil's Advocate method. In this method, peers attempt to oppose nearly every analytical conclusion made by the presenter. This is done by first determining which conclusions can be challenged, then collecting information from the incident that supports the alternative assertion. It is then up to the presenter to support their own conclusions and debunk competing thoughts.

Alternative Analysis (AA)

R.J. Heuer presents several of these methods in his paper, “The Limits of Intelligence Analysis”. These methods are part of a set of analytic tools called Alternative Analysis (AA).

Group A / Group B

This analysis involves two groups of experts analyzing the incident separately, based upon the same information. This requires that the presenters (Group A) provide supporting data related to the incident prior to the M&M so that the peers (Group B) can work collaboratively to come up with their own analysis to be compared and contrasted during the M&M. The goal is to establish two independent centers of thought. Whenever points arise where the two groups reach a different conclusion, additional discussion is required to find out why the conclusions differ.

Red Cell Analysis

This method focuses on the adversarial viewpoint, in which peers assume the role of the adversary involved with the particular incident. In doing this, they will question the presenter as to how their investigative steps were completed in reaction to the attacker's actions. For instance, a typical defender may be focused solely on how to stop malware from communicating back to the attacker, while the attacker may be more concerned with whether the defender was able to decipher the communication that was occurring. This could lead to a very positive line of questioning that results in new analytic methods for better assessing the attacker's impact, which in turn benefits containment.

What If Analysis

This method is focused on the potential causes and effects of events that may not have actually occurred. During detection, a peer may ask a question related to how the attack might have been detected if the mechanism that did detect it didn’t do so. In the response to the event, a peer might question what the presenter would have done had the attacker been caught during the data exfiltration process rather than after it had already occurred. These questions don’t always relate directly to the incident at hand, but provide incredibly valuable thought provoking discussion that will better prepare your team for future incidents.

Analysis of Competing Hypothesis

This method is similar to what occurs during a differential diagnosis, where peers create an exhaustive list of alternative assessments of the symptoms that may have been presented. This is most effectively done by utilizing a whiteboard to list every potential diagnosis and then ruling those out based upon testing and review of additional data. You can review my article on differential diagnosis of NSM events here for a more thorough discussion of this type of questioning.

Key Assumptions Check

Almost all sciences make assumptions based upon generally accepted facts. This method of questioning is designed to challenge key assumptions and examine how they affect the investigation of a scenario. It most often pairs with the What If analysis method. As an example, when considering the spread of malware, the common assumption is that malware operating within a virtual machine doesn't have the ability to escape to the host or to other virtual machines residing on it. Given an incident where a virtual machine has been infected with malware, a peer might ask what action would be taken if this malware did indeed escape the virtual environment and infect other virtual machines on the host, or the host itself.

Outcome

During the M&M, all participants should actively take notes. Once the M&M is completed, the presenting individuals should take their notes and combine them into a final report that accompanies their presentation materials and supporting data. This report should include a listing of any points which could have been handled differently, and any improvements that could be made to the organization as a whole, either technically or procedurally. This report should be attached to the case file associated with the investigation of the incident.

Additional Tips

Having organized and participated in several of these conferences and reviews of similar scope, I have a few other pointers that help in ensuring they provide value.

  • M&M conferences should be held only sporadically, with no more than one per week and no more than three per month.
  • It should be stressed that the purpose of the M&M isn’t to grade or judge an individual, but rather, to encourage the culture of learning.
  • M&M conferences should be moderated by someone at a team lead or lower management level to ensure that the conversation doesn’t get too heated and to steer questions in the right direction.
  • If you make the decision to institute M&M conferences, it should be a requirement that everybody participates at some point, either as a presenter or a peer.
  • The final report that is generated from the M&M should be shared with all technical staff, as well as management.
  • Information security professionals, not unlike doctors, tend to have big egos. The first several conferences might introduce some contention and heated debates. This is to be expected initially, but will work itself out over time with proper direction and moderation.
  • The M&M should be seen as a casual event. It is a great opportunity to provide food and coordinate other activities before and after the conference to take the edge off.
  • Be wary of inviting upper management into these conferences. Their presence will often inhibit open questioning and response and they often don’t have the appropriate technical mindset to gain or provide value to the presentation.

It is absolutely critical that these conferences be initiated with care. The medical M&M was actually started in the early 1900s by a surgeon named Dr. Ernest Codman at Massachusetts General Hospital in Boston. MGH was so appalled by Dr. Codman's suggestion that the competence of surgeons should be evaluated that he eventually lost his staff privileges. Now, M&M is a mainstay of modern medicine and something that is done in some of the best hospitals in the world. I've seen similar types of shunning occur in information security when these types of peer review opportunities are suggested. As information security practitioners, it is crucial that we accept this type of peer review and that we encourage group learning and the refinement of our skills.

References:

  • Campbell, W. (1988). "Surgical morbidity and mortality meetings". Annals of the Royal College of Surgeons of England 70 (6): 363–365. PMC 2498614. PMID 3207327.
  • Arad, Uzi (2008). "Intelligence Management as Risk Management". In Paul Bracken, Ian Bremmer, and David Gordon (Eds.), Managing Strategic Surprise (pp. 43–77). Cambridge: Cambridge University Press.
  • Heuer, Richards J., Jr. (2005). "Limits of Intelligence Analysis". Orbis 49 (1).

4 Ideas for Operationalizing Honeypots

I’ve always thought that the concept of a honeypot was one of the most fascinating things in information security. If you aren’t familiar with honeypots, they are basically traps used to detect or deter attackers on a network. They typically come in two forms: low interaction and high interaction. A low interaction honeypot is software that emulates a set number of services that may run on a computer. When an attacker connects to a low interaction honeypot, he/she will be able to interact with that service on a limited basis, and that interaction will be logged. A high interaction honeypot is more robust and presents all aspects of an operating system. This is most often a fully deployed operating system running a number of legitimate services with an extensive level of logging enabled. The thing both of these implementation methods have in common is that the honeypot doesn’t actually contain real data. Should an attacker compromise either type of honeypot, there is no real direct risk of critical data being exposed when it is deployed properly.
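To make the low interaction concept concrete, here is a minimal sketch of a single-service honeypot listener. It is illustrative only: the port, fake banner, and log file name are assumptions made for the example, and a production deployment would be better served by a purpose-built tool such as Honeyd or Kippo than by a toy script.

```python
import socket
import datetime

# Assumed values for this sketch -- adjust for your environment.
LISTEN_PORT = 2222                        # pretend SSH service
FAKE_BANNER = b"SSH-2.0-OpenSSH_5.3\r\n"  # banner shown to anyone who connects
LOG_FILE = "honeypot.log"

def log(entry):
    """Append a timestamped entry to the honeypot log."""
    with open(LOG_FILE, "a") as f:
        f.write("%s %s\n" % (datetime.datetime.utcnow().isoformat(), entry))

def main():
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server.bind(("0.0.0.0", LISTEN_PORT))
    server.listen(5)
    log("listening on port %d" % LISTEN_PORT)

    while True:
        client, addr = server.accept()
        log("connection from %s:%d" % addr)
        try:
            # Present the fake banner and record whatever the attacker sends back.
            client.send(FAKE_BANNER)
            client.settimeout(10)
            log("received from %s: %r" % (addr[0], client.recv(1024)))
        except socket.timeout:
            log("no data received from %s" % addr[0])
        finally:
            client.close()

if __name__ == "__main__":
    main()
```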

Almost every honeypot implementation I’ve seen deployed is for research purposes. There isn’t anything wrong with a research honeypot; after all, I run a couple myself (at home) and have learned a lot from them. However, I think there is a lot of operational value to be gained from deploying honeypots in production environments. I want to discuss, at a high level, a few of these strategies and the benefit that can be gained from them.

Honeypots for Prevention

There has been a fair amount of talk recently about security mechanisms designed to drive up the cost of exploiting a network by increasing the time it takes to do so. As a matter of fact, Adobe’s Senior Director of Product Security and Privacy, Brad Arkin, recently said that “My goal isn’t to find and fix every security bug. It’s to drive up the cost of writing exploits. We invest a lot of time in building up mitigations that increase the cost and complexity of writing exploits that will become reliable.” Of course, Arkin was referring to the exploitation of software, but the concept still applies to the network side of the house. I’m a firm believer that your detection capability is the most important because prevention eventually fails, but if you can drive up the cost of exploiting a network, this has the potential to deter some attackers. At a minimum, increasing the time cost of exploiting a network can deter attacks of opportunity, and it may even work to deter attacks of choice as well.

Honeypots can do this by adding to the frustration factor. I see a couple of ways this can be done. The first is to utilize a large number of low interaction honeypots with varying configurations. The important thing here is to vary their configurations as much as possible in order to prevent an attacker from characterizing them and automating them out of their window of visibility. For instance, if you deploy twenty honeypots and they all have ports 22, 80, and 3306 open and all provide the same responses to banner grabs, an attacker is going to be able to correlate this pretty quickly and will simply scan and exclude those hosts from his list of potential targets. The other method for preventive use is to deploy a number of high interaction honeypots. This requires a significant time investment, but the right configuration can cause an attacker to waste a significant amount of time in the right places. Again, this strategy isn’t going to prevent aggressive adversaries from reaching their goal, but it will drive up the time cost for less determined foes.
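As a rough sketch of the "vary the configurations" idea, the snippet below generates a different service profile (open ports and banners) for each honeypot so that no single fingerprint lets an attacker filter them all out at once. The port and banner values are made up for illustration; the point is only that each deployed sensor draws its own combination.

```python
import random

# Example pools to draw from -- purely illustrative port/banner values.
CANDIDATE_SERVICES = {
    21:   ["220 FTP Server ready", "220 ProFTPD Server"],
    22:   ["SSH-2.0-OpenSSH_5.3", "SSH-2.0-OpenSSH_6.0p1"],
    80:   ["Apache/2.2.15", "nginx/1.0.15"],
    3306: ["5.1.73-community", "5.5.30-log"],
}

def generate_profile(host_id):
    """Build a service profile (ports and banners) for one honeypot."""
    rng = random.Random(host_id)  # seeded so each host gets a repeatable profile
    # Each honeypot exposes a different subset of ports...
    ports = rng.sample(sorted(CANDIDATE_SERVICES), rng.randint(2, len(CANDIDATE_SERVICES)))
    # ...and a potentially different banner on each exposed port.
    return {"host_id": host_id,
            "services": {port: rng.choice(CANDIDATE_SERVICES[port]) for port in ports}}

if __name__ == "__main__":
    # Twenty honeypots, each presenting its own port/banner combination.
    for host_id in range(20):
        print(generate_profile(host_id))
```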

Honeypots for Attack Sense and Warning

This is the sacrificial lamb approach to honeypot deployment. In this scenario, honeypots are deployed based upon trust zones within your network. There are different strategies for outlining trust barriers, but on a simple network you might define a low trust zone within a wireless or user space network segment, a medium trust zone in a DMZ, and a high trust zone within a server farm network segment. In that sort of topology, all three zones would contain honeypots configured with security comparable to the next step lower. The idea here is that the honeypot should be slightly more vulnerable to attack than everything else in the zone that it is currently in. This configuration provides value in a couple of ways. First, if a honeypot gets compromised, it will likely serve as a warning that other assets within that trust zone may be compromised soon as well, if they aren’t already. Taking this one step further, it is often logical to assume that if a lower trust zone honeypot becomes compromised, the next highest trust zone may be the next target. Depending on how the network is set up, if a higher trust zone includes a honeypot that gets compromised, it could mean that all of the trust zones below it could also have fallen victim to the adversary. This whole model relies on a lot of assumptions, but that is the space AS&W operates in.

Honeypots for Detection Related to Critical Assets

I’m a big fan of target-based IDS deployment where, instead of deploying a single IDS at your network perimeter, you use more focused IDSs with finer-tuned rule sets and place them closer to organizationally critical assets. This allows for better use of resources across the board, as it usually requires less beefy hardware and ensures your analysts won’t see nearly as many false positives. For instance, if your critical data is housed in SQL servers on a single network segment, then deploy an additional IDS to that segment and only utilize SQL-focused signatures there rather than on the perimeter IDS. This also allows you to prioritize IDS sensors so that alerts generated by sensors in high priority areas are given priority when it comes to investigations.

I think the same concept of target-based deployment can be tied to honeypot deployments in the protection of critical assets. If your organization has prioritized its assets (and it should have), then the general idea behind target-based honeypot deployment for the purpose of detection would be to configure and deploy honeypots that are virtually identical to the critical servers. This means that they should be running the same services, talking to the same hosts, and vulnerable to the same types of attacks. The thought here is that if the critical server gets compromised, then so should the honeypot, and vice versa. This is valuable because it isn’t always feasible to log everything on a production server given its volume of traffic. This applies to both host-based and network-based logging. Utilizing an identically configured honeypot that doesn’t see the same amount of utilization allows you to use more aggressive logging, which may give you more visibility into an attacker’s movements. This can provide value in helping you determine exactly how an attacker has compromised a system, what they are utilizing the system for, whether there is particular data they may be after, and whether they have compromised any other systems on the network.

Reverse Honeypots for Intelligence Collection

Although the concept of a reverse honeypot is a bit radical, it really appeals to me considering the industry I work in. With a traditional honeypot, you fill a pot with honey and hope the attacker gets attracted to the honey and sticks his hand in the pot. A reverse honeypot is where you throw some honey in the direction of a target in such a manner as to leave a trail back to the source, the idea being that the target will notice, follow the trail, see the pot, and stick his hand in. In more practical terms, this means that you would attempt to attack a target elsewhere on the Internet. This attack doesn’t necessarily have to be successful, and it may constitute something as simple as a port scan or something as overt as a DoS attempt. During these attacks, no masking of your source IP address should occur and no third-party hop points should be used, meaning that the target would see your true IP address when reviewing logs of your attack on his network. Given the nature of your target, this may result in his curiosity being piqued and him reciprocating your attack back at you in another form. Of course, within your network you have several vulnerable honeypots of varying interaction levels waiting for the target.

This type of honeypot is solely for the purpose of target-based intelligence gathering, but it has the potential to be very effective. First and foremost, should the target scan or attack your network, you should be able to capture some of the tools, techniques, and procedures (TTPs) that he is using. This type of intelligence can help in recognizing, characterizing, or attributing other computer network exploitation activity to this attacker and may also lend itself to better detection techniques in the future. One more added value, which is incredibly attractive in the modern threat landscape, is the identification of hop points. Although you very purposefully did not mask your true source IP address, the attacker may choose to do so. It’s incredibly common for attackers to compromise other hosts elsewhere on the Internet to launch their attacks from, but it’s also common that they will reuse these same hop points for an extended amount of time. If you can identify these hop points then you can use that information to attribute the attacks to a particular operator or group. This is extremely valuable. Of course, this type of activity should be done from non-production networks, because it’s very possible that you might lure an attacker into launching a large scale DDoS attack on your network, 10,000 bots strong.

Conclusion

I think there is a lot of room for operationalizing honeypots in production environments. The major factors prohibiting this are a lack of research in this area and a lack of production-grade tools for implementing these techniques. Unfortunately, we are still in a time in which IDS is having trouble gaining traction because of the cost it entails, so a future where honeypots can be deployed for the purpose of enhancing network security seems far off. Don’t be surprised, however, if you see a job posting five years down the road for “Honeypot Administrator”. I know I’d have one if I could.

NSM Collection vs. Detection

I was going back through some old bookmarks when I stumbled upon a post by Richard Bejtlich from 2007 entitled “NSM and Intrusion Detection Differences”. In this article, Richard discussed the concept of ‘immaculate collection’ versus ‘immaculate detection’. Richard’s article describes IDS developers as striving for immaculate detection while NSM practitioners typically strive for immaculate collection. Given this, I posed the following question to several of my colleagues: Which is more important, collection or detection?

 

The question itself is open to a bit of interpretation, but my group was split about 60/40 favoring collection over detection. I tend to agree with that majority, although the minority had some valid points as well.

 

Those favoring detection argued that a mountain of data, no matter how eloquently collected, is useless without some level of detection capability. Additionally, most in this camp agreed that your detection capability shapes how you perform collection. A few even made the point that they considered collection to be a function of network operations, and not NSM. I can’t disagree with the first of these arguments, but I’m opposed to the other two. I’ll address the argument of whether or not detection shapes collection here.

 

When I think about NSM, I typically think of it in three phases: collection, detection, and analysis. Collection is the gathering and parsing of relevant network security data, and it is often performed by a combination of hardware and software. Detection is the process of finding anomalies in collected data that may represent a potential intrusion. Detection is most often done by software, but can be done by humans to a lesser extent. Analysis is the review and investigation of alert data generated during detection. Analysis is typically (and most effectively) done by humans.

Figure: Phases of Network Security Monitoring

The key takeaway from these three phases is that they form a cycle rather than a beginning to end process. Collected data feeds the detection capability, and the alert data generated from detection feeds the analysis process. What makes this process cyclical is that the investigation and research performed during the analysis process is used to define and shape what data you are collecting.

 

That said, I argue that collection is the most important phase of network security monitoring for a couple of reasons:

 

Detection Depends on Collection

Abraham Lincoln was quoted as saying that if you gave him six hours to chop down a tree, he would spend the first four hours sharpening his ax. This analogy fits perfectly here, because no matter how much thought you put into your detection tools, they are utterly useless if they aren’t digesting the right data. That nice beefy Snort sensor might just be wasting cycles if you’ve placed it on the wrong side of your firewall. Detection fails if collection isn’t done well.

 

Analysis Also Depends on Collection

I hate using the needle in the haystack analogy, but if the hay is covered in manure then you sure aren’t going to want to spend all of that time digging through it. A human analyst interprets alert data provided by a detection mechanism and then goes out and collects more data in an effort to support his/her investigation. If this data isn’t being collected in an easily retrievable and digestible format then analysis fails. An IDS signature might tell me that a potential attacker is attempting SQL injection on my public-facing web server, but if I’m not collecting PCAP data and my web server/database logs aren’t accessible then I’m going to have a really hard time finding out if the attack was actually successful.
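As a simple illustration of why that collected log data matters, here is a rough sketch that scans an Apache-style access log for requests containing common SQL injection strings and reports the response code, which is often the quickest hint as to whether an attempt got anywhere. The log path and the indicator patterns are assumptions made for the example; they are a starting point, not a reliable detection signature.

```python
import re

ACCESS_LOG = "/var/log/httpd/access_log"   # assumed log location

# Very rough indicators of SQL injection attempts in a request URL.
SQLI_PATTERNS = re.compile(
    r"(union\s+select|or\s+1=1|information_schema|sleep\(|%27|--\s)",
    re.IGNORECASE,
)

# Common log format: host ident user [time] "request" status size
LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[([^\]]+)\] "([^"]*)" (\d{3}) \S+')

def find_sqli_attempts(path):
    """Yield (source_ip, timestamp, request, status) for suspicious requests."""
    with open(path) as f:
        for line in f:
            m = LOG_LINE.match(line)
            if not m:
                continue
            src, ts, request, status = m.groups()
            if SQLI_PATTERNS.search(request):
                yield src, ts, request, status

if __name__ == "__main__":
    for src, ts, request, status in find_sqli_attempts(ACCESS_LOG):
        # A 200 on a suspicious request is worth a closer look in the PCAP
        # and database logs; a 403 or 500 tells a different story.
        print("%s %s %s -> %s" % (ts, src, request, status))
```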

 

Analysis Feeds Collection More So than Detection

I’ve served in the role where I’m the guy creating the detection tools and also in the role where I’m the guy analyzing the alerts generated by those tools. It is absolutely true that in some cases collection software/hardware is designed and configured in such a way that it provides data in the appropriate format to a detection tool. This might lead someone to the conclusion that it is detection shaping the collection, but that argument only holds if you take a narrow view of the entire thought process. It is actually the analysis of previous alert data that typically identified the need for the detection tool being created. Remember that detection is most often a task performed by software, while analysis is performed by individuals. Software doesn’t identify needs; people do.

 

 

Again, I think this is one of those questions that may or may not have a right answer, but for my two cents, if you gave me six hours to find the bad guys, I’d spend the first four making sure I collected the right data.

 

 

Differential Diagnosis of Network Security Monitoring Events

There are a lot of things that the industry does well when it comes to network security monitoring (NSM). For instance, I tend to think that we have data collection figured out reasonably well. I also think that signature-based intrusion detection is a really well developed science. However, with NSM having only existed for a short period of time there are several facets of it that aren’t too well defined. One such aspect is the actual diagnostic method that people use to analyze NSM events. That is, the process an analyst uses to connect the dots between the initial alert and the final diagnosis. In this article I’m going to discuss the use of a common medical diagnostic method called differential diagnosis and how it can be applied to NSM.

 

Understanding Normal

The first thing that was ever taught to me when I started my career as an NSM analyst was that if you know what normal looks like, then you can determine what is bad. I trusted in this concept for many years and even taught it to others. As true as this statement may be, I believe it is relied on entirely too much. This is primarily due to a failure in separating the collection, detection, and analysis processes.

 

Collection centers on the hardware and software used to collect NSM-related data. Consider the collection of full content packet capture (PCAP) data. The use of a network tap and DaemonLogger allows you to store this data on disk so that it may be used for the identification and analysis of network security related events. Collection occurs with a combination of hardware and software.

Detection is the process by which collected data is examined and anomalies are identified, typically through some form of signature-, anomaly-, or statistically-based detection. Snort is an example of signature-based intrusion detection software; it compares collected network traffic to signatures of known malicious activity, performing pattern matching to determine if something bad has occurred. Detection is typically software focused.

Analysis is what occurs when a human interprets the output of a detection tool. Although Snort may detect a pattern match in a communication sequence and generate an alert, it is a human who is ultimately responsible for reviewing the alert and investigating it to a final determination on its validity. The key concept here is that analysis is human focused.

 

With those three terms more clearly defined and distinctions drawn, it would stand to reason that the concept of knowing what normal looks like in order to determine what is bad is actually more relevant to detection than analysis. Realistically speaking, it’s not feasible in the modern state of network computing to be well versed in every aspect of normal communications. Although some traffic patterns may remain fairly static, the open nature and loose standards that govern network communication protocols result in a constant evolution of traffic patterns. Don’t be mistaken, this is still an important concept that must be incorporated into the analytic approach, it’s just not strong enough to stand on its own as the singular concept new analysts should be taught. Knowing what normal looks like is best used when analyzing specific facets of a potential breach rather than as a holistic method to classify all network traffic you may be capturing.

 

A Differential Approach

The general goal of an NSM analyst is to digest the alerts generated by various detection tools, investigate multiple data sources, and perform relevant tests and research to see if their findings represent a network security breach. This is very similar to the goal of a physician, who digests the symptoms a patient presents, investigates multiple data sources, and performs relevant tests and research to see if their findings represent a breach in the person’s immune system. Both practitioners share a similar goal of connecting the dots to find out if something bad has happened and/or is still happening.

Although NSM has only been around a short while, medicine has been around for centuries. This means that they’ve got a head start on us when it comes to developing their diagnostic method. One of the most common diagnostic methods used in clinical medicine is one called differential diagnosis. If you’ve ever seen an episode of “House” then chances are you’ve seen this process in action. The group of doctors will be presented with a set of symptoms and they will create a list of potential diagnoses on a whiteboard. The remainder of the show is spent doing research and performing various tests to eliminate each of these potential conclusions until only one is left. Although the methods used in the show are often a bit unconventional, they still fit the bill as part of the differential diagnosis process.

The differential method is one based upon a process of elimination. It consists of five distinct steps, although in some cases only two will be necessary. The differential process exists as follows:

  1. Identify and list the symptoms
    In medicine, symptoms are typically initially conveyed verbally by the individual experiencing them. In NSM, a symptom is most commonly in the form of an alert generated by some form of intrusion detection system or other detection software. Although this step focuses primarily on the initial symptoms, more symptoms may be added to this list as additional tests or investigations are conducted.
  2. Consider and evaluate the most common diagnosis first
    A statement every medical student is taught in their first year is “If you hear hoof beats, look for horses…not zebras.” This is to say that the most common diagnosis is likely the correct one. As a result, this diagnosis should be evaluated first. The analyst should focus his investigation on doing what is necessary to quickly confirm this diagnosis. If this common diagnosis cannot be confirmed during this initial step then the analyst should proceed to the next step.
  3. List all possible diagnoses for the given symptoms
    The next step in the differential process is to list every possible diagnosis based upon the information currently available with the initially assessed symptoms. This step requires some creative thinking and is often most successful when multiple analysts participate in generating ideas. Although you may not have been able to completely confirm the most common diagnosis in the previous step, if you weren’t able to rule it out completely then it should be carried over into the list generated in this step. Each potential diagnosis on this list is referred to as a candidate condition.
  4. Prioritize the list of candidate conditions by their severity
    Once a list of candidate conditions is created, a physician will prioritize it, listing the condition that is the largest threat to human life at the top. As an NSM analyst you should also prioritize this list, but the prioritization should focus on which condition is the biggest threat to your organization’s network security. This will be highly dependent upon the nature of your organization. For instance, if “MySQL Database Root Compromise” is a candidate condition, then a company whose databases contain social security numbers would prioritize this condition much higher than a company that uses a simple database to store its sales staff’s on-call schedule.
  5. Eliminate the candidate conditions, starting with the most severe
    The final step is where the majority of the action occurs. Based upon the prioritized list created in the previous step, the analyst should begin doing what is necessary to eliminate candidate conditions, starting with the condition that poses the greatest threat to network security. This process of elimination requires considering each candidate condition and performing tests, conducting research, and investigating other data sources in an effort to rule them out as a possibility. In some cases, investigation of one candidate condition may effectively rule out multiple candidate conditions, speeding up this process. Alternatively, investigation of other candidate conditions may prove inconclusive, leaving one or two conditions that cannot be definitively eliminated as possibilities. This is acceptable, however, as sometimes in network security monitoring (as in medicine) there are anomalies that can’t be explained and that require more observation before determining a diagnosis. Ultimately, the goal of this final step is to be left with one diagnosis so that either the incident handling process may begin or the alert can be dismissed as a false positive. It’s very important to remember that “Normal Communication” is a perfectly acceptable diagnosis, and it will be the most common diagnosis an NSM analyst arrives at. I also find that remembering that all packets are good unless you can prove they are bad is an important concept to keep in mind during this step.

 

 

Let’s consider this process with a couple of broad case scenarios.

 

Scenario 1

Step 1: Identify and List the Symptoms

Symptoms:

  • Internal host appears to be sending outbound traffic to a Russian IP address
  • The traffic is occurring at regular intervals, every 10 minutes
  • The traffic is HTTPS over port 443, and as such is encrypted and unreadable

Step 2: Consider and Evaluate the Most Common Diagnosis First

It’s been my experience that most entry-level analysts will see these symptoms and automatically think that this machine is infected with some form of malware and is phoning home for further instructions. Those analysts tend to key in on the fact that the traffic is going to a Russian IP address and that it is occurring at regular 10 minute intervals. Although those things are worth noting (I wouldn’t have listed them if they weren’t), I don’t buy into the malware theory so easily. I believe entirely too much emphasis is placed on the geographic location of IP addresses, so the fact that the remote IP address is Russian means little to me. Additionally, there are a whole variety of normal communication mechanisms that talk at regular periodic intervals. This includes things like web-based chat, RSS feeds, web-based e-mail, stock tickers, software update processes, and more. Operating on the principle that all packets are good unless you can prove they are bad, I think the most common diagnosis here is that this is normal traffic.

That said, how can we confirm this potential diagnosis? Confirming that something is normal can be hard. In this particular instance we could start with some open source research on the Russian IP. Although it’s located in Russia, it still may be owned by a legitimate company. If we were to look up the host and find that it was registered to a popular AV vendor, we might be able to use that information to conclude that this was an AV application checking for updates. I didn’t mention the URL that the HTTPS traffic is going to, but quickly Googling it may yield some useful information that will help you determine if it is a legitimate site or something that might be hosting malware or some type of botnet C2. Another technique would be to examine the host physically, if you have ready access to it, to see if any processes launch on the machine at the same intervals the traffic is occurring at.
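One quick way to test the "regular ten-minute interval" observation is to pull the timing of the outbound packets for that host pair straight out of the capture. The sketch below uses Scapy, and the file name and IP addresses are placeholders; repeated gaps of roughly 600 seconds between bursts would support the beaconing observation, though that alone still doesn't tell you whether the process behind it is malicious.

```python
from scapy.all import rdpcap, IP   # pip install scapy

PCAP_FILE = "suspect_traffic.pcap"   # assumed capture file
INTERNAL_HOST = "192.168.1.50"       # assumed internal host
REMOTE_HOST = "203.0.113.10"         # assumed remote IP

def outbound_gaps(pcap, src, dst):
    """Return the time gaps (in seconds) between successive outbound packets."""
    times = [float(pkt.time)
             for pkt in rdpcap(pcap)
             if IP in pkt and pkt[IP].src == src and pkt[IP].dst == dst]
    return [later - earlier for earlier, later in zip(times, times[1:])]

if __name__ == "__main__":
    gaps = outbound_gaps(PCAP_FILE, INTERNAL_HOST, REMOTE_HOST)
    # Small gaps belong to packets within a single session; the large gaps
    # between bursts are what reveal a periodic check-in.
    for gap in gaps:
        if gap > 60:
            print("%.1f seconds between bursts" % gap)
```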

Let’s assume that we weren’t able to make a final determination on whether or not this was normal communication.

Step 3: List all Possible Diagnoses for the Given Symptoms

*There are obviously more candidate conditions in the realm of possibility, but for this and the other scenario I’ve kept it to some of the more common ones for the sake of brevity.

Candidate Conditions:

    • Normal Communication
      We weren’t able to rule this out completely in the previous step so we carry it over to this step.

 

    • Malware Infection / Installed Malicious Logic
      This is used as a broad category. We typically don’t care about the specific strain until we determine that malware may actually exist. If you are concerned about a specific strain then it can be listed separately. Think of this category as a doctor listing “bacterial infection” as a candidate condition knowing that they can further narrow it down later.

 

    • Data Exfiltration from Compromised Host
      Potential that the host could be sending proprietary or confidential information out. This sort of thing would likely be part of a coordinated or targeted attack.

 

    • Misconfiguration
      It’s well within the realm of possibilities that a system administrator fat-fingered an IP address and a piece of software that should be trying to communicate periodically with an internal IP is now trying to do so with a Russian IP. This is really quite common.

 

Step 4: Prioritize the List of Candidate Conditions by their Severity

These priorities are fairly generalized since they are dependent upon your organization.

Priority 1: Data Exfiltration from Compromised Host

Priority 2: Malware Infection / Installed Malicious Logic

Priority 3: Misconfiguration

Priority 4: Normal Communication

Step 5: Eliminate the Candidate Conditions, Starting with the Most Severe

Priority 1: Data Exfiltration from Compromised Host

This one can be a bit tricky to eliminate as a possibility. Full packet capture won’t be of much assistance here since the traffic is encrypted, but if you can create some statistics from this traffic, or better yet, if you have netflow available, you should be able to determine the amount of data going out. If only a few bytes are going out every ten minutes, then it’s likely that this is not data exfiltration. The host-based research you did earlier on the Russian IP address may also provide some value here in determining the reputation of this host. It would also be of value to determine if any other hosts on your network are talking to this IP address or any other IPs in the same address space. Finally, baselining normal communication for your internal host and comparing it with the potentially malicious traffic may provide some useful insight.
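If flow data is available, summing the outbound byte counts toward the suspect address answers the volume question quickly. The sketch below assumes the flow records have already been exported to a CSV (for example from a SiLK or nfdump query) with source, destination, and byte columns; the file name and column names are assumptions for the example.

```python
import csv

FLOW_CSV = "flows.csv"              # assumed export of flow records
INTERNAL_HOST = "192.168.1.50"      # assumed internal host
REMOTE_HOST = "203.0.113.10"        # assumed remote IP

def outbound_bytes(path, src, dst):
    """Sum the bytes sent from src to dst across all flow records."""
    total = 0
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            if row["src_ip"] == src and row["dst_ip"] == dst:
                total += int(row["bytes"])
    return total

if __name__ == "__main__":
    sent = outbound_bytes(FLOW_CSV, INTERNAL_HOST, REMOTE_HOST)
    # A few hundred bytes every ten minutes looks like a check-in or an update
    # poll; megabytes heading out is a very different conversation.
    print("%s -> %s: %d bytes outbound" % (INTERNAL_HOST, REMOTE_HOST, sent))
```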

Priority 2: Malware Infection / Installed Malicious Logic

At this point the research you’ve already done should give you a really good idea on whether or not this condition is true. It will be likely that by examining the potential for data exfiltration you will rule this condition out as a result, or will have already been able to confirm it to be true.

Priority 3: Misconfiguration

This condition can best be approached by comparing the traffic of this host against the traffic of one or more hosts with a similar role on the network. If every other workstation on that same subnet has the same traffic pattern, but to a different IP address, then it’s likely that the wrong IP address was entered into a piece of software somewhere proving that a misconfiguration exists. Having access to host-based logs can also be useful in figuring out if a misconfiguration exists since they might exist in Windows or Unix system logs.

Priority 4: Normal Communication

If you’ve gotten this far, then the diagnosis of normal communication should be all that remains on your list of candidate conditions.

Concluding a Diagnosis

At this point you have to use your experience as an analyst and your intuition to decide if you think something malicious is really occurring. If you were able to complete the previous analysis thoroughly, then operating on the assumption that all packets are good unless you can prove they are bad would mean your final diagnosis here should be that this is normal communication. If you still have a hunch something quirky is happening though, there is no shame in monitoring the host further and reassessing once more data has been collected.

 

Scenario 2

Step 1: Identify and List the Symptoms

Symptoms:

  • A web server in our DMZ is receiving massive amounts of inbound traffic
  • The inbound traffic is unreadable and potentially encrypted or obfuscated
  • The inbound traffic is coming to multiple destination ports on the internal host
  • The inbound traffic is UDP based

Step 2: Consider and Evaluate the Most Common Diagnosis First

With the amount of traffic being received by the internal host being very large and the packets using the UDP protocol with random destination ports, my inclination would be that this is some form of denial of service attack.

The quickest way to determine whether something is a denial of service is to assess the amount of traffic being received compared with the normal amount of traffic received on that host. This is something that is really easy to do with netflow data if you have it available. If the host is only receiving 20% more traffic than it normally would then I would consider other alternatives to a DoS. However, if the host is receiving ten or one hundred times its normal amount of traffic then DoS is very likely and almost a certainty.  It’s important to remember that a DoS is still a DoS even if it is unintentional.

Once again, for the sake of this scenario we will continue as though we weren’t able to make a clear determination on whether or not a DoS condition exists.

Step 3: List all Possible Diagnoses for the Given Symptoms

Candidate Conditions:

    • Denial of Service
      We weren’t able to rule this out completely in the previous step so we carry it over to this step.

 

    • Normal Communication
      It doesn’t seem incredibly likely, but there is potential for this to be normal.

 

    • Misdirected Attacks
      When a third party chooses to attack another, they will often spoof their source address for the sake of anonymity and to prevent getting DoS’d themselves. The result is that the owner of the spoofed IP address ends up seeing the response traffic. This web server could be seeing the effects of such an attack.

 

    • Misconfigured External Host
      A misconfiguration can happen on somebody else’s network just as easily as it could on yours. This misconfiguration could result in an external host generating any number of types of traffic and sending them to the web server.

 

    • SPAM Mail Relay
      The server could be misconfigured or compromised in a manner that allows it to be used for relaying SPAM across the Internet.

 

Step 4: Prioritize the List of Candidate Conditions by their Severity

Priority 1: Denial of Service

Priority 2: SPAM Mail Relay

Priority 3: Misconfigured External Host

Priority 4: Misdirected Attacks

Priority 5: Normal Communication

Step 5: Eliminate the Candidate Conditions, Starting with the Most Severe

Priority 1: Denial of Service

We’ve already gone through the paces on this one without being able to identify that it is the definitive diagnosis. Even though this is the most severe we would have to proceed to attempt to eliminate other candidate conditions to help in figuring out if a DoS is occurring. Of course, depending on the effect of the attack it may make the most sense to contain the issue by blocking the traffic before spending more time investigating the root cause.

Priority 2: SPAM Mail Relay

This one is relatively easy to eliminate. If the server were being used as a mail relay, then you would see a proportionate amount of traffic going out as coming in. If that’s not the case and you don’t see any abnormal traffic leaving the server, then it is likely not relaying SPAM. If the web server is also running mail services, then you can examine the appropriate logs here as well. If it is not supposed to be running mail services, you can examine the host to see if it is doing so in an unauthorized manner.

Priority 3: Misconfigured External Host

This one is typically pretty tricky. Unless you can identify the owner of the IP address and communicate with them directly then the most you can hope to do is block the traffic locally and/or report abuse at their ISP level.

Priority 4: Misdirected Attacks

This is another tricky one along the same lines as the previous candidate condition. If it’s an attacker somewhere else whose antics are causing traffic redirection to your server then the most you can do is to report the issue to the ISP responsible for the IP address and block the traffic locally.

Priority 5: Normal Communication

This doesn’t seem likely, but you can’t say for sure without baselining the normal traffic for the host. Compare its traffic at similar times on previous days to see if you can draw any conclusions. Is the pattern normal and it’s just the amount of traffic that is anomalous? Is it both the pattern and the amount that are anomalous? Did the server ever talk to the offending IP prior to this?

 

Concluding a Diagnosis

In this scenario, it’s very possible that you are left with as many as three candidate conditions that you cannot rule out. The good thing here is that even though you can’t rule these out, the containment and remediation methods would be the same for all of them so you still have gotten to a state of diagnosis that allows the network to recover from whatever is occurring. If the amount of traffic isn’t too great then you may not need to block the activity and you may be able to monitor it further in order to attempt to collect more symptoms that may be useful in providing a more accurate diagnosis.

 

Conclusion

I’ve spent quite a bit of time doing analysis with this differential approach and also reviewing previous investigations post-mortem while applying these concepts and I’ve been really pleased with my findings. I think that if you are struggling with being able to grasp a firm analytical method then this may be a great one to start with. I’m not entirely sure that the differential method is appropriate for all organizations, but just as with medicine, there are competing approaches and I hope to examine more of those in the future so that I can draw more comparisons between the medical field and NSM. If you have any scenarios in which you’ve used this differential approach (for better or for worse), I’d love to hear about them.

Packet Carving with SMB and SMB2

One of the more useful network forensic skills is the ability to extract files from packet captures. This process, known as packet data carving, is crucial when you want to analyze malware or other artifacts of compromise that are transferred across the network. That said, packet data carving has varying degrees of difficulty depending on the type of traffic you are attempting to extract data from. Carving files from simple protocols like HTTP and FTP is something that can be done in a matter of minutes and is usually cut and dried enough that it can be done in an automated fashion with tools like Foremost and Network Miner.

There are articles all over the Internet about carving files from simple protocols so I won’t rehash those. Instead, I want to take a look at two more complex protocols that are extremely common in production networks. Server Message Block (SMB) is the application-layer protocol that Microsoft operating systems use for file sharing and communication between networked devices. If you live on a Microsoft network (or a Unix network that utilizes SAMBA) then you are a user of SMB or SMB2, depending on your operating system version. In this article I’m going to discuss the art of carving files from SMB and SMB2 traffic. If you want to follow along you’ll need to download a copy of Wireshark (http://www.wireshark.org) and your favorite hex editor. I’ve used Cygnus Hex Editor (http://www.softcircuits.com/cygnus/fe/) for the purpose of this article since it’s simple and a free version exists.

 

Carving Files from SMB Packets

 

The first version of SMB is in use on all modern Microsoft operating systems prior to Windows Vista. In order to set up a packet capture for this scenario, I took two Windows XP SP3 virtual machines running on VMWare Workstation and placed them on the same network. Once they were able to communicate with each other, I set up a shared folder on one host (192.168.47.132) acting as the server. I then fired up Wireshark and began capturing packets as I copied an executable file from the client (192.168.47.133) to the server’s shared folder. The resulting packet capture is called smb_puttyexe_xfer.pcap.

If you’ve never looked at SMB traffic, don’t get scared by all the different types of SMB packets in the capture; we will only be looking at a few of them. This article isn’t meant to be an exhaustive reference on each and every type of SMB packet (there are over a hundred of them), so if you want the gory details then take a look at the references at the end of this article.

In order to carve the file out of these packets we have to find some basic information about it. Before and after transferring a file to a server, the client will attempt to open the file in order to see if it exists. This is done with an SMB NT Create AndX Request packet. The response from the server is an SMB NT Create AndX Response, which contains the name, extension, and size of the file being transferred. This is everything we need to get started. You can filter for Create AndX Response packets in Wireshark with the filter (smb.cmd == 0xa2) && (smb.flags.response == 1). If we examine one of those responses that occurs after the file has been transferred, we can identify that the file being transferred is putty.exe and that its file size is 454,657 bytes. We will use this information later.

Figure 1: Note the file name, extension, and size.
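
If you would rather pull these values out programmatically instead of clicking through Wireshark, a short Python sketch using the third-party pyshark library can list them. Treat this strictly as a sketch: the field names smb.file and smb.end_of_file are my assumption of how Wireshark’s SMB dissector labels the file name and size, so verify them against your version before relying on the output.

    # Sketch: print file names and sizes from SMB NT Create AndX Response packets.
    # pyshark wraps tshark, so tshark must be installed. The smb.file and
    # smb.end_of_file field names are assumptions - check your dissector.
    import pyshark

    cap = pyshark.FileCapture(
        "smb_puttyexe_xfer.pcap",
        display_filter="(smb.cmd == 0xa2) && (smb.flags.response == 1)",
    )

    for pkt in cap:
        # Not every response carries these fields, so fall back gracefully.
        name = getattr(pkt.smb, "file", "<not present>")
        size = getattr(pkt.smb, "end_of_file", "<not present>")
        print(name, size)

    cap.close()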

The next step we have to take in order to extract this file is to isolate the appropriate block of traffic. Wireshark makes this pretty easy with its Follow TCP Stream functionality. Start by right-clicking any packet in the capture file and selecting Follow TCP Stream. This will bring up a window that contains all of the data transferred in this particular communication stream, concatenated together without the layer 2-4 headers getting in the way. We are only concerned with the traffic transferred from the client to the server, so we will need to specify this in the directional drop-down box by selecting 192.168.47.133 –> 192.168.47.132 (458592 bytes). Click Save As and save the file using the name putty.raw.

Figure 2: Saving the isolated traffic from Wireshark

If you view the properties of the data you just extracted and saved, you should find that its file size is 458,592 bytes. This is 3,935 bytes more than the size of the actual file that was transferred, which means our goal is to get this raw file’s size down to exactly 454,657 bytes. This is where the real carving begins.

First things first, we have to delete all of the extra data that occurs before the executable data actually begins. Since we know that the transferred file is an executable, the quickest way to do this is to look for the executable header and delete everything that occurs before it. The executable header begins with the hex bytes 4D 5A (MZ in ASCII), which occur approximately 1,112 bytes into the putty.raw file. Delete everything up to that point and resave the file as putty.stage1. You should now be down to a file size of 457,480 bytes.

Figure 3: Removing added bytes from the beginning of the file
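
If you would rather not make this first trim by hand, it can be sketched in a few lines of Python. This is only an illustrative sketch; it assumes the raw stream was saved as putty.raw and that the first occurrence of the bytes 4D 5A in the stream really is the executable header.

    # Sketch: strip everything before the executable header.
    # Assumes the first "MZ" (0x4D 0x5A) in the stream is the real header.
    data = open("putty.raw", "rb").read()

    mz_offset = data.find(b"MZ")
    if mz_offset == -1:
        raise SystemExit("no executable header found in the stream")

    with open("putty.stage1", "wb") as out:
        out.write(data[mz_offset:])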

Now things get a bit trickier. SMB transmits data in blocks. This is great for reliability, since a lost or damaged block can be retransmitted, but it adds some extra work for us, because each block must contain some bytes of SMB header data in order to be interpreted correctly by the receiving host. The good news is that the size of this data is somewhat predictable, but you have to understand a bit more about SMB before the rubber meets the road. The thing to know here is that the data block size in SMB is limited to 64KB, or 65,536 bytes. Of this amount, only 60KB (61,440 bytes) is typically used for each block. These 61,440 bytes are combined with an additional 68 bytes of SMB header information. This means that after every 61,440 bytes of data we will have to strip out the next 68 bytes.

There is one more thing that must be taken into consideration before stripping out those bytes. As a part of the normal SMB communication sequence, an additional packet is sent right after the first block. This is an NT Trans Request packet, which is packet 77 in the capture file. The SMB portion of this packet is 88 bytes, which means we will have to remove those 88 bytes in addition to the 68 bytes that make up the normal SMB block header, for a total of 156 bytes.

Now that we have all of that sorted out, let’s start removing bytes. In your hex editor, move to the byte just past the 61,440th byte, which is offset 0x0F000. Starting with this byte, select a range of 156 bytes and delete it. Save this file as putty.stage2.

Figure 4: Removing the initial 156 bytes

Things get a bit easier now, as we are just concerned with stripping out the 68 bytes that follow every block. Skip through the file in 61,440-byte increments, deleting 68 bytes each time. This should occur six times in this file, at offsets 0x1e000, 0x2d000, 0x3c000, 0x4b000, 0x5a000, and 0x69000. Once finished, save the file as putty.stage3.

Figure 5: Removing a 68 byte SMB header block

Go ahead and take a look at the file size of putty.stage3. We are still a couple thousand bytes over our target, but luckily the last part is the easiest. The data stream is just padded with some extra information that needs to be deleted. We know that the file should be 454,657 bytes, so browse to that byte and delete everything that occurs after it.


Figure 6: Trimming the extra bytes off the end of the file

Save the final product as putty.exe and if you did everything right, you should have a fully functioning executable.

Figure 7: Success! The executable runs!


The whole process can be broken down into a series of repeatable steps:

  1. Record the file name, extension, and size by examining one of the SMB NT Create AndX Response packets
  2. Isolate and extract the appropriate stream data from Wireshark by using the Follow TCP Stream feature and selecting the appropriate direction of traffic
  3.  Remove all of the bytes occurring before the actual file header using a hex editor
  4. Following the first 61,440 byte block, remove 156 bytes
  5. Following each successive 61,440 byte block, remove 68 bytes
  6. Trim the remaining bytes off of the file so that it matches the file size recorded in step 1
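
For those who want to experiment with scripting this, here is a minimal Python sketch of steps 3 through 6. It assumes the client-to-server stream was saved as putty.raw and hard-codes the sizes observed in this particular capture (60KB data blocks, 68-byte SMB headers, the extra 88-byte NT Trans Request after the first block, and the 454,657-byte target), so treat it as a starting point rather than a general-purpose carver.

    # Sketch of steps 3-6 for the SMB capture. All sizes are specific to
    # this transfer; other captures may use different block sizes.
    BLOCK_SIZE = 61440     # 60KB of file data per SMB write block
    HEADER_SIZE = 68       # SMB header bytes following each block
    FIRST_GAP = 156        # 68-byte header plus the 88-byte NT Trans Request
    FILE_SIZE = 454657     # size reported in the NT Create AndX Response

    data = open("putty.raw", "rb").read()

    # Step 3: drop everything before the executable header (the first "MZ").
    data = data[data.find(b"MZ"):]

    # Steps 4 and 5: keep each data block and skip the SMB bytes that follow it.
    carved = bytearray()
    pos = 0
    gap = FIRST_GAP
    while pos < len(data):
        carved += data[pos:pos + BLOCK_SIZE]
        pos += BLOCK_SIZE + gap
        gap = HEADER_SIZE  # only the first gap includes the extra packet

    # Step 6: trim the trailing padding so the output matches the reported size.
    open("putty.exe", "wb").write(bytes(carved[:FILE_SIZE]))

Run against the stream extracted earlier, this should produce the same putty.exe as the manual walkthrough, provided the capture follows the layout described above.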


Carving Files from SMB2 Packets


Microsoft introduced SMB2 with Windows Vista and has used it in its newer operating systems moving forward. In order to set up a packet capture for this scenario, I took two Windows 7 (32-bit) virtual machines running on VMware Workstation and placed them on the same network. Once they were able to communicate with each other, I set up a shared folder on one host (192.168.47.128), which acts as the server. I then fired up Wireshark and began capturing packets as I copied an executable file from the client (192.168.47.129) to the server’s shared folder. The resulting packet capture is called smb2_puttyexe_xfer.pcap.

You should notice that this traffic is a little bit cleaner than the SMB traffic we looked at earlier. This is because SMB2 is streamlined so that there are far fewer commands; whereas SMB had over a hundred commands and subcommands, SMB2 has only nineteen. Regardless, we still need to find the name and size of the file being transferred. One of the best places to do this is in one of the SMB2 Create Response File packets. This packet type serves a purpose similar to that of the SMB NT Create AndX Response packet. You can filter these out in Wireshark with the filter (smb2.cmd == 5) && (smb2.flags.response == 1). The last one of these in the capture, packet 81, is the one we want to look at, since it occurs after the file transfer is complete. It identifies the file name as putty.exe and the file size as 454,656 bytes. This is indeed the same file as in our earlier example, but it is reported as being one byte smaller. The missing byte is just null padding at the end of the file, so it’s not of any real concern to us.

Figure 8:  Once again we note the file name, extension, and size

At this point you should perform the same steps as before to isolate and extract the data stream from the capture using Wireshark’s Follow TCP Stream option. Doing this should yield a new putty.raw file whose size is 459,503 bytes. This is 4,847 bytes too big, so it’s time to get to carving.

Once again we need to start by stripping out all of the data before the executable header. Fire up your favorite hex editor and remove everything before the bytes 4D 5A; this should account for a deletion of 1,493 bytes. Save the result as putty.stage1.

Figure 9: Removing the extra bytes found prior to the executable header

Now things change a bit. SMB2 works in a manner similar to SMB, but it allows more data to be transferred at once. SMB had a maximum block size of 64KB because it was limited to 16-bit data sizes; SMB2 uses either 32-bit or 64-bit data sizes, which raises that limit. In the case of the transfer taking place in the sample PCAP file, these were two 32-bit Windows 7 hosts under their default configuration, which means the block size is set at 64KB. Unlike SMB, however, the full 64KB is used, so we will see file data being transferred in chunks of 65,536 bytes. Each 65,536-byte chunk is combined with a 116-byte SMB2 header to form the full block.

SMB2 doesn’t include an additional initial request packet like the SMB NT Trans Request, so we don’t have to worry about stripping out any extra bytes right off the bat. In fact, some might say that carving data from SMB2 is a bit easier, since you only have to strip out 116 bytes after each 65,536-byte block. You can do this now on putty.stage1. In doing so you should be deleting 116 bytes of data at offsets 0x10000, 0x20000, 0x30000, 0x40000, 0x50000, and 0x60000.

Figure 10: Removing 116 bytes of data following the first 65,536-byte chunk

Once you’ve finished this, save the file as putty.stage2. All that is left is to remove the final trailing bytes from the file. To do this, browse to byte 454,656 and delete every byte that occurs after it.

Figure 11: Removing the final trailing bytes

Finally, save the file as putty.exe and you will have a fully functioning executable. The process of carving a file from an SMB2 data stream breaks down as follows:

  1. Record the file name, extension, and size by examining one of the SMB2 Create Response File packets
  2. Isolate and extract the appropriate stream data from Wireshark by using the Follow TCP Stream feature and selecting the appropriate direction of traffic
  3.  Remove all of the bytes occurring before the actual file header using a hex editor
  4. Following each successive 65,536 byte block (assuming a 64K block size), remove 116 bytes
  5. Trim the remaining bytes off of the file so that it matches the file size recorded in step 1
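
If you scripted the SMB walkthrough using the sketch from the previous section, the same loop handles SMB2; only the constants change. As before, these values are specific to this capture and should be verified against your own traffic:

    # Adjusted constants for the SMB2 capture (same carving loop as before).
    BLOCK_SIZE = 65536       # the full 64KB block carries file data
    HEADER_SIZE = 116        # SMB2 header bytes following each block
    FIRST_GAP = HEADER_SIZE  # no extra packet after the first block
    FILE_SIZE = 454656       # size reported in the Create Response File packet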


Conclusion


That’s all there is to it. I’ll be the first to admit that I didn’t cover every single aspect of SMB and SMB2 here, and there are a few factors that might affect your success in carving files from these streams, but this article shows the overall process. Taking this one step further, it’s pretty reasonable to assume that this process can be automated with a quick Python script, but that’s something I haven’t devoted the time to yet. If you feel like taking up that challenge, be sure to get in touch and I’ll be glad to post your code as an addendum to this post. In the meantime, happy carving!


References

http://msdn.microsoft.com/en-us/library/cc246231%28v=PROT.10%29.aspx

http://msdn.microsoft.com/en-us/library/cc246482%28v=PROT.10%29.aspx

http://channel9.msdn.com/Blogs/Darryl/Server-Message-Block-SMB21-Drill-down