
Differential Diagnosis of Network Security Monitoring Events

There are a lot of things that the industry does well when it comes to network security monitoring (NSM). For instance, I tend to think that we have data collection figured out reasonably well. I also think that signature-based intrusion detection is a really well developed science. However, with NSM having only existed for a short period of time there are several facets of it that aren’t too well defined. One such aspect is the actual diagnostic method that people use to analyze NSM events. That is, the process an analyst uses to connect the dots between the initial alert and the final diagnosis. In this article I’m going to discuss the use of a common medical diagnostic method called differential diagnosis and how it can be applied to NSM.


Understanding Normal

The first thing that was ever taught to me when I started my career as an NSM analyst was that if you know what normal looks like, then you can determine what is bad. I trusted in this concept for many years and even taught it to others. As true as this statement may be, I believe it is relied on entirely too much. This is primarily due to a failure in separating the collection, detection, and analysis processes.


Collection centers on the hardware and software used to collect NSM related data. Consider the collection of full content packet capture (PCAP) data. The use of a network tap and DaemonLogger allows you to store this data on disk so that it may be used for the identification and analysis of network security related events. Collection occurs with a combination of hardware and software.

Detection is the process by which collected data is examined and anomalies are identified, typically through some form of signature, anomaly, or statistically based detection. Snort is an example of signature-based intrusion detection software: it compares collected network traffic to signatures of known malicious activity, using pattern matching to determine if something bad has occurred. Detection is typically software focused.

Analysis is what occurs when a human interprets the output of a detection tool. Although Snort may detect a pattern match in a communication sequence and generate an alert, it is a human who is ultimately responsible for reviewing the alert and investigating it to an end determination on its validity. The key concept here is that analysis is human focused.


With those three terms more clearly defined and distinctions drawn, it would stand to reason that the concept of knowing what normal looks like in order to determine what is bad is actually more relevant to detection than analysis. Realistically speaking, it’s not feasible in the modern state of network computing to be well versed in every aspect of normal communications. Although some traffic patterns may remain fairly static, the open nature and loose standards that govern network communication protocols result in a constant evolution of traffic patterns. Don’t be mistaken: this is still an important concept that must be incorporated into the analytic approach. It’s just not strong enough to stand on its own as the singular concept new analysts should be taught. Knowing what normal looks like is best used when analyzing specific facets of a potential breach rather than as a holistic method to classify all network traffic you may be capturing.


A Differential Approach

The general goal of an NSM analyst is to digest the alerts generated by various detection tools, investigate multiple data sources, and perform relevant tests and research to see if their findings represent a network security breach. This is very similar to the goal of a physician, which is to digest the symptoms a person presents, investigate multiple data sources, and perform relevant tests and research to see if their findings represent a breach in the person’s immune system. Both practitioners share a similar goal of connecting the dots to find out if something bad has happened and/or is still happening.

Although NSM has only been around a short while, medicine has been around for centuries. This means that physicians have a head start on us when it comes to developing a diagnostic method. One of the most common diagnostic methods used in clinical medicine is called differential diagnosis. If you’ve ever seen an episode of “House” then chances are you’ve seen this process in action. The group of doctors will be presented with a set of symptoms and they will create a list of potential diagnoses on a whiteboard. The remainder of the show is spent doing research and performing various tests to eliminate each of these potential conclusions until only one is left. Although the methods used in the show are often a bit unconventional, they still fit the bill as a part of the differential diagnosis process.

The differential method is one based upon a process of elimination. It consists of five distinct steps, although in some cases only two will be necessary. The differential process exists as follows:

  1. Identify and list the symptoms
    In medicine, symptoms are typically initially conveyed verbally by the individual experiencing them. In NSM, a symptom is most commonly in the form of an alert generated by some form of intrusion detection system or other detection software. Although this step focuses primarily on the initial symptoms, more symptoms may be added to this list as additional tests or investigations are conducted.

  2. Consider and evaluate the most common diagnosis first
    A statement every medical student is taught in their first year is “If you hear hoof beats, look for horses…not zebras.” This is to say that the most common diagnosis is likely the correct one. As a result, this diagnosis should be evaluated first. The analyst should focus the investigation on doing what is necessary to quickly confirm this diagnosis. If this common diagnosis cannot be determined to be true during this initial step then the analyst should proceed to the next step.

  3. List all possible diagnoses for the given symptoms
    The next step in the differential process is to list every possible diagnosis based upon the information currently available with the initially assessed symptoms. This step requires some creative thinking and is often most successful when multiple analysts participate in generating ideas. Although you may not have been able to completely confirm the most common diagnosis in the previous step, if you weren’t able to rule it out completely then it should be carried over into the list generated in this step. Each potential diagnosis on this list is referred to as a candidate condition.

  4. Prioritize the list of candidate conditions by their severity
    Once a list of candidate conditions is created, a physician will prioritize it, listing the condition that is the largest threat to human life at the top. As an NSM analyst you should also prioritize this list, but the prioritization should focus on which condition is the biggest threat to your organization’s network security. This will be highly dependent upon the nature of your organization. For instance, if “MySQL Database Root Compromise” is a candidate condition then a company whose databases contain social security numbers would prioritize this condition much higher than a company that uses a simple database to store its sales staff’s on-call schedule.

  5. Eliminate the candidate conditions, starting with the most severe
    The final step is where the majority of the action occurs. Based upon the prioritized list created in the previous step, the analyst should begin doing what is necessary to eliminate candidate conditions, starting with the condition that poses the greatest threat to network security. This process of elimination requires considering each candidate condition and performing tests, conducting research, and investigating other data sources in an effort to rule it out as a possibility. In some cases investigation of one candidate condition may effectively rule out multiple candidate conditions, speeding up this process. Alternatively, investigation of other candidate conditions may prove inconclusive, leaving one or two conditions that cannot be definitively eliminated as possibilities. This is acceptable, however, as sometimes in network security monitoring (as in medicine) there are anomalies that can’t be explained and that require more observation before determining a diagnosis. Ultimately, the goal of this final step is to be left with one diagnosis so that either the incident handling process may begin or the alert can be dismissed as a false positive. It’s very important to remember that “Normal Communication” is a perfectly acceptable diagnosis, and will be the most common diagnosis an NSM analyst arrives at. I also find it important during this step to remember that all packets are good unless you can prove they are bad.
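The last two steps of the list above lend themselves to a small sketch. This is only an illustration of the bookkeeping, not a tool: the `Candidate` class, the numeric severity scale, and the `is_ruled_out` callback (which stands in for the analyst’s real tests and research) are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    severity: int          # hypothetical scale: higher = bigger threat to the organization
    ruled_out: bool = False


def prioritize(candidates):
    """Step 4: order candidate conditions from most to least severe."""
    return sorted(candidates, key=lambda c: c.severity, reverse=True)


def eliminate(candidates, is_ruled_out):
    """Step 5: test each candidate in priority order; return those still standing.

    `is_ruled_out` is a placeholder for the analyst's investigation.
    """
    remaining = []
    for cand in prioritize(candidates):
        cand.ruled_out = is_ruled_out(cand)
        if not cand.ruled_out:
            remaining.append(cand)
    return remaining


candidates = [
    Candidate("Normal Communication", severity=1),
    Candidate("Misconfiguration", severity=2),
    Candidate("Malware Infection", severity=3),
    Candidate("Data Exfiltration", severity=4),
]

# Pretend the investigation cleared everything except normal communication.
remaining = eliminate(candidates, lambda c: c.name != "Normal Communication")
print([c.name for c in remaining])
```

If more than one candidate is left in `remaining`, that corresponds to the inconclusive case described above, where further observation is needed before settling on a diagnosis.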



Let’s consider this process with a couple of broad case scenarios.


Scenario 1

Step 1: Identify and List the Symptoms


  • Internal host appears to be sending outbound traffic to a Russian IP address
  • The traffic is occurring at regular intervals, every 10 minutes
  • The traffic is HTTPS over port 443, and as such is encrypted and unreadable
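The second symptom can be measured rather than eyeballed. Below is a minimal sketch, assuming you have already pulled the connection start times for this host and destination out of your session or flow data as epoch seconds; the function name and sample values are made up for illustration. A low variation score only says the timing is regular. It says nothing about whether the traffic is malicious.

```python
from statistics import mean, pstdev


def interval_profile(timestamps):
    """Return (mean_interval, coefficient_of_variation) for epoch-second timestamps."""
    ts = sorted(timestamps)
    gaps = [b - a for a, b in zip(ts, ts[1:])]
    if len(gaps) < 2:
        raise ValueError("need at least three timestamps")
    avg = mean(gaps)
    # Standard deviation relative to the mean: near 0 means very regular timing.
    return avg, pstdev(gaps) / avg


# Hypothetical connection start times: roughly every 600 seconds.
times = [0, 601, 1199, 1800, 2402, 3000]
avg, cv = interval_profile(times)
print(f"mean interval {avg:.0f}s, variation {cv:.3f}")
```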

Step 2: Consider and Evaluate the Most Common Diagnosis First

It’s been my experience that most entry level analysts will see these symptoms and automatically think that this machine is infected with some form of malware and is phoning home for further instructions. Those analysts tend to key in on the fact that the traffic is going to a Russian IP address and that it is occurring at regular 10 minute intervals. Although those things are worth noting (I wouldn’t have listed them if they weren’t), I don’t buy into the malware theory so easily. I believe entirely too much emphasis is placed on the geographic location of IP addresses, so the fact that the remote IP address is Russian means little to me. Additionally, there are a whole variety of normal communication mechanisms that talk at regular intervals. This includes things like web-based chat, RSS feeds, web-based e-mail, stock tickers, software update processes, and more. Operating on the principle that all packets are good unless you can prove they are bad, I think the most common diagnosis here is that this is normal traffic.

That said, how can we confirm this potential diagnosis? Confirming something is normal can be hard. In this particular instance we could start with some open source research on the Russian IP. Although it’s located in Russia it still may be owned by a legitimate company. If we were to look up the host and find that it was registered to a popular AV vendor we might be able to use that information to conclude that this was an AV application checking for updates. I didn’t mention the URL that the HTTPS traffic is going to, but quickly Googling it may yield some useful information that will help you determine if it is a legitimate site or something that might be hosting malware or some type of botnet C2. Another technique would be to examine the host physically, if you have ready access to it, to see if any processes are launched on the machine at the same intervals at which the traffic occurs.

Let’s assume that we weren’t able to make a final determination on whether or not this was normal communication.

Step 3: List all Possible Diagnoses for the Given Symptoms

*There are obviously more candidate conditions in the realm of possibility, but for this and the other scenario I’ve kept it to some of the more common ones for the sake of brevity.

Candidate Conditions:

    • Normal Communication
      We weren’t able to rule this out completely in the previous step so we carry it over to this step.


    • Malware Infection / Installed Malicious Logic
      This is used as a broad category. We typically don’t care about the specific strain until we determine that malware may actually exist. If you are concerned about a specific strain then it can be listed separately. Think of this category as a doctor listing “bacterial infection” as a candidate condition knowing that they can further narrow it down later.


    • Data Exfiltration from Compromised Host
      Potential that the host could be sending proprietary or confidential information out. This sort of thing would likely be part of a coordinated or targeted attack.


    • Misconfiguration
      It’s well within the realm of possibilities that a system administrator fat-fingered an IP address and a piece of software that should be trying to communicate periodically with an internal IP is now trying to do so with a Russian IP. This is really quite common.


Step 4: Prioritize the List of Candidate Conditions by their Severity

These priorities are fairly generalized since they are dependent upon your organization.

Priority 1: Data Exfiltration from Compromised Host

Priority 2: Malware Infection / Installed Malicious Logic

Priority 3: Misconfiguration

Priority 4: Normal Communication

Step 5: Eliminate the Candidate Conditions, Starting with the Most Severe

Priority 1: Data Exfiltration from Compromised Host

This one can be a bit tricky to eliminate as a possibility. Full packet capture won’t be of much assistance here since the traffic is encrypted, but if you can create some statistics from this traffic, or better yet, if you have netflow available, you should be able to determine the amount of data going out. If only a few bytes are going out every ten minutes then it’s likely that this is not data exfiltration. The open source research you did earlier on the Russian IP address may also provide some value here in determining the reputation of this host. It would also be of value to determine if any other hosts on your network are talking to this IP address or any other IPs in the same address space. Finally, baselining normal communication for your internal host and comparing it with the potentially malicious traffic may provide some useful insight.
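To make the “how much data is going out” check concrete, here is a rough sketch that totals outbound bytes per ten minute window. It assumes the flow records have already been exported into simple dictionaries; the field names (`start`, `src`, `dst`, `bytes`) and the addresses are hypothetical and do not correspond to any particular netflow tool’s schema.

```python
from collections import defaultdict


def outbound_bytes_per_window(flows, src, dst, window=600):
    """Sum bytes sent from src to dst in fixed windows of `window` seconds."""
    totals = defaultdict(int)
    for f in flows:
        if f["src"] == src and f["dst"] == dst:
            # Bucket each flow by the start of the window it falls in.
            totals[f["start"] // window * window] += f["bytes"]
    return dict(sorted(totals.items()))


# Hypothetical flow records; addresses come from documentation ranges.
flows = [
    {"start": 5,    "src": "10.0.0.5", "dst": "203.0.113.9",  "bytes": 420},
    {"start": 610,  "src": "10.0.0.5", "dst": "203.0.113.9",  "bytes": 398},
    {"start": 1215, "src": "10.0.0.5", "dst": "203.0.113.9",  "bytes": 431},
    {"start": 700,  "src": "10.0.0.7", "dst": "198.51.100.2", "bytes": 90000},
]

per_window = outbound_bytes_per_window(flows, "10.0.0.5", "203.0.113.9")
print(per_window)
```

A few hundred bytes per window, as in this made-up data, would argue against bulk exfiltration. Sustained large totals would keep the condition on the list.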

Priority 2: Malware Infection / Installed Malicious Logic

At this point the research you’ve already done should give you a really good idea of whether or not this condition is true. It’s likely that by examining the potential for data exfiltration you will either rule this condition out or will already have confirmed it.

Priority 3: Misconfiguration

This condition can best be approached by comparing the traffic of this host against the traffic of one or more hosts with a similar role on the network. If every other workstation on that same subnet has the same traffic pattern, but to a different IP address, then it’s likely that the wrong IP address was entered into a piece of software somewhere, proving that a misconfiguration exists. Having access to host-based logs can also be useful in figuring out if a misconfiguration exists, since evidence of it might appear in Windows or Unix system logs.
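The peer comparison can be sketched as a simple set difference. This assumes you have already summarized, per host, the set of destinations it talks to; the host names, addresses, and the `compare_to_peers` helper are illustrative only.

```python
def compare_to_peers(peer_dests, host):
    """Return (extra, missing) destinations for `host` relative to its peers.

    extra:   destinations only `host` contacts
    missing: destinations every peer contacts but `host` does not
    """
    peers = [set(d) for h, d in peer_dests.items() if h != host]
    mine = set(peer_dests[host])
    if not peers:
        return sorted(mine), []
    extra = sorted(mine - set.union(*peers))
    missing = sorted(set.intersection(*peers) - mine)
    return extra, missing


# Hypothetical workstations on the same subnet.
peer_dests = {
    "ws1": {"10.1.1.20:443"},
    "ws2": {"10.1.1.20:443"},
    "ws3": {"203.0.113.9:443"},
}

extra, missing = compare_to_peers(peer_dests, "ws3")
print(extra, missing)
```

A host with one `extra` destination and one `missing` destination on the same port fits the fat-fingered address story. It is a hint, not proof.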

Priority 4: Normal Communication

If you’ve gotten this far, then the diagnosis of normal communication should be all that remains on your list of candidate conditions.

Concluding a Diagnosis

At this point you have to use your experience as an analyst and your intuition to decide if you think something malicious is really occurring. If you were able to complete the previous analysis thoroughly, then operating on the assumption that all packets are good unless you can prove they are bad would mean your final diagnosis here should be that this is normal communication. If you still have a hunch something quirky is happening though, there is no shame in monitoring the host further and reassessing once more data has been collected.


Scenario 2

Step 1: Identify and List the Symptoms


  • A web server in our DMZ is receiving massive amounts of inbound traffic
  • The inbound traffic is unreadable and potentially encrypted or obfuscated
  • The inbound traffic is coming to multiple destination ports on the internal host
  • The inbound traffic is UDP based

Step 2: Consider and Evaluate the Most Common Diagnosis First

With the amount of traffic being received by the internal host being very large and the packets using the UDP protocol with random destination ports, my inclination would be that this is some form of denial of service attack.

The quickest way to determine whether something is a denial of service is to compare the amount of traffic being received with the normal amount of traffic received on that host. This is something that is really easy to do with netflow data if you have it available. If the host is only receiving 20% more traffic than it normally would then I would consider other alternatives to a DoS. However, if the host is receiving ten or one hundred times its normal amount of traffic then a DoS is very likely, almost a certainty. It’s important to remember that a DoS is still a DoS even if it is unintentional.
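That rule of thumb is easy to express in code. The sketch below uses the median of past daily byte counts as the baseline, which is my assumption rather than a prescribed method, and the “inconclusive” middle band between 1.2x and 10x is also an assumption since the text only speaks to the two ends.

```python
from statistics import median


def traffic_ratio(current_bytes, baseline_samples):
    """Ratio of current volume to the median of past volumes for the same host."""
    baseline = median(baseline_samples)
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return current_bytes / baseline


def assess(ratio):
    # Cut-offs follow the rule of thumb in the text; the middle band is an assumption.
    if ratio >= 10:
        return "DoS very likely"
    if ratio <= 1.2:
        return "look at alternatives to DoS"
    return "inconclusive, keep investigating"


# Hypothetical byte counts for the same time window on previous days.
baseline = [4.8e9, 5.1e9, 5.0e9, 4.9e9, 5.2e9]
print(assess(traffic_ratio(5.5e9, baseline)))
print(assess(traffic_ratio(6.0e10, baseline)))
```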

Once again, for the sake of this scenario we will continue as though we weren’t able to make a clear determination on whether or not a DoS condition exists.

Step 3: List all Possible Diagnoses for the Given Symptoms

Candidate Conditions:

    • Denial of Service
      We weren’t able to rule this out completely in the previous step so we carry it over to this step.


    • Normal Communication
      It doesn’t seem incredibly likely, but there is potential for this to be normal.


    • Misdirected Attacks
      When a third party chooses to attack another they will often spoof their source address for the sake of anonymity and to prevent getting DoS’d themselves. This will result in the owner of the spoofed IP they are using seeing that traffic. This web server could be seeing the effects of this.


    • Misconfigured External Host
      A misconfiguration can happen on somebody else’s network just as easily as it could on yours. This misconfiguration could result in an external host generating any number of types of traffic and sending them to the web server.


    • SPAM Mail Relay
      The server could be misconfigured or compromised in a manner that allows it to be used for relaying SPAM across the Internet.


Step 4: Prioritize the List of Candidate Conditions by their Severity

Priority 1: Denial of Service

Priority 2: SPAM Mail Relay

Priority 3: Misconfigured External Host

Priority 4: Misdirected Attacks

Priority 5: Normal Communication

Step 5: Eliminate the Candidate Conditions, Starting with the Most Severe

Priority 1: Denial of Service

We’ve already gone through the paces on this one without being able to identify that it is the definitive diagnosis. Even though this is the most severe we would have to proceed to attempt to eliminate other candidate conditions to help in figuring out if a DoS is occurring. Of course, depending on the effect of the attack it may make the most sense to contain the issue by blocking the traffic before spending more time investigating the root cause.

Priority 2: SPAM Mail Relay

This one is relatively easy to eliminate. If the server was being used as a mail relay then you would have a proportionate amount of traffic going out as you do going in. If that’s not the case and you don’t see any abnormal traffic leaving the server then it is likely that it is not relaying SPAM. If the web server is also running mail services then you can examine the appropriate logs here as well. If it is not supposed to be running mail services you can examine the host to see if it is doing so in an unauthorized manner.
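A quick way to check the in versus out proportion is sketched below, again assuming flow records exported as simple dictionaries with hypothetical field names and documentation-range addresses. What counts as “proportionate” is left to the analyst; no threshold is implied here.

```python
def in_out_bytes(flows, server):
    """Return (bytes_in, bytes_out) for `server` from simple flow dicts."""
    bytes_in = sum(f["bytes"] for f in flows if f["dst"] == server)
    bytes_out = sum(f["bytes"] for f in flows if f["src"] == server)
    return bytes_in, bytes_out


# Hypothetical flows: lots of traffic in, almost nothing out.
flows = [
    {"src": "198.51.100.7", "dst": "192.0.2.10",   "bytes": 500000},
    {"src": "198.51.100.8", "dst": "192.0.2.10",   "bytes": 700000},
    {"src": "192.0.2.10",   "dst": "198.51.100.7", "bytes": 1200},
]

bytes_in, bytes_out = in_out_bytes(flows, "192.0.2.10")
ratio = bytes_out / bytes_in if bytes_in else float("inf")
print(bytes_in, bytes_out, round(ratio, 4))
```

An outbound share this small would make the relay theory hard to sustain, which matches the reasoning above.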

Priority 3: Misconfigured External Host

This one is typically pretty tricky. Unless you can identify the owner of the IP address and communicate with them directly then the most you can hope to do is block the traffic locally and/or report abuse at their ISP level.

Priority 4: Misdirected Attacks

This is another tricky one along the same lines as the previous candidate condition. If it’s an attacker somewhere else whose antics are causing traffic redirection to your server then the most you can do is to report the issue to the ISP responsible for the IP address and block the traffic locally.

Priority 5: Normal Communication

This doesn’t seem likely, but you can’t say this for sure without baselining the normal traffic for the host. Compare its traffic at similar times on previous days to see if you can draw any conclusions. Is the pattern normal and it’s just the amount of traffic that’s anomalous? Is it both the pattern and the amount that’s anomalous? Did the server ever talk to the offending IP prior to this?


Concluding a Diagnosis

In this scenario, it’s very possible that you are left with as many as three candidate conditions that you cannot rule out. The good thing here is that even though you can’t rule these out, the containment and remediation methods would be the same for all of them so you still have gotten to a state of diagnosis that allows the network to recover from whatever is occurring. If the amount of traffic isn’t too great then you may not need to block the activity and you may be able to monitor it further in order to attempt to collect more symptoms that may be useful in providing a more accurate diagnosis.



I’ve spent quite a bit of time doing analysis with this differential approach and also reviewing previous investigations post-mortem while applying these concepts and I’ve been really pleased with my findings. I think that if you are struggling with being able to grasp a firm analytical method then this may be a great one to start with. I’m not entirely sure that the differential method is appropriate for all organizations, but just as with medicine, there are competing approaches and I hope to examine more of those in the future so that I can draw more comparisons between the medical field and NSM. If you have any scenarios in which you’ve used this differential approach (for better or for worse), I’d love to hear about them.

GFIRST 2011 Presentation Slides, Code, and Thoughts

I’m sitting in my hotel room after just finishing my last session at US-CERT GFIRST in Nashville, TN. This was my first time at GFIRST both as an attendee and presenter, and I really had a great time. Where I’m originally from in Kentucky isn’t too far from Nashville, so I am familiar with the area, and the venue choice, the Gaylord Opryland Hotel, is a beautiful facility and top-notch for this kind of conference. I wanted to take a moment to address where people can find the resources for my presentation as well as my thoughts on some of the presentations I had a chance to see and the conference as a whole.

My Presentation

Along with my friend and colleague Jason Smith, I presented a talk on Real World Security Scripting. At a bare minimum, we wanted to share some quick and dirty scripts we wrote to do some pretty neat things within our security operations center (SOC) at SPAWAR. At a higher level, we really hoped that we could encourage some people to get involved with low level BASH, Python, and PERL scripting to automate tasks within their SOC environment as well as increase capabilities of the SOC and its staff. We generated quite a bit of interest, and as a result it looks like several people were turned away because the room was filled to fire code capacity. Our sincere apologies to those who missed the talk. We got some really positive feedback from folks who did make it to the presentation.

As promised, we will be releasing our slides and source code for the presentation. The slides can be downloaded here. As for the source code, we are maintaining the distribution release on https://www.forge.mil, which requires a DOD CAC or ECA certificate to access. I understand that a lot of government folks outside of DOD don’t have access to forge.mil, so we are trying to find another place to host this code where we can control access to only people in the .gov or .mil space. In the meantime, if you would like to get copies of the code, please e-mail me at my mil address (chris.sanders.ctr@nsoc.med.osd.mil) from your mil/gov address and I will get it over to you. We are hoping to get all of that bundled up by next week.


Presentations I Attended

Keynote Panel Discussion – “Unplug to Save”

I started the week on Tuesday by attending the opening ceremony in which there was a panel discussion between several leaders in the government cyber defense community. The panel included Winn Schwartau, Mark Bengel, Doris Gardner, John Linkous, and John Pray, Jr and was moderated by Bobbie Stempfley. If you aren’t familiar with those individuals I’ll leave the Googling to you :).


The discussion was centered on the concept of “unplug to save”, focusing on whether it was an acceptable solution to unplug an entity from the Internet in order to prevent a catastrophic event from occurring as a result of a cyber attack. The panel was split and brought up several good points about the interdependencies between certain aspects of government and national defense, namely citing the ones that were unknown. Truth be told, sometimes we just don’t know the effect removing certain networks from the Internet would have. I’m of the opinion that in some cases hitting the kill switch is the best policy, but only in extreme cases, and I’m not sure who that authority should be given to. The panel also got into a discussion of the inherently flawed nature of the Internet and the need for an architecture redesign. That was all fine and dandy and I won’t disagree…but until some form of governing body takes on the task of redesigning the fundamental protocols of the Internet and it is taken seriously then this is just a pie in the sky dream.


The only thing that really irked me during the discussion was when one of the panelists mentioned how we could “solve the cyber problem” by hiring the types of hackers who can’t get clearances. It would seem to me that doing such a thing would be a prime way to generate more Bradley Manning-esque cases. Granted, Manning wasn’t a computer security expert by any means, but imagine what someone with his kind of access could do with a bit of hacking knowledge. I’d just as soon we make cyber jobs within the government more attractive to young professionals so that they stay on the straight and narrow instead of the USG resorting to hiring criminals.



Internet Blockades

This talk was presented by Dr. Earl Zmijewski from Renesys and was one of the talks I enjoyed the most. He described several types of Internet censoring, blocking, and filtering techniques used across the world, citing recent examples from Egypt, Libya, North Korea, and of course, the Great Firewall of China. All of his examples had technical data to back them up, which really left me satisfied. Random fact – N. Korea only has 768 public IP addresses.



Using Differential Network Traffic Analysis to Find Non-Signature Threats

This talk was centered on the creation of metadata from layer 7 data on the network. This isn’t entirely a new concept, but it’s one that most people are just now keying in on. The general idea is that you can strip out only the layer 7 data from HTTP/DNS/EMail streams, index it, and store it so that you can perform analysis on it. The benefit here is that the amount of disk space required for storage of this type of data is much less than storing full PCAP, allowing for more long term analytics. The talk was presented by David Cavuto from Narus, who did describe a few useful analytics I hadn’t thought of. For example, collecting the lengths of HTTP request URIs and computing their standard deviation to look for outliers. This could potentially find incredibly long or incredibly short URIs that might be generated by malicious code.
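To illustrate the URI length idea, here is a minimal sketch of my own. It is not the presenter’s analytic and not the metadata code mentioned below, and the sample URIs and the 2.0 threshold are assumptions picked for the example. It flags any URI whose length is more than `threshold` standard deviations from the mean.

```python
from statistics import mean, pstdev


def uri_length_outliers(uris, threshold=2.0):
    """Return URIs whose length deviates from the mean by more than `threshold` std devs."""
    lengths = [len(u) for u in uris]
    avg, sd = mean(lengths), pstdev(lengths)
    if sd == 0:
        return []  # all lengths identical, nothing stands out
    return [u for u, n in zip(uris, lengths) if abs(n - avg) / sd > threshold]


# Hypothetical request URIs pulled from HTTP metadata.
long_uri = "/x?d=" + "A" * 300
uris = [
    "/index.html", "/about", "/img/logo.png", "/css/site.css", "/js/app.js",
    "/contact", "/news/2011", "/search?q=nsm", "/login", long_uri,
]

outliers = uri_length_outliers(uris)
print([len(u) for u in outliers])
```

With a small sample a single extreme value inflates the standard deviation, so on real traffic you would want a much larger set of URIs before trusting the cut-off.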


Unfortunately, being a vendor talk, Mr. Cavuto didn’t provide anything that would help people generate layer 7 metadata, but he did have a product he was selling that would do it. Fortunately, I have some code that will generate this type of metadata from PCAP. I’m going to button that up and release it here at some point…for free 🙂



Getting Ahead of Targeted and Zero-Day Malware Using Multiple Concurrent Detection Methodologies

This was, by far, my favorite presentation of the week. It was given by Eddie Schwartz, the new CSO at RSA. The talk was centered around investing time in the right areas of analysis. Namely, looking across the data sources that matter and not relying on the IDS to do all the work. Once Mr. Schwartz releases his slides I would recommend checking them out. He is a man who understands intrusion detection and how to make it effective. My favorite part of his talk was something he said a couple of times: Yes, doing it this way is hard. Suck it up. It gets easier.



They Are In Your Network, Now What?

This talk was presented by Joel Esler of Sourcefire. Joel is a really smart guy and a great presenter and he didn’t disappoint. My big take away from this one was his discussion of Razorback, which I really think is going to be one of the next big things in intrusion detection. I think a lot of the crowd missed the point on this. There were a lot of complaints because of the amount of legwork required to integrate the tool, but I think most of those people were overlooking the early stage the tool was in and the potential impact of the community released nuggets and detection plugins. I played with Razorback when it was first released and look forward to digging into it again once some of the setup and configuration pains are eased. I’ve already thought of quite a few nuggets that I could possibly write for it.



Analysis Pipeline: Real-time Flow Processing

I’m a huge fan of SiLK for netflow collection and analysis so I was excited to hear Daniel Ruef from CERT|SEI talk about Analysis Pipeline, a component that adds some cool flexibility to SiLK. Overall, I was really impressed with the capability and am looking forward to playing with the next version when it comes out in a couple of months. I always say that if you aren’t collecting netflow you are missing out on some great data, and SiLK is a great way to start collecting and parsing netflow for free. If you are already using SiLK, please do yourself a favor and look into the free add-on Analysis Pipeline.



Advanced Command and Control Channels

I thought this was an awesome overview of traditional and more advanced C2 channels that malware uses. I don’t think anything here was really new, but the way the presentation was broken down was very intuitive and the examples that were given were rock solid. This was given by Neal Keating, a cyber intel analyst with the Department of State.



Final Thoughts

I really enjoyed the conference and honestly consider it one of the best and most relevant conferences for folks in cyber security within the gov/mil space. My only major complaint was that a few vendors managed to sneak their way into speaking and basically giving product sales pitches rather than technical talks. I’m hoping that feedback will make it back to the US-CERT folks and more effort will go into preventing that from happening in the future. I hate showing up to a talk that I hope to learn something from and being drilled with sales junk about products I don’t want. Yes, I’m looking at you General Dynamics and Netezza.


Overall, the staff did a great job of organizing and I’d be happy to have the opportunity to attend and speak at GFIRST 2012 in Atlanta next year.



TL;DR – Real World Security Scripting Presentation Slides – http://chrissanders.org/pub/GFIRST2011-SandersSmith.pdf – Please e-mail me for full code.