Category Archives: Network Security Monitoring

Security Onion Cheat Sheet

I’ve been a Security Onion user for a long time and recommend it to people looking for a pre-built sensor platform. I recently put together a Security Onion cheat sheet that highlights important information that will help you use, configure, and customize your installation.

Download the Security Onion Cheat Sheet PDF

Download the Security Onion Cheat Sheet PNG

Special thanks to Doug Burks and Phil Plantamura for reviewing this and providing valuable input. Enjoy!

5 Human-Centered Takeaways from the SANS SOC Survey

SANS recently released the results of their SOC survey that was put together by Chris Crowley. The report has a lot of useful data points and is worth your time to go through whether you’re in a SOC and wondering how you stack up against others, or if you’re thinking about establishing a SOC and need to see where the goal posts currently are.

In this post, I want to focus on five takeaways I garnered from the report*. These takeaways will revolve around the human analyst, just as all investigations do.

Heavily Regulated Industries (and vendors) are Leading the Way

Figure 1 illustrates the distribution of SOCs across specific industries. Setting dedicated cyber security and technology companies aside, the industries that appear to have a greater number of SOCs share a commonality of being heavily regulated. This includes government, finance, manufacturing, and healthcare. This seems consistent with the notion that many organizations develop their security operations by first embracing required compliance.

SOC Survey Industries Represented

By virtue of being the first and most prolific adopters of SOCs, these industries will naturally dictate best practices across the field as they mature. The common traits and mindsets predominant in these industries will influence the direction of the SOC as we know it. That influence will carry over to cyber security vendors, who will inevitably swap staff with practitioners in these SOCs. Combined with vendors focusing their sales goals on these industries, this means vendors are also more likely to build products and produce educational materials that promote the mindsets predominant in these fields.

A mindset is neither good nor bad, and bias can be both helpful and harmful. It's important that we identify the common trait distributions and mindset biases associated with these fields so that the evolution of the SOC concept benefits from a diversity of opinion.

 

SIEM as the Investigative Centerpiece

Figure 13 shows how SOC analysts correlate and analyze event data, IOCs, and other security and threat-related data. This chart essentially identifies the tool at the center of the investigative process. 77% cited the use of a SIEM for facilitating the investigation process.

In my experience, many SOCs let the workflow inherent to their SIEM dictate their analysts' investigation workflow. New analysts learn primarily via on-the-job training and through the lens of the workflow the SIEM imposes. Given that there are only a handful of widely adopted SIEMs and that investigative theory training isn't widespread, most practicing analysts likely learned their craft through a few tools like ArcSight or QRadar. I would posit that you could present an analyst with a set of investigation scenarios, watch how they solve them, and arrive at an accurate assessment of which SIEM they cut their teeth on.

If the SOC doesn't provide training in fundamental investigation concepts, then an additional concern moving forward is that analysts are more likely to become "SIEM-locked," unable to perform investigations without a specific SIEM. SOC managers must be certain that their SIEM supports their human-centered workflow rather than developing a workflow solely because it aligns with a SIEM. My assessment of the current SIEM market is that most existing tools don't adequately consider or deliver workflow features focused on the needs of the human analyst. It's likely that many of the 23% of organizations identified as having built their own SIEM-like tool share this opinion.

 

Investigation Metrics are Non-Existent

Collecting actionable metrics has been a pain point for most SOCs I’ve worked in or consulted with. Figure 18 describes metrics that are used, enforced, and consistently met. There are very few metrics associated with the investigation experience itself, except for the time from detection to containment and eradication. As the investigation function is the central workflow of the SOC, this continues to be an area where improvement is desired. Instead, most metrics that are considered are focused on SOC output, and not the efficiency of the SOC itself. While this is helpful for justifying the existence of the SOC (why an org spends money on the function), it isn’t as helpful for improving the SOC (reducing the cost of the function).

SOC Metrics Collected

Investigation-centric metrics might include tracking the usage of specific data sources during investigations (assists), the number of times a data source would have been helpful but was unavailable (turnovers), the most commonly aggregated fields, and average time spent viewing specific data sources. An investigation-centric metric is one that seeks to better understand how the human analyst spends their time while attempting to connect the dots in pursuit of greater speed and accuracy.
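To make that more concrete, here is a rough sketch of how a few of these metrics could be computed if a SOC logged per-investigation activity. The record structure (data source, availability, seconds viewed) is entirely hypothetical and exists only to illustrate the idea; it does not come from any particular tool.

```python
# A minimal sketch of investigation-centric metrics from hypothetical
# per-investigation activity records. Field names are illustrative only.
from collections import defaultdict

activity_log = [
    {"case": 1001, "data_source": "http_proxy", "available": True,  "seconds_viewed": 240},
    {"case": 1001, "data_source": "pcap",       "available": False, "seconds_viewed": 0},
    {"case": 1002, "data_source": "http_proxy", "available": True,  "seconds_viewed": 90},
    {"case": 1002, "data_source": "flow",       "available": True,  "seconds_viewed": 130},
]

assists = defaultdict(int)      # times a source was used and available
turnovers = defaultdict(int)    # times a source would have helped but was missing
view_time = defaultdict(list)   # seconds spent per source, for averaging

for record in activity_log:
    source = record["data_source"]
    if record["available"]:
        assists[source] += 1
        view_time[source].append(record["seconds_viewed"])
    else:
        turnovers[source] += 1

for source in sorted(set(assists) | set(turnovers)):
    times = view_time.get(source, [])
    avg = sum(times) / len(times) if times else 0
    print(f"{source}: assists={assists[source]}, turnovers={turnovers[source]}, avg_view_s={avg:.0f}")
```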

 

Internationalization of SOCs

A significant number of SOC practitioners exist outside the United States, as shown in Figure 2. However, a much smaller percentage of the organizations who responded to the survey are headquartered outside the US. The disproportionate number of international analysts is likely attributable to organizations attempting to cut costs by hiring in lower-income regions, and to organizations staffing 24×7 operations across time zones (thereby avoiding having to hire a night shift in the US, which is notoriously difficult).

Locale of Security Operations

There are significant differences in how people think based on the culture they hail from. By nature, most Americans tend to be less sensitive to these variances and project their way of thinking onto others. As the number of international practitioners grows, it's critical to consider the biases inherent to how Americans think so that we can identify where they may not hold up for international practitioners. As an example, people from Asian cultures tend to require more certainty about a conclusion than their American counterparts before feeling confident in it. Put more simply, someone from Kansas may view the investigation process completely differently than someone from Kazakhstan. By understanding how differing cultural mindsets impact how people approach investigations, we can draw useful conclusions that move us toward a more universal investigation process.

It's worth noting that SANS is a US-based company with larger market penetration in the US (I don't know this for a fact; it is an assumption). Therefore, the respondents for this survey question probably under-represent the number of international practitioners, and the survey may not represent an adequate global sample. Without access to the source data, I'm unable to assert confidence regarding the sample distribution. Nonetheless, this only strengthens the points made here.

 

Distributed Environments Require Unique Communication Skills

The increased number of international SOC practitioners in remote SOCs (Figure 2) and the significant number of distributed SOCs (Figure 3) stress the importance of communication when analysts are not in the same room.

SOC Architectural Approaches

Communication is a critical function of the SOC, and it must be facilitated with appropriate tools. This stresses the importance of investigation tools that provide built-in collaboration features such as the assignment of cases, shared notes, and context tagging. It also stresses the importance of data access and information sharing via tools like wikis or knowledgebases.

Not lost here is the ability to identify and hire staff who excel at non-present communication. As someone who has managed remote teams, I quickly learned that some people simply aren’t effective communicators via text-based mechanisms like chatrooms. Managers should strive to develop strategies for identifying analysts who can excel specifically in these environments. Furthermore, if we can identify these traits and qualities, we should strive to enhance our ability to teach improvement in this communication skillset.

Conclusion

I enjoyed parsing through the SOC survey and want to thank SANS and Chris Crowley for putting it together. In the future, I’d love to see more questions relating to how human analysts perform their jobs and what pain points they have beyond just the tools they use. I’d also love to see this survey repeated on a periodic basis so that trends could be highlighted.

Finally, I’d love to see the methodology used for data collection described here and why they chose the questions they did. I appreciate SANS identifying that the research is sponsored, but citing the methodological approach would shed light on how much influence the vendors had on the questions and the interpretation of their output. A positive step would be making the raw source data publicly available for additional analysis.

I’d love to hear your thoughts on my analysis of the report, including both things you agree and disagree with. You can reach me via Twitter @chrissanders88 or you can e-mail me.

 

*Note: I was not asked to do this by SANS. This post only reflects my analysis and opinions. 

Investigation Case Management with TheHive

I've struggled for a long time to find a case management system that I thought fit well within the constructs of how analysts actually perform investigations. Most case management systems are actually just help desk ticketing systems that have been retrofitted to fit a security use case. This is what I see most often when SOCs are using tools like Remedy, RTIR, or OTRS. Last November, a group of researchers from CERT Banque de France (CERT BDF) released a new case management system called TheHive. The authors of the project describe TheHive as an "open source and free security incident response platform designed to make life easier for SOCs, CSIRTs, CERTs, and any information security practitioners dealing with incidents that need to be investigated and acted upon swiftly." I would simply describe TheHive as a purpose-built case management system that facilitates the investigation of security incidents. I've enjoyed using TheHive so much that I integrated it into my Investigation Theory course, where I teach people how to approach investigations and hunt down bad guys. In this post, I want to discuss a few features of TheHive and why I enjoy it so much.

Architecture and Installation

TheHive is written in Scala and uses ElasticSearch to store and access data on the back end. The front end uses AngularJS and Bootstrap. A number of REST API endpoints are also provided to allow for integrations and bulk actions.

 

You'll see Cortex mentioned in the diagram shown above. Cortex allows users to submit observables and indicators of compromise to popular open source intelligence tools via a series of Python-based analyzers. Ultimately, it is a separate tool with its own codebase, but TheHive and Cortex go together like peas and carrots, so you'll see them mentioned often in TheHive documentation. The installation command I provide below will actually install both of them as a single integrated container.

There is a traditional Ubuntu 16.04 installation option described here which is probably most appropriate for production systems:  https://github.com/CERT-BDF/TheHive/wiki/Installation-guide.

If you just want to try TheHive or run it locally, you can get it running via containers with Docker. The installation process here couldn’t be simpler:

  1. Build an Ubuntu 16.04 system and ensure it’s up to date on system and software patches.
  2. Install Docker
    Info: https://docs.docker.com/engine/getstarted/step_one/#step-2-install-docker
    Command: curl -fsSL https://get.docker.com/ | sh
  3. Download and run TheHive with Cortex:
    Info: https://github.com/CERT-BDF/TheHive/wiki/Docker-guide—TheHive-Cortex
    Command: docker run --publish 8080:9000 --publish 8081:9001 certbdf/thehive-cortex
  4. Connect to the web interface using a browser: http://IPofServer:8080
  5. Follow the on screen prompts to create an administrative user account

Case Management

The core construct of TheHive is the investigation case. I like this because the case is also the core construct of most security investigations, whether you’re reviewing alerts, reverse engineering malware, or working a declared incident. The case construct doesn’t provide a lot of bells and whistles, but that’s okay because I don’t think it has to. A lot of ticketing systems that are built to serve too many masters quickly become too generic to be useful. That isn’t the case here.

I particularly appreciate that you can add tags to cases for quick searching and filtering. You can also track TLP levels, which can help govern and facilitate the sharing of data. This is a nice feature that really shows how TheHive was custom built for investigation tracking. All the data you put into a case is easily searchable from the search bar at the top of the screen. This makes it really easy to determine if activity you’re currently observing was present in any earlier case.

Task Tracking

Once you've created a case, you can create, assign, and track tasks. A task can really be anything, but I recommend using them to track the actions taken to answer investigative questions. For example, if you're investigating an exploit kit infection, a common question might be, "What was the system doing before the alert was generated?" To answer this, you'll need to review evidence from whatever data source holds the answer. So, a task could be "Review HTTP proxy data to determine what the host was doing in the 10 minutes leading up to the alert."

In addition to answer seeking, tasks are also useful for tracking containment, eradication, and remediation actions. You can create a task for disabling user accounts, quarantining a system, deploying an image to a system, or providing user security awareness counseling.

Tasks, like cases, have the concept of assignment. Therefore, each task can be individually assigned to an analyst for the work to be performed. By default, a task doesn't have an owner until someone clicks into it, or "takes" it from the Waiting tasks queue in the top menu bar. This effectively creates a task queue that analysts can watch to help facilitate their workload. The queue can be filtered by any number of criteria, like a specific tag, a case number, a task name, or a keyword. Tasks that are assigned specifically to you will appear in the My tasks queue in the top menu bar.

Case Templates

As a SOC evolves, it becomes critical to define playbooks that help analysts consistently approach investigations that share common attributes. For example, most of the steps you take to initially investigate a series of failed password attempts or a phishing e-mail will generally be the same. If you can define those steps, you'll have a great head start for training new analysts and ensuring most investigations start off on level footing. TheHive provides a unique case template system that allows you to define common investigations and pre-populate case metadata and tasks.

In the example above, I’ve defined a template for investigations related to exploit kit activity. Now, any time I create a new case I can select this template and all the information you see there will be pre-populated into the case details. The real power here is in the ability to automatically create a series of tasks that should be completed when spawning the case. This essentially lays out the investigative playbook for you. With that, you get the added benefit of automatically populating the Waiting tasks queue so that other analysts can jump into the investigation or start completing containment and eradication tasks. This is, hands down, my favorite feature of the tool.

Collaboration

A key feature of any case management system should be collaboration, and TheHive hits the mark here. Each analyst using TheHive gets their own account which is used to log any actions they take within the tool. Users can own cases and/or tasks. One thing I particularly like is that once you create a case, virtually any action taken with it is recorded to create an audit trail. This audit trail is displayed to the right side of the individual case screens in a Twitter-style feed as seen in several of the images I’ve already shared.

Observables and Analyzers

TheHive allows you to create separate entries for interesting observables within the context of a case. An observable is any interesting data artifact, and TheHive comes with a number of common observable types built in. This includes things like IP addresses, domain names, HTTP URIs, etc. Of course, you can also define your own types which makes this capability quite flexible.

There are multiple benefits to tracking observables. The obvious one is that you can search for them during later investigations to bring in additional context. You can also export them for later import into a blacklist, whitelist, or detection mechanism. Finally, you can use the built-in Cortex integration to automatically submit observables to any number of OSINT research services. This is a very simple process that primarily just requires you to input API keys for each service you'll be using. Some of the existing integrations include PassiveTotal, VirusTotal, and DomainTools.
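As a rough illustration, observables can also be pushed into a case programmatically through the REST API discussed in the next section. The endpoint path, field names, server address, and credentials below are assumptions drawn from the project's documentation and may differ across versions, so treat this as a sketch rather than a drop-in integration.

```python
# Hypothetical sketch: attach an observable to an existing TheHive case.
# Endpoint path and field names (dataType, data, ioc) are assumptions;
# verify them against the API documentation for your version.
import requests

THEHIVE_URL = "http://thehive.example.local:9000"   # hypothetical address
API_AUTH = ("analyst", "password")                  # or an API key, depending on your setup

def add_observable(case_id, data_type, value, tags=None):
    """Attach a single observable (e.g. an IP or domain) to a case."""
    observable = {
        "dataType": data_type,          # e.g. "ip", "domain", "url"
        "data": value,
        "ioc": True,
        "tlp": 2,
        "tags": tags or [],
        "message": "Added by automation sketch",
    }
    response = requests.post(f"{THEHIVE_URL}/api/case/{case_id}/artifact",
                             json=observable, auth=API_AUTH)
    response.raise_for_status()
    return response.json()
```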

API and Integrations

Because TheHive was built on a series of open APIs, it's incredibly flexible in terms of integrating with other tools. The authors have produced really nice API documentation here: https://github.com/CERT-BDF/TheHive/wiki/API%20documentation. You'll see that the documentation provides multiple examples of request formatting, along with several use cases. This includes the ability to query, create, and manipulate cases, tasks, and observables. This has immediate tangible benefits.

Consider a scenario where you’re running a signature based IDS. Any time a specific set of rules associated with exploit kit activity generates an alert, you could use the API to create a new case using a template like the one I showed earlier that is specifically designed for investigation of exploit kit related activity. Using this approach you haven’t just done a simple automation, you’ve created a workflow based on the playbooks you’ve developed. When you or another analyst go to review new alerts, anything related to exploit kits will already have a series of tasks created and waiting for you to accomplish. This is a time saver for experienced analysts, and a teaching tool for younger analysts who might not know what move to make next.
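As a hedged sketch of what that might look like, the snippet below creates a case from a hypothetical "Exploit Kit Activity" template when an IDS alert fires. The server address, credentials, and exact field names (particularly the template field and the identifier in the response) are assumptions; check the API documentation linked above for the specifics of your version.

```python
# Hypothetical sketch: open a TheHive case from an IDS alert, pre-populated
# from a case template defined in the UI. Field names are assumed.
import requests

THEHIVE_URL = "http://thehive.example.local:9000"   # hypothetical address
API_AUTH = ("analyst", "password")                  # or an API key, depending on your setup

def create_exploit_kit_case(alert_signature, src_ip, dst_ip):
    """Open a case based on an 'Exploit Kit Activity' template (template name assumed)."""
    case = {
        "title": f"Exploit kit alert: {alert_signature}",
        "description": f"Automatically opened from IDS alert.\nSource: {src_ip}\nDestination: {dst_ip}",
        "severity": 2,
        "tlp": 2,
        "tags": ["exploit-kit", "auto-created"],
        "template": "Exploit Kit Activity",  # template field name may differ by version
    }
    response = requests.post(f"{THEHIVE_URL}/api/case", json=case, auth=API_AUTH)
    response.raise_for_status()
    return response.json()

if __name__ == "__main__":
    new_case = create_exploit_kit_case("ET CURRENT_EVENTS Exploit Kit Landing", "10.1.2.3", "203.0.113.10")
    print("Created case:", new_case.get("caseId"))   # response field name assumed
```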

In addition to the APIs, TheHive can integrate with MISP and you can also write custom analyzers for use with Cortex. Once again, there are a lot of options here.

Conclusion

This post discussed a few of my favorite features of TheHive and how they can be used in practice. There are quite a few other features, like reporting and metrics, that I didn't discuss here, so make sure to check those out on your own. A lot of the tools used in SOCs were born in them, but building a tool that transfers well is not easy. Every SOC I've been in is different, and most of the time the tools that come out of one won't be nearly flexible enough to fit the workflow that exists in another organization. The developers of TheHive have hit the delicate balance of creating a tool that is focused enough to deliver on immediate use cases while remaining flexible enough to be adapted to differing ones. As stated earlier, it's for that reason I use TheHive in my Investigation Theory course and why I'll be recommending it for individuals who want to learn how to be analysts and for organizations seeking a simple case management solution that can get the job done.

You can learn more about TheHive at the project’s homepage here: https://thehive-project.org/. If you’d like to learn more about the investigation process and facilitating it with TheHive, be sure to check out the Investigation Theory course.

 

* Some of the images in this post were created from my home lab, but a few were borrowed from TheHive official documentation linked throughout the article.


Three Useful SOC Dashboards

I worked in security operation centers for a long time, and I really grew to hate dashboards. Most of them were specially designed vendor pages meant to impress folks who don't know any better as they stroll through the SOC and glance at the wall of low-end plasmas. They didn't really help me catch bad guys any better, and worse yet, my bosses made me ensure they were always functional. Fast forward a few years, and I end up working for a vendor who builds security products. Much to my dismay, while planning features we end up having to build these same dashboards because, despite my best efforts to persuade otherwise, CISOs consistently ask for eye candy, even while admitting that it doesn't have anything to do with the goal of the product. Some of them even tell us, straight up, that they won't purchase our product if it doesn't have eye-catching visuals.

I provide that backstory to give some insight into my long, tortuous relationship with useless dashboards. I talk about this enough at work that I feel like I've almost created a support group for people who have stress triggers associated with dashboards. If you've ever attended a conference talk from my good friend Martin Holste, you may know he hates dashboards even more than I do. Alas, I'm not here just to rant. I actually believe that dashboards can be useful if they focus less on looking like video games and more on helping analysts do their jobs better. So, in this post I'm going to talk about three dashboard metrics you can collect right now that are actually useful. They won't look pretty, but they will be effective.

Data Availability

The foundation of any investigation is rooted in asking questions, making hypotheses, and seeking answers that either prove or disprove your educated guesses. Your questioning and answer seeking will both be driven, in part, by the data you have available. If you have PCAP data, you know you can seek answers about the context within network communication, and if you have Sysmon configured on your Windows infrastructure, you know you can look for file hashes in process execution logs.

While the existence of a data source is half the battle, the other half is retention. Some sources might have a specific time window; you might store PCAP for 3 days and flow data for 90 days, for example. Other data sources will probably use a rolling window, like most logs on Windows endpoints that are given a disk quota and roll over when that quota is met. In both cases, the ability to quickly ascertain the availability of the data you have to work with is critical for an analyst. In short, if the data isn't there, you don't want to waste time trying to look for it. I contend that any time spent gathering data is wasted time, because the analyst should spend most of their time in the question and answer process or drawing conclusions based on data they've already retrieved.

A data availability section on a live dashboard helps optimize this part of the analyst workflow by providing a list of every data source and the earliest available data.

Data Availability Dashboard Component

In the example above, I've created a series of tiles representing five different data types common to a lot of SOCs. Each tile boldly displays the name of the data source and the earliest available date and time of data for it. In this example, I've also chosen to color code certain tiles. Data sources with a fixed retention period are green, while sources with a rolling retention period based on a disk quota are yellow or red. I've chosen to highlight endpoint logs in red because those are not centralized and are more susceptible to a security event causing the logs to roll faster. The idea here is to convey some sense of urgency to the analyst if they need to gather data from a particular source. While PCAP, flow, and firewall logs are likely to still be there a few hours later, things can happen that will purge domain auth and Windows endpoint logs.

Ideally, this dashboard component is updated quickly and in an automated fashion. At minimum, someone updating this manually once a day will still save a lot of time for the individual analyst or collective group.
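If your data sources land in Elasticsearch (as they do with Security Onion and many log pipelines), a small script can populate these tiles automatically. The sketch below shows one possible approach; the server address, index patterns, and the @timestamp field are assumptions you'd replace with whatever your own collection pipeline uses.

```python
# Rough sketch: ask Elasticsearch for the oldest @timestamp in each data
# source's index to drive the "earliest available data" tiles.
# Index names and field name are hypothetical.
import requests

ES_URL = "http://elasticsearch.example.local:9200"   # hypothetical address
DATA_SOURCES = {                                      # tile name -> index pattern (assumed)
    "PCAP metadata": "pcap-*",
    "Flow": "flow-*",
    "Firewall logs": "firewall-*",
    "Domain auth logs": "winlogbeat-dc-*",
    "Windows endpoint logs": "winlogbeat-endpoint-*",
}

def earliest_record(index_pattern):
    """Return the oldest @timestamp present in the given index pattern, if any."""
    query = {"size": 0,
             "aggs": {"earliest": {"min": {"field": "@timestamp", "format": "date_time"}}}}
    response = requests.get(f"{ES_URL}/{index_pattern}/_search", json=query)
    response.raise_for_status()
    return response.json()["aggregations"]["earliest"].get("value_as_string")

if __name__ == "__main__":
    for name, pattern in DATA_SOURCES.items():
        print(f"{name}: earliest data at {earliest_record(pattern)}")
```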

Open Case Status

Most SOCs use some form of case tracking or management system. While there aren't a lot of great options designed with the SOC in mind, there are tools people find a way to make work, like RTIR, Remedy, Archer, JIRA, and others. If integrated properly, the case management system can be a powerful tool for facilitating workflow when you assign users to cases and track states properly. It can also be a tremendous aid in helping analysts stay organized, either through self organization or peer accountability.

 

Open Case Status Dashboard Component

In this example, I've gone with a simple table displaying the open cases. They are sorted and color coded by alive time, which is the time since the case was opened. As you might expect, things that have been pending for quite some time are given a more severe color because they require action. This could, of course, be built around the SLAs or internal guidelines you use for required response and closure times.

The important thing here is that this dashboard component shows the information analysts need to know. It provides the ability to determine what is open (case number), who they can talk to about it (owner), how serious it is (status), what it's waiting on (pending), and how long we've known about the issue (alive).
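A minimal sketch of the alive-time color coding might look like the following. The 24- and 72-hour thresholds are placeholders rather than recommendations; tie them to whatever SLA or internal guideline your SOC actually uses.

```python
# Sketch of alive-time calculation and color coding for the open-case table.
# Threshold values are assumptions.
from datetime import datetime, timezone

def alive_time_hours(opened_at):
    """Hours elapsed since a case was opened."""
    return (datetime.now(timezone.utc) - opened_at).total_seconds() / 3600

def status_color(hours_alive):
    """Map alive time to a display color (thresholds are placeholders)."""
    if hours_alive < 24:
        return "green"
    if hours_alive < 72:
        return "yellow"
    return "red"

case_opened = datetime(2017, 5, 1, 14, 30, tzinfo=timezone.utc)
hours = alive_time_hours(case_opened)
print(f"Case alive for {hours:.1f} hours -> {status_color(hours)}")
```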

Unsolved Mysteries

On any given day, an analyst will run into things that appear to be suspicious but for which there is no evidence to confirm that suspicion. These unsolved mysteries are usually tied to a weird external IP address or domain name, or perhaps an internal user or system. In a single-analyst SOC this is easily manageable, because if that analyst runs across the suspicious thing again it is likely to draw attention. That is a tougher proposition in a larger SOC, however, because there is a chance that a completely different analyst is the one who runs across the suspicious entity the second time. In truth, you could have half a dozen analysts who encounter the same suspicious thing in different contexts without any of them knowing about the others' findings. Each encounter could hold a clue that will unravel the mystery of what's going on, but without a way to facilitate that knowledge transfer, something could be missed.

As a dashboard component, using watch lists to spread awareness of suspicious entities is an effective strategy. To use one, analysts must have a mechanism for adding things to a watch list, which is displayed on a screen for reference. Any time an analyst runs across something that looks suspicious but can't quite pin it down, they first check the screen, and if it's not on there, they add it. Everything that shows up on this list is automatically cycled off every 24-48 hours unless someone puts it back on.

Watch List Dashboard Component

In this component, I've once again chosen a simple table. It provides the thing that is weird (item), who to talk to about it (observer), when it was observed in the data (date), and where to go for the context of the scenario in which it was found (case), if there is any.
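The auto-expiry behavior described above is simple enough to sketch in a few lines. The entry structure and the 48-hour window below are assumptions; adapt them to however your watch list is actually stored.

```python
# Sketch of the 24-48 hour auto-expiry behavior for watch list entries.
# Entry structure and expiry window are assumptions.
from datetime import datetime, timedelta, timezone

EXPIRY_WINDOW = timedelta(hours=48)

watch_list = [
    {"item": "203.0.113.55",        "observer": "sanders", "added": datetime(2017, 5, 3, 9, 0, tzinfo=timezone.utc), "case": "1042"},
    {"item": "weird-domain.example", "observer": "smith",   "added": datetime(2017, 5, 1, 16, 0, tzinfo=timezone.utc), "case": None},
]

def refresh(entries, now=None):
    """Drop anything older than the expiry window unless it was re-added."""
    now = now or datetime.now(timezone.utc)
    return [e for e in entries if now - e["added"] <= EXPIRY_WINDOW]

watch_list = refresh(watch_list)
for entry in watch_list:
    print(entry["item"], entry["observer"], entry["added"].isoformat(), entry["case"])
```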

Conclusion

A dashboard doesn't have to use a fancy chart type or have lasers to be useful. In this post I described three types of information that are useful in a SOC when displayed on a shared dashboard. The goal is to use group dashboards to help analysts save time and be more efficient in their investigations. If you have the capacity to display this information, you'll be well on your way to doing both of those things.

 

Do you have a really useful dashboard idea that you think is relevant in most SOCs? Let me know and I might blog about it down the road in a follow up.

Interested in learning more about the investigation process and how these dashboards fit in? Sign up for my mailing list to get first shot at my upcoming course focused entirely on the human aspect of security investigations.

Video: Building an NSM Lab

Building a security lab is something I get asked about really often. So often, in fact, that I decided to put some of my notes together and record a short training video on the topic. This video is only a small part of a much larger series I’m developing, so if you’re interested in learning more about that when it’s available, sign up for my mailing list.

In this one hour video I discuss the importance of an NSM lab and go through a systematic approach to building your own. I go through the following topics:

  • Analyzing your needs to define your inputs and desired outputs
  • Modeling your lab by building a list of technologies
  • The pros and cons of physical, virtual, and cloud based labs
  • Choosing the right platform for your lab
  • Designing your lab network
  • Sourcing the right hardware for your lab
  • Taking a step by step approach to designing and building the lab

Once you're done with this video, you should have a system you can follow to build a lab that will help you test and build detection, analyze malware, and create simulations. I also provide a lot of insight into my own personal lab, which I use for my writing and my day job. I've also included some additional resources:

  • Lab planning worksheet
  • An exact parts list from my lab
  • Two example lab network diagrams
  • The network diagram for my personal lab

You can access the additional resources mentioned in the video by signing up here.