The Challenge of Discovering Unknown Threats

White Paper: Examining the security data lake and how it integrates into your big data analytics strategy.

BIG DATA

White Paper: THE CHALLENGE OF DISCOVERING UNKNOWN THREATS

Due to the continuing rise of breaches, more companies are turning to another type of security product to discover unknown threats: the security data lake. It is important to understand what this product is, what types of features to look for, and how it integrates into the rest of your security arsenal.

TABLE OF CONTENTS:

1. Today’s Security Situation

2. What is a Security Data Lake?

Addressing the 4 Vs

Not a Replacement for SIEM

3. Serving New Security Needs

4. A Turnkey Open-Source Solution

TODAY’S SECURITY SITUATION

Despite the growing variety of tools used in today’s cybersecurity operation centers, we are still seeing threats make their way past all our defenses. Many newer techniques (like sandboxing) were effective when they first hit the market, but within a year agile adversaries have learned how to circumvent them, and so the trend continues.

Tools like SIEMs and anti-virus look for known threats, using signatures that are available through proprietary or commercially available threat feeds. SIEMs also attempt to detect unknown threats using a variety of behavioral rules that have proven to be indicators of compromise. However, SIEMs have fallen short on delivering accurate and complete results for two main reasons. First, they are designed to analyze limited data for short periods of time. Second, they do not facilitate advanced analytics, nor do they generally provide direct access to the data for data science. More recently, industry experts and practitioners have come to believe that accuracy lies in collecting more data and retaining it for longer periods of time, sometimes for years.

Correspondingly, organizations are turning to advanced security analytics operating on a security data lake, the latest “new” defense. Employing a security data lake is an attempt to accelerate an organization’s ability to detect and pinpoint whether its networks have been compromised so that appropriate countermeasures can be deployed.

These types of big-data solutions offer key capabilities, including:

• Advanced statistical algorithms that can find indicators of compromise based on general anomalies, without knowing beforehand what the anomaly is.

• Inclusion of a new role, the security intelligence analyst, who can analyze data and learn to continually adjust the algorithms over time, thereby keeping ahead of adversaries.

• Effective, efficient and economical investment in a strategic data store that can be seamlessly integrated into existing data repositories and used not only for security purposes but also for compliance reporting, forensic investigations, root cause analysis, etc.

WHAT IS A SECURITY DATA LAKE?

A security data lake is a central repository of all information originating from the variety of servers, endpoints, databases, sub-systems and devices that exist in a typical IT infrastructure. Unlike traditional structured databases or warehouses, which parse, organize and categorize all the information they store, a data lake simply loads all the data in its natural form, structured or unstructured. Traditional databases can be likened to a grocery store where water is bottled, labeled, and put on a shelf in a specific aisle. A data lake, however, simply stores the “bulk” water in the most economical way possible.

Addressing the 4 Vs

Clearly, storing massive amounts of information from diverse systems throughout the IT infrastructure results in an immense amount of data. This creates some practical issues to be addressed as the security data lake is constructed so that the power of the data can be effectively harnessed. These issues or challenges are commonly known as “the four Vs”.

They are:

Volume: To solve the data volume challenge, the system must scale beyond the petabyte in a linear fashion. It must also do so economically, without traditional software licensing fees. The popularity and recent maturing of open-source products such as Hadoop and its ecosystem components have led to more and more deployments of open-source solutions in the enterprise.

Variety: IT infrastructure data has many different formats, which seem to change every time systems are updated or new systems are added. To solve this challenge, raw data must be stored before it is parsed, organized and categorized.

Velocity: To ingest data at the scale and diversity noted above, the system must be capable of balancing the load or compute power, eliminating performance bottlenecks. This calls for in-memory processing, deployable as a self-balancing cluster, and again without the high cost of software licensing fees.

Veracity: Data must be stored in such a way that the veracity of the information is ensured and verifiable. Data that is parsed, organized and categorized may take on additional meaning after the parsing has occurred, which makes erroneous interpretation possible. Users should always have the capability to “drill down” from the analyzed data to the original raw data so any machine-level interpretation can be verified.
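The Variety and Veracity points can be illustrated with a minimal Python sketch. It is not taken from any particular product: the in-memory stores, the function names (ingest, parse, drill_down), the JSON log format, and the use of a SHA-256 digest as the record id are all assumptions made for illustration. The idea is that the raw record is stored untouched before any parsing, and every parsed event keeps a pointer back to it so an analyst can drill down and verify the interpretation.

```python
import hashlib
import json

# Hypothetical in-memory stand-ins for the data lake's storage layers.
raw_store = {}      # raw_id -> original bytes, never modified after ingest
parsed_store = []   # parsed events, each pointing back to its raw record


def ingest(raw_line: bytes) -> str:
    """Store the raw record untouched and return its content-derived id."""
    raw_id = hashlib.sha256(raw_line).hexdigest()
    raw_store[raw_id] = raw_line
    return raw_id


def parse(raw_id: str) -> dict:
    """Parse a stored record later, keeping a pointer back to the raw data."""
    fields = json.loads(raw_store[raw_id])
    event = {
        "user": fields.get("user"),
        "action": fields.get("action"),
        "raw_id": raw_id,
    }
    parsed_store.append(event)
    return event


def drill_down(event: dict) -> bytes:
    """Return the original record after checking it still matches its digest."""
    raw = raw_store[event["raw_id"]]
    if hashlib.sha256(raw).hexdigest() != event["raw_id"]:
        raise ValueError("raw record does not match its digest")
    return raw


line = b'{"user": "alice", "action": "login", "src": "10.0.0.5"}'
event = parse(ingest(line))
original = drill_down(event)
```

In this sketch the parser drops the "src" field, but nothing is lost: the full original line is still retrievable through the event's raw_id, so a later parser can extract fields the first one ignored.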

Not a Replacement for Your SIEM

Both data lakes and SIEMs need to process similar data, although at vastly different scales. Originally, vendors sold SIEM solutions as a single-pane-of-glass (SPOG) to help consolidate alerts and data visualization into one convenient place. More recently, SIEMs have been sold as log management solutions, but SIEMs perform poorly when digesting larger amounts of data, and licensing models can make them prohibitively expensive. Most SIEMs use traditional relational databases, which generally have severe scalability limitations and do not support advanced analytics or allow direct access to the data.

While SIEMs fall short when ingesting massive quantities of log information, there are things that SIEMs do well, including:

• Real-time correlation and alerting

• A user interface to create and manage the rules

• A user interface to manage the alerts

• Event triage and workflow engines

• Ticketing and case management functionality

Things SIEMs do not do well include:

• Long retention periods, more than a year in most cases and over 10 years in some cases

• Sharing of data with other systems in a standard way

• Advanced statistical analysis of large data sets

SERVING NEW SECURITY NEEDS

New open-source big data technologies have made it easier and more affordable for organizations to obtain big data solutions for the data scientist. While most cyber defense tools are aimed at the SOC analyst, security data lakes facilitate the new role of the security intelligence analyst. These two roles are analogous to the fire fighter and the fire detective. The fire fighter (SOC analyst) always operates in one timeframe: immediately! When an alert goes off, an attack or breach may be in progress. The fire fighter needs precise instructions to put out the fire without a complicated decision process. The fire detective (security intelligence analyst), on the other hand, is called upon after the incident; he is unhurried, methodical, and considers every possibility. The fire detective must consider the root cause and how the incident can be avoided in the future.

With big data capabilities, it’s possible to leverage advanced statistical algorithms to analyze the data and establish baselines of “normal” behavior for anything that interacts with the network. Baselines are formed over weeks, months and even years and, once established, can be used to detect in real time when “abnormal” behavior occurs. Because each environment is different, the role of the security intelligence analyst becomes extremely important. Security data analysts, or modern-day security intelligence analysts, may use data science to discover new correlations between events, attributes of events, assets, and the users involved. This brings a new capability to cyber defense, one that looks at information in a way that no other cyber defense tool currently does. The role of the security data analyst is a long-term one: as bad actors continue to discover new ways to hide their activities, the fire detectives need to continually adjust their algorithms in order to identify this abnormal behavior.
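As a rough illustration of the baseline idea, the Python sketch below builds a baseline from past observations and flags a new value that deviates strongly from it. The choice of a mean and standard deviation as the baseline, the three-standard-deviation threshold, the function names, and the sample login counts are all assumptions made for this example; production systems typically use richer models and far longer histories.

```python
from statistics import mean, stdev


def build_baseline(history):
    """Summarize past observations as (mean, sample standard deviation)."""
    return mean(history), stdev(history)


def is_abnormal(value, baseline, threshold=3.0):
    """Flag a value more than `threshold` standard deviations from the mean."""
    mu, sigma = baseline
    if sigma == 0:
        # A perfectly constant history: any different value is a deviation.
        return value != mu
    return abs(value - mu) / sigma > threshold


# Hypothetical daily login counts for one account over ten days.
daily_logins = [10, 12, 11, 9, 10, 13, 11, 10, 12, 11]
baseline = build_baseline(daily_logins)

normal_day = is_abnormal(12, baseline)    # within the usual range
unusual_day = is_abnormal(40, baseline)   # far outside the usual range
```

The threshold is exactly the kind of parameter an analyst would tune per environment: set too low it floods the SOC with alerts, set too high it misses slow, quiet activity.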

A TURNKEY OPEN-SOURCE SOLUTION

SSTech offers turnkey Big Data solutions that leverage best-of-breed open-source technologies. An example is our advanced Insider Threat Detection (ITD) solution. This offering is the result of years of industry expertise working with extremely large data sets of cybersecurity information and data modeling. SSTech Big Data’s ITD solution monitors 9 different indicators of user behavior in the categories of email activity, web browsing activity, and system logins. Our algorithms comb through daily user activity, determining the baseline score for every user in two dimensions: comparing each user’s activity to their own history, and to the community of all users. The solution then combines all the indicators into a single risk score for each user, with the riskiest users being those most likely to be carrying out insider activity. The ITD solution employs supervised machine learning with a feedback loop: the results of investigations are fed back into the system so it gets smarter as it gains more experience.
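The two-dimensional scoring described above can be sketched in Python as follows. This is an illustrative approximation only, not the ITD solution's actual algorithm, which is not described here in detail: the use of z-scores, the equal weighting of indicators and dimensions, the two indicator names, and all sample values are assumptions.

```python
from statistics import mean, pstdev


def z_score(value, samples):
    """Standard score of value against samples; 0.0 when samples do not vary."""
    sigma = pstdev(samples)
    return 0.0 if sigma == 0 else (value - mean(samples)) / sigma


def risk_score(today, own_history, community_today):
    """Average, over indicators, of the deviation from self and from peers."""
    scores = []
    for indicator, value in today.items():
        vs_self = abs(z_score(value, own_history[indicator]))
        vs_peers = abs(z_score(value, community_today[indicator]))
        scores.append((vs_self + vs_peers) / 2)
    return mean(scores)


# Hypothetical values of two indicators across all users today.
community_today = {"emails_sent": [21, 19, 20, 95], "logins": [3, 2, 3, 14]}

quiet = risk_score(
    {"emails_sent": 21, "logins": 3},
    {"emails_sent": [20, 22, 18, 20], "logins": [3, 3, 4, 2]},
    community_today,
)
risky = risk_score(
    {"emails_sent": 95, "logins": 14},
    {"emails_sent": [20, 19, 21, 20], "logins": [3, 2, 3, 4]},
    community_today,
)
ranked = sorted({"quiet_user": quiet, "risky_user": risky}.items(),
                key=lambda item: item[1], reverse=True)
```

Sorting users by the combined score puts the ones who deviate most from both their own past and their peers at the top of the investigation queue. A feedback loop would then adjust the weights based on which investigations confirmed real insider activity.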

An SSTech ITD system is accessible to the customer, who can see the logic employed and even modify it if required. Entirely new algorithms can be imported to analyze the same data set, and Elysium plans on adding new algorithm data moving forward.

Is Open Source the Way to Go?

With SSTech Big Data, open source is both affordable and customizable. Big data open-source technologies can be a lot of work, with so many products to investigate, select, and then stitch together to work cohesively. Different vendors provide various levels of support.

SSTech offers you the best option. We provide a pre-assembled accelerator toolkit to get you up and running in a fraction of the time. We have the big data experts and we’ve made default selections for the entire solution. Yet we can also be flexible and leverage any existing big data technology you may already have in-house. And we back the entire solution as your first tier of support for all the components involved.

You can harness the power of big data and open-source technologies while ensuring measurable results for your business.

CONTACT US

Email Brad Kekst, Director Business Development, Big Data Solutions, at [email protected] for help determining the right big data project for your business.

Regional offices in Herndon, VA; Atlanta, GA; Dallas, TX; Santa Clara, CA; and Toronto, Canada.

Corporate Headquarters in Tampa, FL.