
White Paper: Cyber Security and Neuromorphic Computing

Neuromorphic Cyber Systems

David Follett, Founder/CEO, Lewis Rhodes Labs

[email protected]

Introduction Cyber defense requires exceedingly complex, analyst-intensive systems that must operate under constant siege. Improvements necessitate a solution approach that accounts for architecture, people, and processes. This paper examines cyber defense priorities from the analyst perspective, derives requirements from these priorities, and considers various technical approaches to advance system performance through improved sensor design.

1 System Level Cyber Architecture

At scale, cyber defense is a three-stage process. First, intrusion detection systems (IDSs) monitor network and system activity for malicious events. IDSs use hardware and software sensors to generate alerts based on deployed security policies. Policies contain cyber event descriptions typically in the form of Snort rules or Perl Compatible Regular Expressions (PCRE) that are constructed by analysts using forensic analysis of attacks. Second, alerts are consolidated into an Operational Intelligence database system, such as Splunk. This database provides a consolidated view of potential threats and buffers the analysts from the alert generation process. Finally, highly trained security analysts mine the alert database using experience and a variety of analytic tools to identify and neutralize threats. These tools often utilize behavioral techniques such as “kill chains”.1 Figure 1 is a schematic representation of a state-of-the-art cyber defense architecture.
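
For illustration, a policy entry might resemble the following hypothetical Snort rule; the message text, PCRE, and SID are invented for this example and are not drawn from any deployed rule set:

    alert tcp $EXTERNAL_NET any -> $HOME_NET 80 \
        (msg:"EXAMPLE possible NOP sled in HTTP request"; \
         pcre:"/\x90{16,}/"; sid:1000001; rev:1;)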

2 Analyst’s Perspective

From the perspective of a cyber analyst, the architecture in Figure 1 has two fundamental weaknesses. First, the fidelity of the sensor-generated alert data is excruciatingly low, placing a huge burden on the Operational Intelligence database and the analyst community that uses it.

Second, the system misses classes of attacks (false negatives) because it is unable to evaluate common predicted behaviors, slow to adapt to evolving threats, and too expensive to support rich sensor fields. We will explore each of these in some detail as they drive the requirements.

1http://www.lockheedmartin.com/content/dam/lockheed/data/corporate/documents/LM-White-Paper-Intel-Driven-Defense.pdf


Figure 1: System Level Cyber Architecture

The increasing scale and sophistication of attacks are eroding the efficacy of this architecture.

2.1 Alert Fidelity

Fidelity is the ratio of “true positive” to “false positive” alerts generated by the sensors, i.e., the signal-to-noise ratio. In an ideal world, the Operational Intelligence database contains only true positive alerts, allowing analysts to focus on behavioral analysis, forensics, and response. In reality, the fidelity of the alert data is extremely low. Isolating and prioritizing the true positives from the noise in the Operational Intelligence database is an error-prone, resource-intensive task. Improving alert fidelity directly impacts mission effectiveness and is the community’s top priority.

Poor alert fidelity has multiple causes, including limited event articulation, cost-related coverage limitations, and the vulnerability of sensors to direct cyber attack.

2.1.1 Event Articulation

Multiple factors limit alert articulation, including regular expression (RegEx) language deficiencies, challenges managing large RegEx groups, poor alert consolidation/correlation, and computational limitations.

2.1.1.1 RegEx Limitations

In 1997 Jamie Zawinski famously wrote, “Some people, when confronted with a problem, think ‘I know, I’ll use regular expressions.’ Now they have two problems.” Regular expressions are widely used to describe cyber events, even though they lack abstraction and hierarchical constructs. As a result, they are not amenable to the development tools and processes required to express complex automata. Regular expressions rarely exceed 50 characters in length, as they quickly become too complex to construct and validate. Their strength is identifying well-characterized, relatively simple sequences.
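
A small illustration of the abstraction problem (the patterns below are invented for this example): the only reuse mechanism a regular expression offers is textual substitution, so even modest compositions approach the practical limit of what an analyst can construct and validate.

    import re

    # The only "abstraction" available is pasting sub-patterns together;
    # there is no hierarchy, naming scope, or validation to manage growth.
    ip_octet = r'(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)'
    ip_addr = rf'{ip_octet}(?:\.{ip_octet}){{3}}'
    rule = re.compile(rf'{ip_addr}:\d{{1,5}}')  # already ~60 opaque characters

    print(bool(rule.search('callback to 203.0.113.7:4444 detected')))  # True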


2.1.1.2 Large Regular Expression Groups

It is difficult to choose and manage collections of expressions. Regular expressions lack the features required for effective system-level optimizations. Using the open-source Hyperscan2 expression engine, Lewis Rhodes Labs (LRL) evaluated 2,048 random PCRE expressions from the public Emerging Threats database3 against a public PCAP file4 containing 15 MB of data. This test generated 16.8M alerts in block mode and 31.5M in streaming mode (i.e., roughly 2 alerts per byte). The vast majority of these alerts were duplicates and false positives resulting from expression overlap and poor specificity. While techniques exist for representing complex sets of regular expressions as a consolidated graph, effective mechanisms do not exist for reducing overlap and enhancing specificity. Large groups of regular expressions generate large quantities of low-fidelity output that is hard to characterize or control.
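
As a rough sketch of how such a measurement can be reproduced, the following uses the open-source python-hyperscan bindings; the three toy patterns and the capture file name are placeholders rather than LRL's actual corpus:

    import hyperscan  # Python bindings for Intel's Hyperscan (pip install hyperscan)

    # Placeholder patterns; LRL's test drew 2,048 PCREs from the Emerging
    # Threats rule set. All expressions compile into a single database.
    expressions = [rb'/bin/sh', rb'cmd\.exe', rb'\x90{8,}']
    db = hyperscan.Database(mode=hyperscan.HS_MODE_BLOCK)
    db.compile(expressions=expressions,
               ids=list(range(len(expressions))),
               elements=len(expressions))

    matches = []
    def on_match(pattern_id, start, end, flags, context):
        # Every hit from every expression lands here; with large, loosely
        # specified rule groups this fires on the order of twice per byte.
        matches.append((pattern_id, end))

    payload = open('capture.bin', 'rb').read()  # stand-in for the PCAP payload
    db.scan(payload, match_event_handler=on_match)
    print(len(matches), 'raw alerts, duplicates and overlaps included')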

2.1.1.3 Sensor Level Alert Consolidation/Correlation

Regular expressions lack attributes for easy consolidation into concise representations. As a result, Operational Intelligence databases are typically bloated with low-value alerts. To establish situational awareness, an analyst must mine the data, a slow, resource-intensive process. For example, sensor flooding attacks produce millions of alerts that collectively indicate a nefarious event may be under way. Similarly, event correlation is more indicative of an attack than individual alerts. Expressing correlated events resulting from large expression sets as enriched alerts is challenging. Managing alerts and mining the data to extract actionable events in an Operational Intelligence database is expensive and slow. Low-fidelity sensors further exacerbate this problem.
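
A minimal sketch of sensor-level consolidation (the alert fields are hypothetical): repeated identical alerts collapse into one enriched record carrying a hit count, so a million-alert flood becomes a single row in the Operational Intelligence database.

    from collections import Counter

    def consolidate(alerts):
        # Collapse duplicate (rule, source, destination) alerts into one
        # enriched alert per tuple, preserving the volume as a hit count.
        counts = Counter((a['rule_id'], a['src'], a['dst']) for a in alerts)
        for (rule_id, src, dst), hits in counts.items():
            yield {'rule_id': rule_id, 'src': src, 'dst': dst, 'hits': hits}

    flood = [{'rule_id': 2010935, 'src': '10.0.0.5', 'dst': '10.0.0.9'}] * 1_000_000
    print(next(consolidate(flood)))  # one enriched alert with hits=1000000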

2.1.1.4 Computational Limitations

Computational performance, typically measured as throughput, is a function of the regular expressions, processing algorithm(s), and data stream(s), coupled with the hardware and software technology being utilized. It is both difficult and expensive to design sensors and policies for worst-case conditions, a weakness easily exploited by attackers. There are no effective techniques for definitively characterizing the performance of a policy against an arbitrary data stream on a traditional processor. Flooding attacks expressly exploit these computational limitations to saturate sensors, creating attack opportunities.
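
The computational exposure is easy to demonstrate with a classic catastrophic-backtracking pattern; on a backtracking engine such as Python's re module, a short non-matching input forces the engine to explore an exponential number of paths:

    import re
    import time

    pattern = re.compile(r'^(a+)+$')   # nested quantifiers: ~2^n backtracking
    payload = 'a' * 26 + 'b'           # 27 bytes that can never match

    t0 = time.perf_counter()
    pattern.match(payload)             # returns None only after exponential search
    print(f'{time.perf_counter() - t0:.1f}s to reject {len(payload)} bytes')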

2.1.2 Cost Limitations

Hardware sensors, such as the network security appliance shown in Figure 1, are extremely expensive, on the order of $100k per 10GbE link. There are a number of contributors to the high cost, but the result is that appliances are deployed sparingly, with policies optimized to ensure critical alerts are not lost. As a result, analysts favor low-fidelity rules on the theory that you cannot detect what you cannot see. More bluntly, analysts utilize low fidelity by design to increase the likelihood of detecting true positives, at the expense of dramatically larger numbers of false positives.

A similar issue exists for software sensors. There is a general tradeoff between the fidelity of a software sensor and the processing required to run it. A recently published software mechanism for detecting Return Oriented Programming attacks consumed more than 90% of the processing capacity of the system it was protecting.

2https://01.org/hyperscan
3http://doc.emergingthreats.net/
4http://pen-testing.sans.org/holiday-challenge/2013

2.1.3 Sensor Vulnerability

Both hardware and software sensors are vulnerable to flooding attacks designed to blind the analyst. The goal is twofold. First, the attacker wants to disrupt the sensor’s ability to monitor activity, enabling exploits to avoid detection. Second, the flood of alerts degrades the fidelity of the data in the Operational Intelligence database (see Section 2.1.1.3), further complicating the analyst’s task. This problem is exacerbated by the cost-driven coverage limitations discussed above.

Software sensors are also vulnerable to spoofing. Spoofing alters the sensor or its environment such that the sensor is blind to the attacker’s intent. Stuxnet5 is an excellent example of a spoofing attack.

2.2 Behavioral Analysis

Behavioral analysis is fundamental to modern cyber defense. Attack vectors are evolutionary, with most attackers leveraging rapid change to avoid detection. However, change establishes behavior patterns that are valuable in identifying novel exploits. Behavioral analytics lie almost exclusively in the interaction between analysts and the Operational Intelligence database [Fig 1]. This is a resource- and personnel-intensive process, especially considering the low fidelity of the alert database. Leveraging behavioral techniques requires the ability to adapt rapidly and to preemptively evaluate predicted variance. Three techniques that present specific challenges to sensors will be considered: time variance, order variance, and dynamic reconfiguration.

2.2.1 Time Variance

For performance reasons inherent in existing hardware and software architectures, security rules typically include “directives” that limit their application at the expense of increased false negatives. In a simple example, if a rule assumes an attack sequence starts at a 12-byte offset into the header, the attacker can shift to a 57-byte offset to avoid detection. While the behavior is easily predicted, the cost of evaluating regular expressions against all possible time variants is often prohibitive.
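
A minimal sketch of the evasion (the offsets and marker bytes are invented for this example): a rule anchored at the expected offset is cheap but brittle, while the offset-invariant form must be evaluated at every position.

    import re

    packet = b'\x00' * 57 + b'EVIL_SEQ'        # attacker shifts payload to offset 57

    anchored = re.compile(rb'^.{12}EVIL_SEQ')  # cheap: checks offset 12 only
    floating = re.compile(rb'EVIL_SEQ')        # offset-invariant, costlier at scale

    print(bool(anchored.match(packet)))        # False -- detection evaded
    print(bool(floating.search(packet)))       # True  -- caught, at higher cost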

2.2.2 Order Variance

Changing the order of events significantly complicates detection. From an attacker’s perspective, this can be as simple as reordering software modules or changing compiler parameters. Constructing order-invariant regular expressions is challenging. Instead of searching for a fixed sequence within the exploit, the defender must assess relative similarity, a vastly more difficult problem.
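
A sketch of the difficulty (the stage markers are invented for this example): an ordered regular expression misses a reordered exploit, while set-style similarity scoring catches it but is no longer a regular-expression match.

    import re

    STAGES = (b'STAGE_A', b'STAGE_B', b'STAGE_C')
    ordered = re.compile(rb'STAGE_A.*STAGE_B.*STAGE_C', re.DOTALL)

    def similarity(data):
        # Fraction of known stages present, in any order.
        return sum(stage in data for stage in STAGES) / len(STAGES)

    payload = b'..STAGE_C....STAGE_A..STAGE_B..'   # same modules, reordered
    print(bool(ordered.search(payload)))            # False -- order variance evades
    print(similarity(payload))                      # 1.0   -- all stages observed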

2.2.3 Dynamic Reconfiguration

Historically, hardware sensors have lacked the ability to dynamically adapt to evolving threats. Small policy changes typically require recompilation of the full regular expression set before deployment. The ability to modify policy in real time is increasingly critical.

2.2.4 In Conclusion

The size and complexity of the Operational Intelligence system can be daunting [Fig 1]. By itself, low-cost, high-performance regular expression processing makes this problem much worse by increasing alert volume. This issue must be addressed at the system level.

5https://en.wikipedia.org/wiki/Stuxnet

3 Security Applications

When introduced 21 years ago, Infiniband host bus adapters (HBAs) offered 100x the performance of alternative solutions. Initially, applications realized limited benefit and in some cases became unstable, as system architects at the time had not envisioned network components with this level of performance. After four years and considerable engineering, database performance gains of 40% were typical. In practice, a 10x faster HBA would have yielded the same 40% result. Infiniband’s real value derived from its redefinition of the network API and the subsequent system-level transformations it enabled. Similarly, technologies such as Snort6, Suricata7, Splunk8, and WaterSlide9 were each developed within their own user-community contexts, and their developers made assumptions about the target environments. Each technology reflects these assumptions, and the system architectures built with them typically inherit their strengths and weaknesses. Benchmarking at the component level is interesting to engineers, but customer value remains at the system level.

System-level cyber defense performance is a function of individual applications, dynamics between these applications, underlying infrastructure, processes, and analysts. We will briefly discuss examples of open-source IDS appliances, Operational Intelligence databases, and streaming analytics software.

Snort and Suricata: Snort is an open-source IDS that deploys as a standalone network appliance, usually on commodity hardware. Snort is well established in commercial, academic, and government environments. LRL’s customers primarily use Snort as a sensor technology to generate network alerts for the Operational Intelligence database shown in Figure 1. Snort’s data structures, control path, and single-threaded design make it incompatible with high-performance pattern matchers. A functionally similar open-source IDS, Suricata, addresses a multitude of Snort performance issues. Although Suricata is more efficient, its use of triage is poorly suited to fast regular expression engines. A new version of Snort, Snort++, is in early alpha testing.

Splunk: Splunk is a commercially available Operational Intelligence database with associated data mining utilities. In cyber deployments Splunk is strategically located between the sensors and the analysts [Fig. 1]. Sensor data is consumed by Splunk, and analysts utilize its tools, APIs, and user interfaces to manipulate and mine the data. The complexity of storage and analysis grows rapidly with the volume of alerts. Data enrichment at the sensor level is highly leveraged and is a requirement in more complex systems.

WaterSlide: WaterSlide is an open-source streaming analytics program9 that is a subset of a proprietary application called Quiz Kid. In sharp contrast to integrated, security-specific applications such as Snort, WaterSlide decomposes streaming analytics operations into a family of lower-level functions called “kids”. Complex functionality is achieved by assembling kids into an analyst-defined data stream. Key attributes include an extremely flexible, general-purpose architecture, scalability from single-instance servers to high-performance distributed systems, robust management of stream saturation, and the ability to create custom kids. Kids for the Lewis Rhodes Labs Neuromorphic Data Microscope and Google’s RE210 high-performance pattern matchers are in the public domain.
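
The decomposition idea can be sketched as follows (illustrative only; this is not WaterSlide's actual API): each "kid" is a small stream operator, and an analyst assembles them into a pipeline.

    def match_kid(stream, needle):
        # Pass through only records whose payload contains the needle.
        return (rec for rec in stream if needle in rec['payload'])

    def tag_kid(stream, label):
        # Enrich each surviving record for downstream kids.
        return ({**rec, 'tag': label} for rec in stream)

    records = [{'payload': b'GET /cmd.exe'}, {'payload': b'GET /index.html'}]
    pipeline = tag_kid(match_kid(iter(records), b'cmd.exe'), 'suspicious')
    print(list(pipeline))   # [{'payload': b'GET /cmd.exe', 'tag': 'suspicious'}]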

6www.snort.org
7https://suricata-ids.org/
8www.splunk.com
9https://github.com/waterslideLTS
10https://github.com/google/re2


4 Analyst Requirements

The top priority of the analyst community is alert fidelity, followed by the integration of behavioral capabilities at the sensor. This section expands these priorities into requirements that are compatible with existing architectures and processes. The requirements focus on sensor performance and do not directly address back-end infrastructure, algorithms, or processes.

Before expanding on the requirements, it is important to consider the following practical limitations:

1. Almost all alerts are defined using regular expressions. These alerts represent an enormous investment in people, tools, and processes. This investment cannot be discarded and will take years to change. Solutions must leverage the existing investment.

2. Operational intelligence databases, the associated infrastructure, and software tools are extensive. Substantially altering or increasing the size of these systems is not an option.

3. Changes to security architecture, infrastructure, and processes often introduce unexpected vulnerabilities. New technology deployment requires considerable planning and care.

The following specific requirements are derived from the priorities discussed in this paper:

1. Cost per alert: Event detection is very expensive in terms of dollars, power, size, and weight. Cost limits expression fidelity, the number of expressions, and the breadth of sensors that can be deployed. All aspects of sensor cost must be reduced to increase infrastructure visibility and alert fidelity.

2. Behavioral algorithms: Behavioral algorithms at the sensor level automatically detect common attack variants and reduce false negatives, response time, and infrastructure cost.

3. Abstraction: Abstraction replaces large volumes of low-fidelity alerts with a more concise set of higher-fidelity alerts. Abstraction enriches and reduces the number of alerts in the Operational Intelligence database.

4. Consolidation: Consolidation allows repetitive or similar alerts to be condensed, enhancing overall fidelity while lowering cost and response time.

5. Correlation: Considering the relative context of alerts enables classes of correlated results to be merged at the sensor, enriching the content of the Operational Intelligence database.

6. Determinism: Determinism hardens sensors to flooding attacks.

7. Dynamic reconfiguration: Dynamic reconfiguration enables automated systems to optimize policy in real time based on attacker behavior.

8. Robustness: Hardening sensors against flooding, spoofing, and direct attack will require a shift from software to hardware sensors.


At a system level these requirements are not independent. If requirements 1 and 2 are implemented without some combination of 3-5, the Operational Intelligence database becomes unmanageable. Table 1 summarizes the expected impact of these requirements on alert fidelity, behavioral (false negative) performance, and Operational Intelligence database size.

Table 1: Expected impact of each requirement on alert fidelity, behavioral (false negative) performance, and Operational Intelligence database size.

5 Pattern Matching Engines

Many regular expression pattern-matching engines exist11. This paper discusses the following four approaches: 1) libpcre12, a widely deployed, full-featured implementation; 2) Google’s RE2, an excellent implementation of classic high-performance algorithms; 3) Hyperscan, which represents the pinnacle of processor-optimized expression engines; and 4) LRL’s Neuromorphic Data Microscope, which supports PCRE as a transitional language for more advanced neuromorphic pattern-matching capabilities.

Libpcre: Libpcre is an open-source library in active use by many of LRL’s customers and security solutions. While its performance is poor, it supports constructs such as back references, look-ahead, look-behind, recursion, and conditionals. These constructs are not compatible with known high-performance regular expression engines. Libpcre requires a Turing-complete engine and does not support parallel examination of expressions.

RE2: RE2 was developed by Russ Cox at Google. Cox has written a number of documents on RE2 and pattern matching in general13. There are two core concepts behind RE2. First, RE2’s performance derives from Ken Thompson’s 1968 paper on high-performance regular expression matching. Second, RE2 addresses the “state explosion” challenge by compiling groups of expressions into an intermediate representation that cleverly supports rapid, lazy construction of the relevant sub-segments. As RE2 evaluates data streams, it dynamically builds just the portion of the automaton required for the analysis. When its state cache is full, it flushes the cache and continues.
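
The lazy-construction idea can be sketched in a few lines (an illustration of the concept, not RE2's actual code): determinize the NFA on the fly, materializing only the DFA states the input actually reaches, and flush the cache when it fills.

    def lazy_search(nfa, start, accepting, data, cache_limit=1024):
        # Unanchored search: the start states stay live at every position.
        start = frozenset(start)
        cache = {}                      # DFA state -> {byte: next DFA state}
        state = start
        for byte in data:
            moves = cache.setdefault(state, {})
            if byte not in moves:
                if len(cache) > cache_limit:
                    cache.clear()       # RE2 likewise flushes and continues
                    cache[state] = moves = {}
                moves[byte] = start | frozenset(
                    t for s in state for t in nfa.get(s, {}).get(byte, ()))
            state = moves[byte]
            if state & accepting:
                return True
        return False

    # Hypothetical NFA for the pattern "ab": 0 --a--> 1 --b--> 2 (accepting)
    nfa = {0: {ord('a'): {1}}, 1: {ord('b'): {2}}}
    print(lazy_search(nfa, {0}, {2}, b'xxaab'))   # True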

11https://en.wikipedia.org/wiki/Comparison_of_regular_expression_engines
12http://pcre.org/
13https://swtch.com/~rsc/regexp/



Hyperscan: Hyperscan was developed by Sensory Networks, which was later acquired by Intel Corporation. Hyperscan is the pinnacle of optimization for Intel-architecture CPUs. Hyperscan takes into account available algorithms, cores, threads, instruction sets, and cache characteristics to automatically optimize a final execution image for the target system. The result is quite sophisticated. Unlike RE2, it utilizes many different techniques to trade off CPU versus memory versus fidelity. For example, Hyperscan can be configured to increase throughput at the cost of false positives.

Neuromorphic Data Microscope: LRL’s Neuromorphic Data Microscope is a processor that incorporates key architectural elements from the sensory cortex of the brain. As a result, it is power-efficient, supports temporal computation, is very good at data abstraction, and has behavioral characteristics including statistical thresholds. The flexibility of the design facilitates a compiler that maps PCRE to the underlying neuromorphic language. It is not a dedicated regular expression engine like Hyperscan or RE2. Current implementations are FPGA-based, but the design is easily reduced to an ASIC.

Appendix A contains a summary table comparing Libpcre, RE2, Hyperscan and the Neuromorphic Data Microscope. This table does not reflect the many caveats associated with each of these technologies.

6 Analyst Vision

The resulting vision of the cyber analyst community is shown in Figure 2. This architecture enables the following system level attributes:

1. Visibility: Changing the cost dynamics enables sensor deployment at all critical infrastructure junctions. Enhancements to sensor function enable significant improvements in the fidelity of alerts available to analysts.

2. Behavioral Detection: Deploying behavioral characterization at the sensor level enables the system to rapidly detect anticipated behavior. This capability automates key elements of what is currently an analyst-intensive, high-latency process.

3. Dynamic Response: Dynamic sensors enable policies that automatically adapt to changing tactics. This capability allows the system to optimize visibility into threats, reducing the attack surface.

4. Robustness: Replacing software sensors with low-cost hardware sensors eliminates a key vulnerability in existing infrastructure. Hardware sensors allow command and control to be moved out-of-band, further hardening the system.

This vision represents a quantum leap over the current state of the art.



Figure 2: Analyst Vision. Addressing analyst priorities with Neuromorphic Processing Units (NPUs), such as the Neuromorphic Data Microscope, dramatically improves the visibility, behavioral detection, dynamic response, and robustness of cyber defense systems.

7 Conclusions

When viewed from the perspective of an analyst, cyber defense priorities shift significantly. Component specifications quickly disappear behind system-level attributes such as alert fidelity and distributed behavioral analysis. Top-down analysis decomposes these attributes into language, function, and cost limitations at the sensor level. Transitioning cyber security to the next generation requires addressing the full breadth of these issues in a manner that is compatible with existing infrastructure and processes. Neuromorphic processors are well-suited to this task.


Appendix A14

[Table: capability and performance comparison of Libpcre, RE2, Hyperscan, and the Neuromorphic Data Microscope]

14Data partially extracted from https://en.wikipedia.org/wiki/Comparison_of_regular_expression_engines

15Comparing regular expression engines is extremely complex. The above table is intended to give a general sense of relative capabilities.

16Relative performance of the WaterSlide application running the “vector match RE2 kid” vs. the “Neuromorphic Data Microscope kid” on an Intel Xeon E5-1650 v3 (6-core HT, 15MB cache, 3.5GHz), using 583 PCRE expressions.

©2017 Lewis Rhodes Labs 032417

Lewis Rhodes Labs
66 Commonwealth Ave., Suite 2
West Concord, MA 01742
[email protected]
978-795-4008
