This Conference brought to you by www.ttcus.com @Techtrain Linkedin/Group: Technology Training Corporation www.ttcus.com Technology Training Corporation

Predictive Analytics for Defense & Intelligence - A View ... Neal Ziring Technical Director, Capabilities National Security Agency U/OO/800671-17 Big Data Analytics and


Neal Ziring

Technical Director, Capabilities

National Security Agency

U/OO/800671-17

Big Data Analytics and Mission –

A view from NSA

4 April 2017


Structure

0. Introduction – importance of and taxonomy for analytics

1. Some conceptual models for analytics

2. Analytics Integration (w/ tool examples)

3. Some NSA lessons about analytics


Part 0 – Introduction


Why are Analytics So Important?

“We are drowning in data, but starving for knowledge!” – John Naisbitt, 1982

• Naisbitt’s sentiment is still valid today: modern IT allows easy collection and storage of data, but gaining knowledge and answering analytic questions are still hard.

• Analytic computations and processes are essential for extracting useful, actionable knowledge from volumes of data.
  • “Big data” makes new forms of analysis possible, but collecting the right data is more important than just collecting lots of data.

• Technologies for building and running analytic processing have improved immensely since the 1990s, but utilizing them for effective analysis still requires care, foundational skills, and understanding of the subject area.


A Simple Taxonomy for Analytics

L1 Basic
  • Summary & simple statistics
  • Includes selection, counting, mean, range, variance, etc.

L2 Behavioral
  • Extract relationships from multi-dimensional data, complicated statistics
  • Identify simple groupings, norms, outliers

L3 Predictive
  • Extraction of trends, correlations, models
  • Generate new knowledge about datasets and their relationships
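
As a rough illustration of the three levels, the sketch below runs one small series through each: summary statistics (Basic), a z-score outlier check (Behavioral), and a least-squares trend projected one step ahead (Predictive). The data, the function names, and the 2.0 z-score threshold are assumptions made for this example; they do not come from the talk.

```python
from statistics import mean, pstdev

def basic_summary(values):
    """Basic level: counting and simple statistics."""
    return {
        "count": len(values),
        "mean": mean(values),
        "range": max(values) - min(values),
        "stdev": pstdev(values),
    }

def find_outliers(values, z_threshold=2.0):
    """Behavioral level: flag values far from the norm (z-score rule)."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return []  # constant data has no outliers under this rule
    return [v for v in values if abs(v - mu) / sigma > z_threshold]

def predict_next(values):
    """Predictive level: fit a least-squares line, project one step ahead."""
    n = len(values)
    xs = range(n)
    x_bar, y_bar = mean(xs), mean(values)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, values))
             / sum((x - x_bar) ** 2 for x in xs))
    return y_bar + slope * (n - x_bar)

# Hypothetical daily counts with one unusual day.
daily_logins = [10, 12, 11, 13, 12, 14, 40, 15]
summary = basic_summary(daily_logins)
unusual = find_outliers(daily_logins)
forecast = predict_next(daily_logins)
```

Each level builds on the one before: the outlier rule needs the mean and standard deviation, and a trend fit is only meaningful once outliers like the spike have been understood.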


Further Divisions about Prediction

• What are the subjects of the prediction?
  • Natural activities: tend to be driven by randomness and cause-and-effect, e.g., machine failures, signal propagation, weather systems, disease progression
  • Social activities: driven by human motivations/reactions, plus randomness, e.g., shopping habits, stock prices, traffic congestion, pandemic spread
  • Covert activities: driven by human motivations and desire to evade prediction, e.g., terrorist attacks, military operations, money laundering

• How fast do you need a prediction?
  • Real-time: need a prediction within a fixed time interval
  • Active-time: need a prediction before immediate impacts of an activity occur (amount of time involved varies for different domains)
  • Non-real-time: need a prediction, but the time frame is more flexible

• What certainty is necessary for predictions?
  • Certain: sufficient surety to take irrevocable action
  • Legal: sufficient to satisfy a legal, regulatory, or compliance test
  • Best-effort: sufficient to take a low-risk action


Part 1 – Conceptual Models for Analytics


1 – Series of Observations over Time

• Usual goals:
  • Simple: predictions about future observations
  • Complex: detecting and characterizing patterns in observations over time

• Key concerns:
  • Wide variety of techniques may apply
  • What features/properties of the observations are most important?
  • What aspects of future observations do you need to predict?

[Figure: observations along a time axis t, with the next observation marked "?"]
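
One of the many techniques that can serve the simple goal (predicting the next observation) is exponential smoothing. The sketch below is only one option among the wide variety the slide mentions; the function name and the `alpha` default are assumptions for illustration.

```python
def exponential_smoothing_forecast(observations, alpha=0.5):
    """Forecast the next value as an exponentially weighted average.

    alpha near 1 trusts recent observations; alpha near 0 trusts history.
    """
    if not observations:
        raise ValueError("need at least one observation")
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    level = observations[0]
    for value in observations[1:]:
        # Blend the newest observation with the running level.
        level = alpha * value + (1 - alpha) * level
    return level

# Hypothetical sensor readings.
readings = [20.0, 21.0, 19.5, 22.0, 23.0]
next_reading = exponential_smoothing_forecast(readings, alpha=0.3)
```

Choosing `alpha` is itself a feature-importance question: it encodes how quickly older observations should stop mattering for the prediction.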


2 – Analyzing Entity Behavior

• Focus on entities (people, hosts, networks, services, etc.)

• Usual goals:
  • Define or learn clusters/bins for entities based on behavior
  • Build up models of entities (e.g., state transition based) to project future behavior
  • Be able to predict behavior of new entities based on similarity to known entities

• Key concerns:
  • Identifying best features to use for the model
  • Dealing with missing data and noisy data

[Figure: behavior of entities e1, e2, e3 along a time axis t]
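
A state-transition model of an entity can be as small as a table of observed transitions. The sketch below counts first-order transitions in one entity's history and projects the most likely next state. The state names and the host history are hypothetical, and a first-order model is an assumption chosen for brevity.

```python
from collections import Counter, defaultdict

def build_transition_model(state_sequence):
    """Count how often each state is followed by each other state."""
    counts = defaultdict(Counter)
    for current, following in zip(state_sequence, state_sequence[1:]):
        counts[current][following] += 1
    return counts

def most_likely_next(model, state):
    """Return the most frequently observed successor, or None if unseen."""
    followers = model.get(state)
    if not followers:
        return None  # missing data: this state was never observed
    return followers.most_common(1)[0][0]

# Hypothetical activity history for one host (entity e1).
e1_history = ["idle", "login", "transfer", "idle", "login",
              "transfer", "idle", "login", "logout"]
e1_model = build_transition_model(e1_history)
projected = most_likely_next(e1_model, "login")
```

Returning `None` for an unseen state makes the missing-data concern explicit rather than hiding it behind a guess.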


3 – Pattern mining and Sequence mining

• Focus on sequences of events (connections, transactions, start/stop, etc.)

• Usual goals:
  • Infer or learn sub-sequences that appear often or have properties of interest
  • Identify missing or anomalous events
  • Predict future events and their times

• Key concerns:
  • Preventing ‘state-space explosion’ in the model
  • Distinguishing meaningful sequences from noise
  • Determining optimal features of events for extracting patterns

[Figure: sequence of events along a time axis t]
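
A minimal way to find sub-sequences that appear often is to count fixed-length contiguous windows. This is a simplification: general sequence mining also handles gaps and variable lengths. The event names, window length, and `min_count` threshold below are assumptions for illustration.

```python
from collections import Counter

def frequent_subsequences(events, length=2, min_count=2):
    """Return contiguous sub-sequences of a fixed length seen at least min_count times."""
    windows = (tuple(events[i:i + length])
               for i in range(len(events) - length + 1))
    counts = Counter(windows)
    # The min_count filter is a crude way to separate patterns from noise.
    return {seq: n for seq, n in counts.items() if n >= min_count}

# Hypothetical event log for one session.
log = ["connect", "auth", "read", "connect", "auth",
       "write", "connect", "auth", "read"]
patterns = frequent_subsequences(log, length=2, min_count=2)
```

Fixing the window length keeps the number of candidate patterns linear in the log size, which is one simple guard against state-space explosion.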


4 – Measurements on Objects

• Focus on objects (files, programs, sessions, messages, transactions)

• Usual goals:
  • Group objects into classes or categories (e.g., malicious v. non-malicious)
  • Associate classes with features of interest, sources, or lineage
  • Given a new unknown object, determine the class into which it best fits
  • Identify outliers and anomalous objects

• Key concerns:
  • Identifying the most meaningful features to use to build models
  • Finding and maintaining a good, diverse training set of objects
  • Applying the model to new environments
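
Fitting a new object into its best class can be sketched with a nearest-centroid classifier. The two features used here (an entropy score and a fraction of suspicious imports) and all values are hypothetical; real work would start with choosing meaningful features and a diverse training set, as the key concerns note.

```python
import math

def centroid(vectors):
    """Average the feature vectors of one class, column by column."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]

def train_centroids(labeled_vectors):
    """labeled_vectors maps a class label to a list of feature vectors."""
    return {label: centroid(vecs) for label, vecs in labeled_vectors.items()}

def classify(centroids, features):
    """Assign the class whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: math.dist(centroids[label], features))

# Hypothetical training set: [entropy score, fraction of suspicious imports].
training = {
    "non-malicious": [[1.0, 0.1], [1.2, 0.2]],
    "malicious": [[7.5, 0.9], [8.0, 0.8]],
}
centroids = train_centroids(training)
verdict = classify(centroids, [7.0, 0.7])
```

Because the centroids depend entirely on the training objects, a model trained in one environment may misplace objects from another, which is the "new environments" concern in practice.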


Part 2 – Data Analytic Integration

(Realizing value depends on integrating analytics into mission flow)


Rough Integration Model

• Exploration identifies new useful analytic techniques or mission applications, and those are moved to Operation. Operation identifies new mission needs to drive Exploration. Both are used to drive Acquisition.

[Figure: rough integration model, three phases and their stages]

ACQUISITION: Data Collection; Data Transport & Staging; QA, Transform & Ingest

EXPLORATION: Exploration/Characterization; Technique Development

OPERATION: Model Building & Sustainment; Production Analysis; Visualization, Presentation, Action

Basic Acquisition Stage Attributes

• Primary task: gather data necessary/useful for analysis, move it into the analytic platform(s).


Stage: Data Collection
  • Basic requirements: sense data from target environment; extract useful components
  • Example tool(s): OpenDataKit, RedHawk

Stage: Transport & Staging
  • Basic requirements: package data into aggregates; assured transfer from point of collection to enterprise platforms
  • Example tool(s): Google® QUIC, Tsunami

Stage: QA, Transform, & Ingest
  • Basic requirements: clean up and filter data; transform data into consumable format and add to repository
  • Example tool(s): OpenRefine, Apache NiFi™
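
The QA, transform, and ingest requirements can be sketched in a few lines of plain Python, independent of any of the tools named above. The record layout (`source`, `timestamp`, `value`), the epoch-seconds timestamp format, and the in-memory list standing in for a repository are all assumptions for illustration.

```python
from datetime import datetime, timezone

REQUIRED_FIELDS = ("source", "timestamp", "value")

def clean_record(raw):
    """Return a normalized record, or None if the raw record fails QA."""
    if any(raw.get(field) in (None, "") for field in REQUIRED_FIELDS):
        return None  # filter: required field missing
    try:
        # Transform: epoch seconds -> ISO 8601 UTC string, value -> float.
        stamp = datetime.fromtimestamp(float(raw["timestamp"]), tz=timezone.utc)
        value = float(raw["value"])
    except (TypeError, ValueError, OverflowError, OSError):
        return None  # filter: unparseable field
    return {
        "source": str(raw["source"]).strip().lower(),
        "timestamp": stamp.isoformat(),
        "value": value,
    }

def ingest(raw_records):
    """Clean each record, drop duplicates, and return the resulting repository."""
    seen, repository = set(), []
    for raw in raw_records:
        record = clean_record(raw)
        if record is None:
            continue
        key = (record["source"], record["timestamp"])
        if key not in seen:
            seen.add(key)
            repository.append(record)
    return repository

# Hypothetical raw feed: one good record, one duplicate, one bad value.
raw_feed = [
    {"source": " SensorA ", "timestamp": 0, "value": "3.5"},
    {"source": "sensora", "timestamp": 0, "value": "3.5"},
    {"source": "sensorb", "timestamp": 60, "value": "n/a"},
]
repository = ingest(raw_feed)
```

Rejecting bad records at ingest, rather than repairing them silently, keeps later analysis stages from inheriting guesses as data.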


Basic Exploration Stage Attributes


• Primary task: understand your data and develop the means for extracting mission value from it.

Stage: Exploration/Characterization
  • Basic requirements: support exploring/viewing data from multiple perspectives; sampling, filtering, rough display
  • Example tool(s): OpenRefine, Divvy

Stage: Technique Development
  • Basic requirements: application of multiple strategies, model types, algorithms; support collaborative work
  • Example tool(s): Jupyter Notebooks, R language


Basic Operation Stage Attributes


• Primary task: execute data analysis in a scalable & managed way to drive mission execution.

Stage: Model Building & Sustainment
  • Basic requirements: create & update analysis foundational assets (e.g., models)
  • Example tool(s): TensorFlow™, MLlib, Oryx

Stage: Production Analysis
  • Basic requirements: perform analysis on incoming data; create result sets to drive activities; manage resources, prioritize
  • Example tool(s): Apache Spark™, Apache Mesos™, Apache Apex™

Stage: Present, Visualize, Act
  • Basic requirements: present analytic results to users; drive mission actions from results; push expert feedback into models
  • Example tool(s): iWeave, Apache ODE™, PredictionIO™


Part 3 – Some NSA Views on Analytics


Some Key Areas for Advanced Analytics

• Cyber defense
  • Compromise detection
  • File triage and malware characterization
  • Tradecraft analysis (see next slide)

• Empowering human analysts and operators
  • Analyst assistance: predict analyst needs and offer information proactively
  • Create complex analytic queries from natural language text
  • Language modeling

• Recommender systems
  • Suggest source material for analysts, reporters, operators
  • Suggest jobs of interest for individuals

• Intelligence collection
  • Intelligence Value Estimation
  • Optimize value derived from limited collection capacity


Some NSA Views on Analytics Development and Data Science

1. Data science is necessarily multi-disciplinary.
  • To build up our data science cadre, NSA found success in establishing mission-focused rotations through a multi-disciplinary dedicated lab (iCafe).
  • Collaboration between programmers, mathematicians, platform experts, and domain experts is very powerful: each learns from the others.
  • Drive analytics work with access to real data and real mission problems.

2. Data volume is often less of a problem than data speed.
  • Defense and intelligence often require analytic answers fast.
  • Use deep, batch analysis to build models; use streaming analysis to apply them to incoming data (see next slide).

3. Effective data science requires strong foundations.
  • Basic statistics and data mining
  • Core computer science
  • Understanding of the problem domain


Example: Hybrid batch/streaming analytics

[Figure: hybrid batch/streaming architecture. Components: event data store, analytic platform running a machine-learning analytic, streaming analytic platform, analyst. Labeled flows: events, model, results, feedback, action.]
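
The split between batch model building and streaming model application can be sketched as follows. This is a toy stand-in, not the architecture in the figure: the "model" is just a mean and standard deviation, the event store is a Python list, and the 3.0 z-score threshold and all function names are assumptions for illustration.

```python
from statistics import mean, pstdev

def build_model(event_store):
    """Batch step: summarize all stored events into a baseline model."""
    return {"mean": mean(event_store), "stdev": pstdev(event_store)}

def score_stream(model, incoming, z_threshold=3.0):
    """Streaming step: score each event as it arrives, using only the model."""
    spread = model["stdev"] or 1.0  # avoid dividing by zero on constant data
    for value in incoming:
        z = abs(value - model["mean"]) / spread
        yield value, z > z_threshold

def apply_feedback(event_store, confirmed_normal):
    """Feedback step: add analyst-confirmed normal events, then rebuild the model."""
    event_store.extend(confirmed_normal)
    return build_model(event_store)

# Hypothetical stored events and a short incoming stream.
event_store = [100, 102, 98, 101, 99]
model = build_model(event_store)
flags = list(score_stream(model, [100, 150, 97]))
model = apply_feedback(event_store, [97])
```

The streaming step never reads the event store, so it stays fast; the expensive pass over stored data happens only when the model is rebuilt.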


Wrap-up


Conclusions

• Big data analytics allow us to answer analytic questions and guide operations in new ways.
  • Finding subtle, unexpected, actionable relationships
  • Extracting very-deeply-buried knowledge
  • Modeling very complex behaviors

• There are many ways to view analytics.
  • Begin with exploratory analysis, try & compare different approaches
  • Understand the mission need that the analytic will address
  • Make your analytics only as complex as they need to be

• To drive mission value, analytic strategy must span your enterprise.
  • Every stage matters, from initial collection to final presentation and action
  • Many tools and products are available for every step; choose ones that fit your situation

• Powerful analytics can commit powerful mistakes if misapplied.
  • Don’t apply analytic techniques/algorithms/packages blindly
  • Build multi-disciplinary teams with math, computing, and domain expertise
  • Validate analytics before promoting them to production status
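
Validation before promotion can be made concrete with a simple gate: score the analytic against held-out labeled data and promote only if it clears agreed thresholds. The metric choice (precision and recall) and the threshold values below are assumptions for illustration; the appropriate bar depends on the mission need.

```python
def precision_recall(predicted, actual):
    """Compute precision and recall for boolean predictions against ground truth."""
    pairs = list(zip(predicted, actual))
    true_pos = sum(1 for p, a in pairs if p and a)
    false_pos = sum(1 for p, a in pairs if p and not a)
    false_neg = sum(1 for p, a in pairs if not p and a)
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    return precision, recall

def ready_for_production(predicted, actual, min_precision=0.9, min_recall=0.8):
    """Gate promotion on held-out performance (thresholds are hypothetical)."""
    precision, recall = precision_recall(predicted, actual)
    return precision >= min_precision and recall >= min_recall

# Hypothetical held-out results: analytic output vs. analyst-verified labels.
analytic_output = [True, True, False, False]
verified_labels = [True, False, True, False]
promote = ready_for_production(analytic_output, verified_labels)
```

Here the analytic is right only half the time on both measures, so the gate keeps it out of production.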


Backup Slides


End Notes

• Google® is a registered trademark of Google Inc.

• Apache NiFi™ is a trademark of The Apache Software Foundation

• TensorFlow™ is a trademark of Google Inc.

• Apache Spark™, Mesos™, Apex™, ODE™, and PredictionIO™ are trademarks of The Apache Software Foundation

• iWeave™ is a trademark of Campbell, Steven I.
