Test and Evaluation/Science and Technology Program: Rapid Data Analyzer for Net-Centric Systems Test (RDAN). 33rd Annual International Test and Evaluation Symposium, October 4, 2016. Mr. Andrew Shaffer (Technical Lead), Applied Research Laboratory, The Pennsylvania State University. This project was funded by the Test Resource Management Center (TRMC) Test and Evaluation/Science & Technology (T&E/S&T) Program through the U.S. Army Program Executive Office for Simulation, Training, and Instrumentation (PEO STRI) under Contract No. W900KK-13-C-0015. Distribution Statement A. Approved for public release; distribution is unlimited.

PowerPoint Presentation Symposium/2016...Title PowerPoint Presentation Author Andrew P. Shaffer Created Date 10/9/2016 12:54:20 PM


Test and Evaluation/Science and Technology Program

Rapid Data Analyzer for Net-Centric Systems Test (RDAN)

33rd Annual International Test and Evaluation Symposium
October 4, 2016

Mr. Andrew Shaffer (Technical Lead)
Applied Research Laboratory, The Pennsylvania State University

This project was funded by the Test Resource Management Center (TRMC) Test and Evaluation/Science & Technology (T&E/S&T) Program through the U.S. Army Program Executive Office for Simulation, Training, and Instrumentation (PEO STRI) under Contract No. W900KK-13-C-0015.

Distribution Statement A. Approved for public release; distribution is unlimited.

Outline

• RDAN System Overview
• Test & Evaluation Need
• Tools & Background
• RDAN System Architecture
• RDAN Functional Operators
• Key Science and Technology Innovations
• Innovations in the RDAN Architecture
• Potential Use Cases & Applications
• Summary and Future Work
• Points of Contact

RDAN System Overview

RDAN applies automated analysis using cloud computing technologies to reduce the time from Data to Decision

• RDAN collects and analyzes high-volume data from multiple data sources
  - Unstructured text (TRL 5)
  - Structured data (TRL 5)
  - Image & video (TRL 4)
  - Voice - Future
  - Test & Training Enabling Architecture (TENA) - Future
• RDAN automatically analyzes and indexes data using cloud technologies to support rapid search and analysis operations on large data sets
• RDAN provides custom parallel algorithms and architecture to reduce time from Data to Decision from hours/days/weeks to seconds

Test & Evaluation Need

Modern distributed T&E events generate large amounts of unstructured data that are hard to analyze

• High Volume Data Collection
  – Unable to efficiently collect and analyze large unstructured data (e.g., text, voice, chat, image/video) and structured data across multiple sources and environments
  – Lack of capability to quickly review historical data limits the value of collected test records
• Adaptability and Scalability
  – Difficult to scale a T&E system as data grows
• Analysis tools
  – Data is often manually processed, with slow response time from Data to Decision
  – Manual processing introduces human error
  – T&E systems lack automation tools to analyze large volumes of unstructured text, structured data, image/video data, and voice data
  – T&E systems lack the ability to perform deep analysis on a test event as it occurs

[Figure: high volume data (unstructured, structured, image/video & voice) in diverse formats is collected using separate T&E systems; with current analysis techniques, Data to Decision is extremely slow (hrs/days/wks).]

Slow response time from Data to Decision

Tools & Background

Private big data processing cloud architecture is optimized for processing large volumes of data

• RDAN leverages open-source software packages
  - Hadoop/Accumulo software stack is freely available and continually being updated by the community
• RDAN is optimized to securely process big data
  - Software framework designed for scalability & fault tolerance
  - Storage of data on compute nodes eliminates I/O bottlenecks
  - Large clusters with inexpensive commodity components support massive aggregate I/O, CPU, and network capacity
  - System hardware and network can be tuned for workload
  - Graphics Processing Units (GPUs) can be added to support compute-intensive workloads
  - Private cloud architecture secures data storage and processing

[Figure: ARL Test Cloud System]

RDAN System Architecture

RDAN's purpose is to develop new technologies to rapidly process high volume unstructured and structured data

[Figure: RDAN system architecture block diagram. Engines: Data Conversion Engine; Indexing Engine; Query & Analysis Engine; User Interface Engine. Processing blocks: Data Ingest; Convert & Tag Src; Filter & Extract; Automated Annotation; Unstruct Text Index Generator; Analysis & Fusion; Graphical User Interface (Web-Based). Stores and libraries: Data Store; Index Files; Indexes; Multilevel Index; Wildcard Dictionary; Document Token Count; Auxiliary Indexes; Data (GFI); Semantics Library; Filter Library; Query Rule Mgt; Filter Mgt. Data representation: self-describing canonical-format data with extracted entities & metadata. Message flows: Basic Query / Management Command and Results; Query and Results; High-Level Query and Results. Interconnect: 10 gigE. APIs connect Custom Analysis and Custom User Interface modules. User: Tester / Analyst. Legend: RDAN Core System; Use-Case Specific.]

API: Application Programming Interface
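The architecture names a wildcard dictionary and a document token count among its index structures. How RDAN implements them is not described on the slide, so the following is only a minimal sketch of the general idea, assuming a toy in-memory index: a sorted term dictionary that expands a trailing-wildcard query into concrete terms, postings that map terms to documents, and a per-document token count. The class name `TinyIndex` and all method names are hypothetical.

```python
import bisect
from collections import defaultdict


class TinyIndex:
    """Toy in-memory index (hypothetical; not RDAN's actual data structures)."""

    def __init__(self):
        self.postings = defaultdict(set)  # term -> set of document ids
        self.token_count = {}             # document id -> number of tokens
        self._dictionary = []             # sorted list of distinct terms

    def add(self, doc_id, text):
        """Tokenize on whitespace and record the document under each term."""
        tokens = text.lower().split()
        self.token_count[doc_id] = len(tokens)
        for tok in tokens:
            if tok not in self.postings:
                # First time this term is seen: keep the dictionary sorted.
                bisect.insort(self._dictionary, tok)
            self.postings[tok].add(doc_id)

    def expand_prefix(self, prefix):
        """Expand a trailing-wildcard query such as 'net*' into known terms."""
        prefix = prefix.lower()
        start = bisect.bisect_left(self._dictionary, prefix)
        terms = []
        for term in self._dictionary[start:]:
            if not term.startswith(prefix):
                break
            terms.append(term)
        return terms

    def search_prefix(self, prefix):
        """Return sorted ids of documents containing any term with the prefix."""
        hits = set()
        for term in self.expand_prefix(prefix):
            hits |= self.postings[term]
        return sorted(hits)


index = TinyIndex()
index.add("msg-1", "Network latency exceeded threshold")
index.add("msg-2", "Net-centric node rejoined the network")
print(index.expand_prefix("net"))   # terms that start with "net"
print(index.search_prefix("net"))   # documents that contain any of them
```

Because the dictionary is sorted, a prefix expands with one binary search followed by a short scan, which is one common reason to keep a separate term dictionary next to the postings.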

RDAN Functional Operators

RDAN supports a diverse set of query and analysis tools that can be combined to support automated analysis

[Figure: operator stack. Layers: Indexes & Data; Index-Level Iterators; Logical Iterators; Utility Iterators; Client-Side Analysis; User Interface. Processing tiers: All Nodes Processing (RDAN Cluster); Single Node Processing (Client Computer). Operator labels shown: OR, AND, AND.]

Operators listed on the slide:
- Multilevel index
- Wildcard dictionary
- Auxiliary index structures
- Data blocks
- N-Gram iterator
- Term iterator
- Logical (AND/AND-N/OR/NOT) iterators (can be composed to form trees of arbitrary depth and complexity)
- Field selection iterator
- HDFS/Accumulo file writing iterator
- Node-level sorting iterator
- Node-level aggregation operators
- Top-k query optimizer
- Relevance ranking normalizer
- Generate result snapshots
- Global sorting iterator
- Global aggregation operators
- Clustering & outlier detection
- Source association
- Semantics rule evaluation
- Enter & manage queries and semantics rules
- Display results (dashboard, timeline, record list, etc.)
- Query preprocessing (e.g. wildcard query expansion)
- Accumulo API
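The slide states that the logical iterators can be composed into trees of arbitrary depth. As an illustration only, and not RDAN's or Accumulo's iterator API, the sketch below composes AND, OR, and NOT over ascending streams of document ids using plain Python generators. All function names and the sample postings are hypothetical, and the AND-N operator is left out because the slide does not define it.

```python
import heapq
from typing import Iterable, Iterator


def term_iter(postings: dict, term: str) -> Iterator[int]:
    """Yield the ascending document ids recorded for one term."""
    return iter(sorted(postings.get(term, ())))


def and_iter(left: Iterable[int], right: Iterable[int]) -> Iterator[int]:
    """Intersect two ascending id streams with a merge join."""
    left, right = iter(left), iter(right)
    a, b = next(left, None), next(right, None)
    while a is not None and b is not None:
        if a == b:
            yield a
            a, b = next(left, None), next(right, None)
        elif a < b:
            a = next(left, None)
        else:
            b = next(right, None)


def or_iter(*children: Iterable[int]) -> Iterator[int]:
    """Union any number of ascending id streams, dropping duplicates."""
    previous = None
    for doc_id in heapq.merge(*children):
        if doc_id != previous:
            yield doc_id
            previous = doc_id


def not_iter(universe: Iterable[int], excluded: Iterable[int]) -> Iterator[int]:
    """Yield ids from `universe` that do not appear in `excluded`."""
    excluded = iter(excluded)
    e = next(excluded, None)
    for doc_id in universe:
        # Advance the excluded stream until it catches up with doc_id.
        while e is not None and e < doc_id:
            e = next(excluded, None)
        if e != doc_id:
            yield doc_id


# Hypothetical postings: term -> ascending document ids.
postings = {
    "radar": [1, 2, 4, 7],
    "timeout": [2, 3, 7, 9],
    "retry": [4, 5],
    "simulated": [7],
}

# (radar AND timeout) OR (retry AND NOT simulated)
query = or_iter(
    and_iter(term_iter(postings, "radar"), term_iter(postings, "timeout")),
    not_iter(term_iter(postings, "retry"), term_iter(postings, "simulated")),
)
result = list(query)
print(result)
```

Each operator consumes and produces the same kind of stream, so any operator can be the child of any other; that uniform interface is what makes arbitrary nesting possible.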

Key Science & Technology Innovations

RDAN allows testers and analysts to perform near real-time analysis of complex distributed T&E events

• Novel Data Ingestion Architecture
  – Custom parallel algorithms provide scalability, high throughput, and low latency for storage, indexing, and analysis using the latest cloud technologies
• Pipelined Indexing Architecture
  – New data structures and algorithms significantly improve indexing throughput while maintaining low query latency
• Extensible Canonical Data Format
  – Self-describing data format allows new sensors and analysis modules to be added to the system without modifying the system architecture
• Flexible Search and Analysis Tools
  – New semantics rules allow analysts to search and analyze data using high-level constructs

[Figure: high volume data (unstructured, structured, image/video & voice) in diverse formats is collected using a single system; with new automated analysis and data fusion, Data to Decision is rapid (seconds).]

Rapid response time from Data to Decision
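The extensible canonical data format is described as self-describing, so that new sensors can be added without changing the system architecture. The actual RDAN format is not shown in the slides; the sketch below only illustrates the principle, assuming a JSON record that carries its own field types and units so that a single generic reader can decode records from sources it has never seen. The layout, `make_record`, `read_record`, and `CASTS` are all hypothetical.

```python
import json

# Type names a record may declare, mapped to Python converters (assumed set).
CASTS = {"int": int, "float": float, "str": str}


def make_record(source, fields):
    """Serialize values together with a description of each field.

    `fields` maps a field name to a (type_name, unit, value) tuple.
    """
    return json.dumps({
        "source": source,
        "schema": {
            name: {"type": type_name, "unit": unit}
            for name, (type_name, unit, _) in fields.items()
        },
        "values": {name: value for name, (_, _, value) in fields.items()},
    })


def read_record(blob):
    """Decode any record using only the schema embedded in it."""
    record = json.loads(blob)
    decoded = {}
    for name, spec in record["schema"].items():
        cast = CASTS[spec["type"]]
        decoded[name] = (cast(record["values"][name]), spec["unit"])
    return record["source"], decoded


# Two unrelated sources share one reader; neither needs source-specific code.
chat = make_record("chat", {"text": ("str", None, "link degraded")})
gps = make_record("gps", {"lat": ("float", "deg", "40.79"),
                          "sats": ("int", None, "9")})
print(read_record(chat))
print(read_record(gps))
```

Adding a third source in this sketch means writing another `make_record` call; `read_record` stays unchanged, which is the property the slide attributes to a self-describing format.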

Innovations in the RDAN Architecture

[Figure: RDAN architecture. Labeled components: Distributed Test Data (unstructured, structured, image/video & voice data); Data Conversion; RDAN Processing; Sensor Data, Metadata, and Indexes; Analysis Tools; Web-based GUI; Tester / Analyst.]

Support diverse sensors at high data rates
  • Self-describing canonical data format
  • High-throughput/low-latency indexes
Reduce time from data to decision
  • Reduce decision time from hrs/wks to secs/mins
Review large current & historical data sets during test events
  • Flexible GUI supports iterative processing & drill-down analysis
Automate analysis & support semantics rules to reduce human error
  • Diverse query and analysis operators
  • Framework for text analytics
Utilize h/w efficiently
  • Low cost COTS h/w
  • Requires less h/w than Accumulo-only systems
Support near real-time ingestion & indexing
Convert diverse data sources to self-describing canonical format
  • Multi-field automated annotation
  • High-throughput filtering & indexing
Implement scalable high performance image and video processing algorithms

RDAN enables rapid data to decision analysis at low cost
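Multi-field automated annotation during data conversion can be pictured as running several extractors over each incoming record and storing what they find as metadata. The slides do not say which entities RDAN extracts or how, so this is a hedged sketch that assumes regular-expression annotators; the `ANNOTATORS` table, the `annotate` function, the entity names, and the sample lines are all hypothetical.

```python
import re

# Hypothetical annotators: entity name -> pattern that recognizes it.
ANNOTATORS = {
    "ipv4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
    "time_utc": re.compile(r"\b\d{2}:\d{2}:\d{2}Z\b"),
    "callsign": re.compile(r"\b[A-Z]{3,}-\d+\b"),
}


def annotate(source, raw_text):
    """Convert one raw line into a record with its extracted entities."""
    entities = {
        name: pattern.findall(raw_text)
        for name, pattern in ANNOTATORS.items()
    }
    return {
        "source": source,
        "text": raw_text,
        # Keep only the entity types that actually matched.
        "entities": {name: found for name, found in entities.items() if found},
    }


chat_record = annotate("chat", "EAGLE-2 lost link to 10.0.0.5 at 14:03:22Z")
log_record = annotate("syslog", "connection from 192.168.1.20 refused")
print(chat_record["entities"])
print(log_record["entities"])
```

Extracted entities can then be indexed as separate fields, so a query can ask for every record that mentions a given address regardless of which source produced it.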

Potential Use Cases & Applications

RDAN can support multiple T&E needs

[Figure: RDAN at the center, surrounded by use-case areas: PACER; IA/Cyber T&E Exercises; Distributed T&E Events; Image & Video.]

• Rapidly analyze large cyber audit logs to prioritize vulnerabilities for resolution
• Provide near real-time feedback about test events to improve test range utilization
• Rapidly collect, filter, store, and analyze diverse high volume data collected during T&E events
• Automate analysis of large volumes of image and video data collected during T&E events
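The cyber audit log use case can be illustrated with the two-level pattern that the functional operators suggest: aggregate on each node first, then merge and rank on the client. The sketch below is an assumption-laden toy, not RDAN code. It assumes log lines of the form `host,severity`, counts HIGH-severity findings per host on each node, and merges the partial counts into a global top-k list. The log format, the function names, and the host names are hypothetical.

```python
import heapq
from collections import Counter


def node_level_counts(log_lines):
    """Runs per node: count HIGH-severity findings per host in local data."""
    counts = Counter()
    for line in log_lines:
        host, severity = line.split(",")[:2]
        if severity == "HIGH":
            counts[host] += 1
    return counts


def global_top_k(partial_counts, k):
    """Runs on the client: merge node results and keep the k largest."""
    total = Counter()
    for counts in partial_counts:
        total.update(counts)
    # Ties are broken by host name so the ranking is deterministic.
    return heapq.nsmallest(k, total.items(), key=lambda kv: (-kv[1], kv[0]))


# Each list stands in for the slice of the audit log stored on one node.
node_a = ["web01,HIGH", "web01,LOW", "db01,HIGH"]
node_b = ["web01,HIGH", "db01,HIGH", "db01,HIGH", "mail01,HIGH"]

partials = [node_level_counts(node_a), node_level_counts(node_b)]
ranking = global_top_k(partials, k=2)
print(ranking)
```

Only the small per-node counters cross the network in this arrangement, not the raw log lines, which is the usual motivation for aggregating where the data is stored.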

Summary & Future Work

• RDAN is a prototype end-to-end system that builds on secure private cloud technologies to automatically analyze large volumes of unstructured, structured, image/video and voice data
  – Reduces the time from Data to Decision from hours/days/weeks to seconds
  – Offers interactive and automated analysis of both live and recorded data
  – Scalable and highly configurable to support multiple Test and Evaluation programs
• RDAN mitigates risk by providing near real-time analysis of data of different types, structures, and sizes
  – Near real-time analysis of test event saves time and money
  – Automated processing of data minimizes human errors
  – Supports review of live data and collected historical data during test
• Proposed future developments
  – Further mature RDAN system technologies
  – Upgrade RDAN image and video processing algorithms
  – Increase system performance using GPUs
  – Identify additional use cases to utilize RDAN technologies

Currently seeking transition sponsors for future funding to support further maturation of RDAN

Points of Contact

Mr. Bruce Einfalt
Principal Investigator
814-863-4142
[email protected]

Mr. Andrew Shaffer
Technical Lead
814-863-0312
[email protected]

Mr. Manuel Gonzalez-Rivero
Image Processing Lead
814-865-9583
[email protected]