22
Apache NiFi Better Analytics Demand Better Dataflow Presented by: Joe Witt Apache NiFi PPMC Member

Joe witt may2015_kafka_nyc_apachenifi-overview

Embed Size (px)

Citation preview

Apache NiFi Better Analytics Demand Better Dataflow

Presented by: Joe Witt Apache NiFi PPMC Member

2  

#nifidemo

History

3  

•  Developed at NSA for over eight years

•  Donated to the Apache Software Foundation Nov 2014

•  Undergoing incubation

•  Three ASF releases to date •  0.1.0 out last night!

 

The problem space: Enterprise Dataflow

4  

Automate the flow of data from any source

…to systems which extract meaning and insight

…and to those that store and make it available for users

The challenges we faced

5  

•  Transport / Messaging was not enough

•  Needed to understand the big picture

•  Needed the ability to make *immediate* changes

•  Must maintain chain of custody for data •  Rigorous security and compliance requirements  

Why transport and messaging was not enough?

6  

•  Data access exceeded resources to transport

•  Decoupling systems is about more than the connectivity

•  Message sizes ranged from B to GB

•  Not all data is created equal

•  Needed precise security controls •  SSL and topic level authorization insufficient

 

 The basic building blocks

Real-time Command and Control

The Power of Provenance

7  

Apache NiFi Foundational Concepts

2

3

1

HEADER  -­‐  UUID  -­‐  Name  -­‐  Size  -­‐  Entry  Time  

           A3ributes  Map                [[Key  |  Value]]  

CONTENT  

Flow File

8  

•  Types •  Events •  Objects •  Files •  Messages •  Media

•  Formats •  JSON •  Avro •  Text •  Mp4 •  Proprietary

•  Sizes •  Bytes to GBs

Flow File Processor

9  

• Routing •  Context •  Content

• Transformation •  Enrich •  Obfuscate •  Filter •  Convert •  Analyze •  Split •  Aggregate

• Mediation •  Push / Pull • …

Connections

10  

• Queuing • Back Pressure • Expiration

• Prioritize

• Swapping

Flow Controller

11  

NiFi Architecture

12  

NiFi Clustering Model

13  

Tighten the feedback loop •  Changes have consequences (good or bad) •  And you see them as they occur

Continuous Improvement •  Compare real-time vs. historical statistics •  View data provenance •  View Content at any stage Intuitive user experience •  Visual programming •  Logical flow graph

14  

Real-time command and control 2

Latency Optimization •  Intra process •  Inter process •  End-to-end Compliance •  Prove handling •  Assess impact Understanding •  Step through time •  View content •  View Context

15  

The Power of Provenance – Chain of custody for data 3

16  

Demo

Flow File Repo – Write Ahead Log Content Repo

Add more partitions Input/Output Streams

Copy on Write Pass by Reference Allow tradeoffs of latency vs throughput

17  

How fast is it and why?

- User to System and System to System -  Authentication (2-Way SSL)

-  Authorization (pluggable)

-  Authorize a specific piece of data to a specific system

-  Data provenance -  Prove you have done the right thing -  Recover when you have not

18  

How does it deal with security?

Web UI Push API

Reporting Tasks (ganglia, graphite, etc…) Pull API

REST API

19  

How can I monitor this at runtime?

Flow File Processors Advanced UI

Flow File Prioritizer Reporting Tasks Controller Services Build Clients against our REST API

20  

What are the points of extension?

Status and direction for NiFi

21  

Efficient use of each node -  100s of MB/s per node -  100Ks transactions/s per node Simple / Effective scaling model Runtime Command and Control Data Provenance  

Distributed durability of data - Maybe Kafka backed queues High Availability Cluster Manager Live / Rolling Upgrades Provenance Query Language / Reporting A complete user experience enabled by provenance

Existing Strengths Roadmap Highlights

Apache NiFi (incubating) site http://nifi.incubator.apache.org Subscribe to and collaborate at [email protected] [email protected] Submit Ideas or Issues https://issues.apache.org/jira/browse/NIFI @apachenifi  

22  

Learn more about Apache NiFi