Beyond Messaging: Enterprise Dataflow powered by Apache NiFi
© Hortonworks Inc. 2011 – 2015. All Rights Reserved
Aldrin Piri
DEVIEW 2015 2015.09.15
About me
Member of Technical Staff
Project Management Committee and Committer
@aldrinpiri
Simplistic View of Enterprise Data Flow
[Diagram: "The Data Flow Thing" connecting Acquire Data, Store Data, and Process and Analyze Data]
Where do we find data flow?
• Remote sensor delivery (Internet of Things - IoT)
• Intra-site / Inter-site / global distribution (Enterprise)
• Ingest for driving analytics (Big Data)
• Data Processing (Simple Event Processing)
Basics of Connecting Systems
For every connection,
these must agree:
1. Protocol
2. Format
3. Schema
4. Priority
5. Size of event
6. Frequency of event
7. Authorization access
8. Relevance
[Diagram: Producer (P1) connected to Consumer (C1)]
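The eight-point checklist above can be read as a compatibility test between one producer and one consumer. The following is a minimal sketch of that idea, not anything taken from NiFi: the `Endpoint` class, its field names, and the sample values are all hypothetical and chosen only to mirror the list.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Endpoint:
    """What one side of a connection expects (hypothetical; fields mirror the list)."""
    protocol: str
    format: str
    schema: str
    priority: str
    event_size: str
    event_frequency: str
    authorization: str
    relevance: str


def mismatches(producer: Endpoint, consumer: Endpoint) -> list[str]:
    """Return the names of the properties on which the two sides disagree."""
    return [
        f.name
        for f in fields(Endpoint)
        if getattr(producer, f.name) != getattr(consumer, f.name)
    ]


# Illustrative values only: the two sides differ in format and schema.
p1 = Endpoint("http", "json", "orders-v1", "fifo", "small", "streaming", "team-a", "all")
c1 = Endpoint("http", "avro", "orders-v2", "fifo", "small", "streaming", "team-a", "all")
print(mismatches(p1, c1))  # ['format', 'schema']
```

Every name returned by `mismatches` is something that has to be resolved somewhere between the two systems before the connection works.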
Challenges of dataflow in the enterprise
• Messaging addresses only a small subset of the problem space
• Needed to understand the big picture
• Needed the ability to make immediate changes
• Must maintain chain of custody for data
• Rigorous security and compliance requirements
Messaging Systems as Dataflow
Great options including:
• Kafka
• ActiveMQ
• Tibco
Let us consider the perfect messaging system for this talk:
• It has zero latency
• It has perfect data durability
• It supports unlimited consumers and producers
Dataflow as a messaging problem
“But my system needs…”
• A different format and/or schema
• To use a different protocol
• The highest priority information first
• Large objects (event batches) / Small objects (streams)
• Authorization to the data level
• Only a subset of the data on a topic
• Data to be enriched/sanitized before it arrives
Using Messaging
With messaging, only a subset of these agree:
1. Protocol
2. Format
3. Schema
4. Priority
5. Size of event
6. Frequency of event
7. Authorization access
8. Relevance
[Diagram: producer P1 → Messaging → consumers C1 … CN]
More issues to consider:
• How do you know what the data flow looks like?
• How is it managed?
• How is it working – today, yesterday?
How these issues are typically solved
• Add new systems to handle the protocol differences
• Add new systems to convert the data
• Add new systems to reorder the data
• Add new systems to filter the unauthorized data
• Add new topics to represent ‘stages of the flow’
This leads to latency, complexity, and limited retention.
Ultimately, the operations teams who handle data at flow boundaries become responsible for managing these added systems.
Real-time Data Flow
It’s not just how quickly you move data – it’s about how quickly you can change behavior and seize new opportunities
Introducing Apache NiFi
• Guaranteed delivery
• Data buffering
- Backpressure
- Pressure release
• Prioritized queuing
• Flow specific QoS
- Latency vs. throughput
- Loss tolerance
• Data provenance
• Recovery/recording: a rolling log of fine-grained history
• Visual command and control
• Flow templates
• Pluggable/multi-role security
• Designed for extension
• Clustering
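Two of the features listed above, backpressure and prioritized queuing, can be illustrated with a small bounded priority queue. This is a sketch under stated assumptions and not NiFi's implementation or API: the class `PrioritizedConnection`, its methods, and the convention that a lower number means higher priority are all hypothetical.

```python
import heapq
import itertools


class PrioritizedConnection:
    """A bounded queue that releases the highest-priority item first (hypothetical)."""

    def __init__(self, backpressure_threshold: int):
        self.backpressure_threshold = backpressure_threshold
        self._heap = []
        # Monotonic counter: keeps FIFO order among items of equal priority.
        self._order = itertools.count()

    def is_full(self) -> bool:
        return len(self._heap) >= self.backpressure_threshold

    def offer(self, item, priority: int) -> bool:
        """Queue an item; return False (backpressure) once the threshold is reached."""
        if self.is_full():
            return False
        heapq.heappush(self._heap, (priority, next(self._order), item))
        return True

    def poll(self):
        """Remove and return the item with the lowest priority number, or None."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]


conn = PrioritizedConnection(backpressure_threshold=2)
accepted = [
    conn.offer("routine", priority=5),
    conn.offer("urgent", priority=1),
    conn.offer("late", priority=3),  # refused: the queue already holds 2 items
]
print(accepted)     # [True, True, False]
print(conn.poll())  # urgent
```

In this sketch the producer sees the refusal and has to decide what to do with the item; a real system could instead pause the upstream component until the queue drains.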
A Brief History
2006
NiagaraFiles (NiFi) is first conceived by Joe Witt at the National Security Agency (NSA).
November 2014
NiFi is donated to the Apache Software Foundation (ASF) through the NSA’s Technology Transfer Program and enters the ASF Incubator.
July 2015
NiFi reaches ASF top-level project status.
Flow Based Programming (FBP)
FBP Term | NiFi Term | Description
Information Packet | FlowFile | Each object moving through the system.
Black Box | FlowFile Processor | Performs the work, doing some combination of data routing, transformation, or mediation between systems.
Bounded Buffer | Connection | The linkage between processors, acting as queues and allowing various processes to interact at differing rates.
Scheduler | Flow Controller | Maintains the knowledge of how processes are connected, and manages the threads and allocations thereof which all processes use.
Subnet | Process Group | A set of processes and their connections, which can receive and send data via ports. A process group allows creation of an entirely new component simply by composition of its components.
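The mapping above can be made concrete with a toy flow. The sketch below is an illustration of the FBP concepts only, not NiFi code: every class and method name (`FlowFile`, `Connection`, `Processor`, `FlowController`, `on_trigger`) is a hypothetical stand-in, it runs single-threaded, and process groups are left out.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class FlowFile:
    """Information packet: content plus key/value attributes."""
    content: bytes
    attributes: dict = field(default_factory=dict)


class Connection:
    """Bounded buffer between two processors."""

    def __init__(self, capacity: int = 100):
        self.capacity = capacity
        self.queue = deque()

    def has_room(self) -> bool:
        return len(self.queue) < self.capacity


class Processor:
    """Black box: `work` maps one FlowFile to a FlowFile, or to None to drop it."""

    def __init__(self, name, work, inbound=None, outbound=None):
        self.name, self.work = name, work
        self.inbound, self.outbound = inbound, outbound

    def on_trigger(self) -> bool:
        """Process one queued FlowFile if the downstream buffer has room."""
        if self.inbound is None or not self.inbound.queue:
            return False
        if self.outbound is not None and not self.outbound.has_room():
            return False  # downstream is full, so wait
        result = self.work(self.inbound.queue.popleft())
        if result is not None and self.outbound is not None:
            self.outbound.queue.append(result)
        return True


class FlowController:
    """Scheduler: knows the processors and triggers them until no work remains."""

    def __init__(self, processors):
        self.processors = processors

    def run(self):
        # The list (not a generator) ensures every processor is triggered each pass.
        while any([p.on_trigger() for p in self.processors]):
            pass


def drop_empty(ff):
    return ff if ff.content else None


def to_upper(ff):
    return FlowFile(ff.content.upper(), {**ff.attributes, "transformed": "true"})


source, middle, sink = Connection(), Connection(), Connection()
source.queue.extend([FlowFile(b"hello"), FlowFile(b""), FlowFile(b"nifi")])
controller = FlowController([
    Processor("DropEmpty", drop_empty, source, middle),
    Processor("ToUpper", to_upper, middle, sink),
])
controller.run()
print([ff.content for ff in sink.queue])  # [b'HELLO', b'NIFI']
```

Because each processor only touches its inbound and outbound connections, processors can be rearranged or replaced without changing one another, which is the property the FBP terms describe.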
Architecture
[Diagram: NiFi instance. OS/Host > JVM > Web Server and Flow Controller (Processor 1 … Extension N); FlowFile Repository, Content Repository, and Provenance Repository on Local Storage]
[Diagram: NiFi cluster. Master: NiFi Cluster Manager (NCM), an OS/Host > JVM running a Web Server and the request replicator. Slaves: NiFi Nodes, each laid out like the NiFi instance above]
Live Demonstration
Feature Proposals – Status
Status | Proposal
FUTURE | Better integration with Apache Kafka
FUTURE | Clustering redesign
IN PROGRESS | Configuration management of flows
STARTED | Extension and template registry
RELEASE COMING SOON | First-class Avro support
STARTED | Interactive queue management
STARTED | Multi-tenant data flow
FUTURE | Pluggable authentication
FUTURE | Reference-able process groups
FUTURE | Variable registry
FUTURE | ‘Wormhole’ connections
https://cwiki.apache.org/confluence/display/NIFI/NiFi+Feature+Proposals
Learn more and join us!
Apache NiFi site
http://nifi.apache.org
Subscribe to and collaborate at
Submit Ideas or Issues
https://issues.apache.org/jira/browse/NIFI
Follow us on Twitter
@apachenifi
Thank you!