Upload
lequynh
View
212
Download
0
Embed Size (px)
Citation preview
1 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
March 23-24, 2017
Doing It Right
SYMPOSIUM
Managing Data in Motion with
the Connected Data Architecture
Dmitry BaevDirector, Solutions Engineering
4th Big Data & Business Analytics Symposium – March 23-24, 2017
2 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Disclaimer
This document may contain product features and technology directions that are under development, may be under development in the future or may ultimately not be developed.
Project capabilities are based on information that is publicly available within the Apache Software Foundation project websites ("Apache"). Progress of the project capabilities can be tracked from inception to release through Apache, however, technical feasibility, market demand, user feedback and the overarching Apache Software Foundation community development process can all effect timing and final delivery.
This document’s description of these features and technology directions does not represent a contractual commitment, promise or obligation from Hortonworks to deliver these features in any generally available product.
Product features and technology directions are subject to change, and must not be included in contracts, purchase orders, or sales agreements of any kind.
Since this document contains an outline of general product development plans, customers should not rely upon it when making purchasing decisions.
3 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
INTERNETOF
ANYTHING
AGE OF DATAOpen source is the norm,
and Apache is the center of gravityFounded: 1999
4 © Hortonworks Inc. 2011 – 2017. All Rights Reserved
Systems ofInsight
PersonalizedExperiences
BalancedSupply Chains
New Products& Services
OperationalEfficiencies
Connected Data Defines the New Way of Business
5 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Constrained High-latency Localized context
Hybrid – cloud / on-premises Low-latency Global context
CoreInfrastructure
Harnessing Data in Motion
RegionalInfrastructureSources
6 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Hortonworks DataFlow: Data in Motion Platform
Platform to build dataflow management and streaming analytics solutions that
collect, curate, analyze and act on data in motion across the data center and cloud.
7 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Easy, Secure, Reliable Way to Get the DataYou Need
From the edge to anywhere with
intelligence
Immediate andContinuous Insights
Because acting on perishable insights in real timemaximizes value
Provisioning, Management, Monitoring, Security, Audit,
Compliance, GovernanceBecause it all has to work together in an enterprise environment
FLOW MANAGEMENT
STREAMPROCESSING
ENTERPRISESERVICES
Hortonworks DataFlow for Data in MotionPowered by Apache NiFi, Kafka, and Storm
8 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Hortonworks DataFlow Manages Data in Motion
CoreInfrastructureSources
ConstrainedHigh-latencyLocalized context
Hybrid – cloud / on-premisesLow-latencyGlobal context
RegionalInfrastructure
9 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Simplistic View of Dataflows: Easy, Definitive
AcquireData
StoreData
DataFlow
ProcessAnalyze
Data
10 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Realistic View of Dataflows: Complex, Convoluted
AcquireData
StoreData
AcquireData
StoreData
StoreData
StoreData
StoreData
Processand
AnalyzeData
DataFlow
AcquireData
AcquireData
11 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Requirements for Data in Motion
Perishable Insights
ConnectivityPrioritization
Security
AdaptabilityExtensibility
Scalability
Provenance
Real-Time
12 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
• For agile and immediate creation, configuration, control of dataflowsVisual Command and Control
• Ensures trust of your dataData Lineage (Provenance)
• Because not all data is of equal importanceData Prioritization
• Since not all senders/receivers/connections work perfectly all the timeData Buffering/Back-Pressure
• Adapt to different situations with different requirementsControl Latency vs Throughput
• Security of data, and data accessSecure Control Plane/Data Plane
• ScalabilityScale out Clustering
• Ecosystem flexibility and growthExtensibility
Apache NiFi: Designed for 8 challenges of global enterprise dataflow
13 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Hortonworks DataFlow, powered by Apache NiFi
14 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Connecting Data Between Ecosystems Without Coding: 180+ Processors
Hash
Extract
Merge
Duplicate
Scan
GeoEnrich
Replace
ConvertSplit
Translate
Route Content
Route Context
Route Text
Control Rate
Distribute Load
Generate Table Fetch
Jolt Transform JSON
Prioritized Delivery
Encrypt
Tail
Evaluate
Execute
HL7
FTP
UDP
XML
SFTP
HTTP
Syslog
HTML
Image
AMQP
MQTT
All Apache project logos are trademarks of the ASF and the respective projects.
15 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Edge Intelligence with Apache MiNiFi
Guaranteed delivery Data buffering
‒ Backpressure‒ Pressure release
Prioritized queuing Flow specific QoS
‒ Latency vs. throughput‒ Loss tolerance
Data provenance
Recovery / recording a rolling log of fine-grained history
Designed for extension
Different from Apache NiFi Design and Deploy Warm re-deploys
Key Features
16 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Hortonworks DataFlow Manages Data in Motion
Constrained High-latency Localized context
Hybrid – cloud/on-premises Low-latency Global context
EDGE/ SOURCESREGIONAL
INFRASTRUCTURECORE
INFRASTRUCTURE
• Data Acquisition & Movement• Intelligent Routing & Filtering• Parse, Enrich & Data Delivery
• Detect Complex Patterns in Real-Time• Descriptive, Prescriptive & Predictive Analytics• Join/Split Streams
Streaming Analytics Manager
• Data Acquisition• Edge Intelligence• Bi-directional Comm.
17 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Stream Processing using Streaming Analytics Manager
18 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Data in Motion Needs DataFlow Management and Stream Processing
Data in Motion Using DataFlow Management• Data Ingestion • Edge Intelligence• First Mile Problem • Physical Data Movement • Simple event processing such as Route, Filter, Enrich,
Transform, etc.
DataFlowManagement
Data in Motion Using DataFlow Management and Stream Processing• Flow Management to deliver data for Stream Processing• Stream Processing for complex pattern matching on
unbounded streams of data.
DataFlowManagement and Stream Processing
19 © Hortonworks Inc. 2011 – 2016. All Rights Reserved© Hortonworks Inc. 2011
Hortonworks DataFlow: Data-in-Motion Platform
Data AcquisitionEdge Processing
Page 19
Real Time Stream AnalyticsRapid Application Development
Enterprise Services
Ambari Ranger Other services
Flow management Flow management + Stream Processing
Kafka
Storm
Others…
NiFi
Streaming Analytics Manager
20 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Building a Real-Time Insights ApplicationExample: Company X provides alerting services when users’ resting heart rate higher than a threshold
Acquire Data
Company X Cloud Instance 1
Acquire Data
Company X Cloud Instance 2
Acquire Data
Company X Cloud Instance 3
Acquire Data Across Cloud
Instances
Parse, Filter, Validate, Enrich
and Route
Core Data Center
Analytics/Pattern Match
Data Store
Alerts
Dashboards/Visualization
Flow Management Stream ProcessingLegend:
21 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Data in Motion Needs Dataflow Management and Stream Processing
Acquire data from various Wearable Device’s Cloud Instances
Move Data from Customer Cloud Instances to on-premise instance
Intelligent Routing & Filtering of data. The routing and filtering rules will be often changed at run-time.
Deliver the data to various downstream systems. New downstream apps will be added over time
Parse the device data to standardized format that downstream system can understand
Enrich the data with contextual information including patient/customer info (age, gender, etc..)
Recognize the Pattern when the resting heart rate exceeds a certain threshold (the insight), and then create an alert/notification.
Run a Outlier detection model on streaming heart rate that comes in. If the score is above a certain threshold, alert on the heart rate.
Flow Management
StreamProcessing
22 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Thank You
23 © Hortonworks Inc. 2011 – 2017. All Rights Reserved
Capturestreaming data
Deliverperishable insights
Combinenew & old data
Storedata forever
Accessa multi-tenant data lake
Modelwith artificial intelligence
DATA AT REST(Hor tonwor ks Data P l at for m)
DATA IN MOTION(Hor tonwor ks DataF l ow)
ACTIONABLEINTELLIGENCE
Perishable Insights Historical Insights
Managing Data in Motion with the Connected Data Architecture
24 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Hortonworks DataFlow: Data in Motion PlatformHDF provides the flow management, stream processing, and enterprise services needed
to collect, curate, analyze and act on data-in-motion across the data center and cloud.
25 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Real-Time, Visual Control of Data Flows between Datacenter and the Cloud
Add and Adjust Data Sourcesto maximize the opportunity that you capture from perishable insights
Visually Trace the Data Pathto manage the what, who, where and how around data in motion
Dynamically Adjust the Pipelineto match the dataflow withyour bandwidth
H O R T O N W O R K S D ATA F L O W Add and adjust
data sources
Visually tracethe data path
Dynamically adjust the pipeline
26 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Integrated Processes and Control for Data Flows and Streaming Analytics
C O M M O N A R C H I T E C T U R EW I T H O U T H O R T O N W O R K S D A T A F L O W
W I T H H O R T O N W O R K S D A T A F L O W
Ingest
Scripts
Messaging
Scripts
HORTONWORKS DATAFLOW
Optimize the ArchitectureReduce cost an complexity with the most efficient data collection technologies
Assure Efficient OperationsVia real-time control of data inputs, outputs, transportation and transformations
Rely on a Common FoundationEliminating dependence on multiple customized systems
27 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Secure Flows with Chain of Custody and Provenance
H O R T O N W O R K S D ATA F L O W
End-to-End SecurityApply security rules to encrypt,decrypt, filter and replace data fromthe point of collection at the jaggededge to its final destination
Granular Control and SharingMove beyond role-based access and dynamically share an entire dataflow
Real-Time TraceabilityRich metadata and contextual detailhelps troubleshoot security issues and informs timely decisions
28 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Streaming Data Flows with Adaptive Control
AutomatedBi-directional communication between source and destination adapts data flows automatically, according to current priorities
On-Demand Operational control to adapt to changing conditions and requirements
ScalableBy incorporating data from any device—small machine sensors to enterprise data centers—HDF connects you to the broadest set of disparate data sources
29 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Data Acquisition & Delivery: First Mile Problem - Architecture
Server Cluster
DVR NiFi NiFi NiFi
Home Security
Client Libraries
Client Libraries
MiNiFi
MiNiFi
Regional Data CenterConsumer Devices
Server Cluster
NiFi NiFi NiFi
Core Data Center
30 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
T i e r 1 : R e g i o n a l D a t a C e n t e r( ~ 1 0 0 0 )
T i e r 2 : D i s t r i b u t i o n D a t a C e n t e r( ~ 1 0 0 )
T i e r 3 : N a t i o n a l D a t a C e n t e r( 3 - 6 )
Gateway
Server Cluster
DVR Kafka
Server Cluster
Splunk
Kafka
Storm/Spark/Flink/Apex
NiFi NiFi NiFi
MiNiFi
NiFi NiFi NiFi
Home Security
Client Libraries
Client Libraries
Router
Switch
Gateway Networking device
Storage Storage
MiNiFi
End to End Data in Motion Architecture
Streaming Analytics Manager