30
1 © Hortonworks Inc. 2011 – 2016. All Rights Reserved March 23-24, 2017 Doing It Right SYMPOSIUM Managing Data in Motion with the Connected Data Architecture Dmitry Baev Director, Solutions Engineering 4 th Big Data & Business Analytics Symposium – March 23-24, 2017

Managing Data in Motion with the Connected Data … © Hortonworks Inc. 2011 – 2016. All Rights Reserved March 23-24, 2017 Doing It Right SYMPOSIUM Managing Data in Motion with the

  • Upload
    lequynh

  • View
    212

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Managing Data in Motion with the Connected Data … © Hortonworks Inc. 2011 – 2016. All Rights Reserved March 23-24, 2017 Doing It Right SYMPOSIUM Managing Data in Motion with the

1 © Hortonworks Inc. 2011 – 2016. All Rights Reserved

March 23-24, 2017

Doing It Right

SYMPOSIUM

Managing Data in Motion with

the Connected Data Architecture

Dmitry BaevDirector, Solutions Engineering

4th Big Data & Business Analytics Symposium – March 23-24, 2017

Page 2: Managing Data in Motion with the Connected Data … © Hortonworks Inc. 2011 – 2016. All Rights Reserved March 23-24, 2017 Doing It Right SYMPOSIUM Managing Data in Motion with the

2 © Hortonworks Inc. 2011 – 2016. All Rights Reserved

Disclaimer

This document may contain product features and technology directions that are under development, may be under development in the future or may ultimately not be developed.

Project capabilities are based on information that is publicly available within the Apache Software Foundation project websites ("Apache"). Progress of the project capabilities can be tracked from inception to release through Apache, however, technical feasibility, market demand, user feedback and the overarching Apache Software Foundation community development process can all effect timing and final delivery.

This document’s description of these features and technology directions does not represent a contractual commitment, promise or obligation from Hortonworks to deliver these features in any generally available product.

Product features and technology directions are subject to change, and must not be included in contracts, purchase orders, or sales agreements of any kind.

Since this document contains an outline of general product development plans, customers should not rely upon it when making purchasing decisions.

Page 3: Managing Data in Motion with the Connected Data … © Hortonworks Inc. 2011 – 2016. All Rights Reserved March 23-24, 2017 Doing It Right SYMPOSIUM Managing Data in Motion with the

3 © Hortonworks Inc. 2011 – 2016. All Rights Reserved

INTERNETOF

ANYTHING

AGE OF DATAOpen source is the norm,

and Apache is the center of gravityFounded: 1999

Page 4: Managing Data in Motion with the Connected Data … © Hortonworks Inc. 2011 – 2016. All Rights Reserved March 23-24, 2017 Doing It Right SYMPOSIUM Managing Data in Motion with the

4 © Hortonworks Inc. 2011 – 2017. All Rights Reserved

Systems ofInsight

PersonalizedExperiences

BalancedSupply Chains

New Products& Services

OperationalEfficiencies

Connected Data Defines the New Way of Business

Page 5: Managing Data in Motion with the Connected Data … © Hortonworks Inc. 2011 – 2016. All Rights Reserved March 23-24, 2017 Doing It Right SYMPOSIUM Managing Data in Motion with the

5 © Hortonworks Inc. 2011 – 2016. All Rights Reserved

Constrained High-latency Localized context

Hybrid – cloud / on-premises Low-latency Global context

CoreInfrastructure

Harnessing Data in Motion

RegionalInfrastructureSources

Page 6: Managing Data in Motion with the Connected Data … © Hortonworks Inc. 2011 – 2016. All Rights Reserved March 23-24, 2017 Doing It Right SYMPOSIUM Managing Data in Motion with the

6 © Hortonworks Inc. 2011 – 2016. All Rights Reserved

Hortonworks DataFlow: Data in Motion Platform

Platform to build dataflow management and streaming analytics solutions that

collect, curate, analyze and act on data in motion across the data center and cloud.

Page 7: Managing Data in Motion with the Connected Data … © Hortonworks Inc. 2011 – 2016. All Rights Reserved March 23-24, 2017 Doing It Right SYMPOSIUM Managing Data in Motion with the

7 © Hortonworks Inc. 2011 – 2016. All Rights Reserved

Easy, Secure, Reliable Way to Get the DataYou Need

From the edge to anywhere with

intelligence

Immediate andContinuous Insights

Because acting on perishable insights in real timemaximizes value

Provisioning, Management, Monitoring, Security, Audit,

Compliance, GovernanceBecause it all has to work together in an enterprise environment

FLOW MANAGEMENT

STREAMPROCESSING

ENTERPRISESERVICES

Hortonworks DataFlow for Data in MotionPowered by Apache NiFi, Kafka, and Storm

Page 8: Managing Data in Motion with the Connected Data … © Hortonworks Inc. 2011 – 2016. All Rights Reserved March 23-24, 2017 Doing It Right SYMPOSIUM Managing Data in Motion with the

8 © Hortonworks Inc. 2011 – 2016. All Rights Reserved

Hortonworks DataFlow Manages Data in Motion

CoreInfrastructureSources

ConstrainedHigh-latencyLocalized context

Hybrid – cloud / on-premisesLow-latencyGlobal context

RegionalInfrastructure

Page 9: Managing Data in Motion with the Connected Data … © Hortonworks Inc. 2011 – 2016. All Rights Reserved March 23-24, 2017 Doing It Right SYMPOSIUM Managing Data in Motion with the

9 © Hortonworks Inc. 2011 – 2016. All Rights Reserved

Simplistic View of Dataflows: Easy, Definitive

AcquireData

StoreData

DataFlow

ProcessAnalyze

Data

Page 10: Managing Data in Motion with the Connected Data … © Hortonworks Inc. 2011 – 2016. All Rights Reserved March 23-24, 2017 Doing It Right SYMPOSIUM Managing Data in Motion with the

10 © Hortonworks Inc. 2011 – 2016. All Rights Reserved

Realistic View of Dataflows: Complex, Convoluted

AcquireData

StoreData

AcquireData

StoreData

StoreData

StoreData

StoreData

Processand

AnalyzeData

DataFlow

AcquireData

AcquireData

Page 11: Managing Data in Motion with the Connected Data … © Hortonworks Inc. 2011 – 2016. All Rights Reserved March 23-24, 2017 Doing It Right SYMPOSIUM Managing Data in Motion with the

11 © Hortonworks Inc. 2011 – 2016. All Rights Reserved

Requirements for Data in Motion

Perishable Insights

ConnectivityPrioritization

Security

AdaptabilityExtensibility

Scalability

Provenance

Real-Time

Page 12: Managing Data in Motion with the Connected Data … © Hortonworks Inc. 2011 – 2016. All Rights Reserved March 23-24, 2017 Doing It Right SYMPOSIUM Managing Data in Motion with the

12 © Hortonworks Inc. 2011 – 2016. All Rights Reserved

• For agile and immediate creation, configuration, control of dataflowsVisual Command and Control

• Ensures trust of your dataData Lineage (Provenance)

• Because not all data is of equal importanceData Prioritization

• Since not all senders/receivers/connections work perfectly all the timeData Buffering/Back-Pressure

• Adapt to different situations with different requirementsControl Latency vs Throughput

• Security of data, and data accessSecure Control Plane/Data Plane

• ScalabilityScale out Clustering

• Ecosystem flexibility and growthExtensibility

Apache NiFi: Designed for 8 challenges of global enterprise dataflow

Page 13: Managing Data in Motion with the Connected Data … © Hortonworks Inc. 2011 – 2016. All Rights Reserved March 23-24, 2017 Doing It Right SYMPOSIUM Managing Data in Motion with the

13 © Hortonworks Inc. 2011 – 2016. All Rights Reserved

Hortonworks DataFlow, powered by Apache NiFi

Page 14: Managing Data in Motion with the Connected Data … © Hortonworks Inc. 2011 – 2016. All Rights Reserved March 23-24, 2017 Doing It Right SYMPOSIUM Managing Data in Motion with the

14 © Hortonworks Inc. 2011 – 2016. All Rights Reserved

Connecting Data Between Ecosystems Without Coding: 180+ Processors

Hash

Extract

Merge

Duplicate

Scan

GeoEnrich

Replace

ConvertSplit

Translate

Route Content

Route Context

Route Text

Control Rate

Distribute Load

Generate Table Fetch

Jolt Transform JSON

Prioritized Delivery

Encrypt

Tail

Evaluate

Execute

HL7

FTP

UDP

XML

SFTP

HTTP

Syslog

Email

HTML

Image

AMQP

MQTT

All Apache project logos are trademarks of the ASF and the respective projects.

Page 15: Managing Data in Motion with the Connected Data … © Hortonworks Inc. 2011 – 2016. All Rights Reserved March 23-24, 2017 Doing It Right SYMPOSIUM Managing Data in Motion with the

15 © Hortonworks Inc. 2011 – 2016. All Rights Reserved

Edge Intelligence with Apache MiNiFi

Guaranteed delivery Data buffering

‒ Backpressure‒ Pressure release

Prioritized queuing Flow specific QoS

‒ Latency vs. throughput‒ Loss tolerance

Data provenance

Recovery / recording a rolling log of fine-grained history

Designed for extension

Different from Apache NiFi Design and Deploy Warm re-deploys

Key Features

Page 16: Managing Data in Motion with the Connected Data … © Hortonworks Inc. 2011 – 2016. All Rights Reserved March 23-24, 2017 Doing It Right SYMPOSIUM Managing Data in Motion with the

16 © Hortonworks Inc. 2011 – 2016. All Rights Reserved

Hortonworks DataFlow Manages Data in Motion

Constrained High-latency Localized context

Hybrid – cloud/on-premises Low-latency Global context

EDGE/ SOURCESREGIONAL

INFRASTRUCTURECORE

INFRASTRUCTURE

• Data Acquisition & Movement• Intelligent Routing & Filtering• Parse, Enrich & Data Delivery

• Detect Complex Patterns in Real-Time• Descriptive, Prescriptive & Predictive Analytics• Join/Split Streams

Streaming Analytics Manager

• Data Acquisition• Edge Intelligence• Bi-directional Comm.

Page 17: Managing Data in Motion with the Connected Data … © Hortonworks Inc. 2011 – 2016. All Rights Reserved March 23-24, 2017 Doing It Right SYMPOSIUM Managing Data in Motion with the

17 © Hortonworks Inc. 2011 – 2016. All Rights Reserved

Stream Processing using Streaming Analytics Manager

Page 18: Managing Data in Motion with the Connected Data … © Hortonworks Inc. 2011 – 2016. All Rights Reserved March 23-24, 2017 Doing It Right SYMPOSIUM Managing Data in Motion with the

18 © Hortonworks Inc. 2011 – 2016. All Rights Reserved

Data in Motion Needs DataFlow Management and Stream Processing

Data in Motion Using DataFlow Management• Data Ingestion • Edge Intelligence• First Mile Problem • Physical Data Movement • Simple event processing such as Route, Filter, Enrich,

Transform, etc.

DataFlowManagement

Data in Motion Using DataFlow Management and Stream Processing• Flow Management to deliver data for Stream Processing• Stream Processing for complex pattern matching on

unbounded streams of data.

DataFlowManagement and Stream Processing

Page 19: Managing Data in Motion with the Connected Data … © Hortonworks Inc. 2011 – 2016. All Rights Reserved March 23-24, 2017 Doing It Right SYMPOSIUM Managing Data in Motion with the

19 © Hortonworks Inc. 2011 – 2016. All Rights Reserved© Hortonworks Inc. 2011

Hortonworks DataFlow: Data-in-Motion Platform

Data AcquisitionEdge Processing

Page 19

Real Time Stream AnalyticsRapid Application Development

Enterprise Services

Ambari Ranger Other services

Flow management Flow management + Stream Processing

Kafka

Storm

Others…

NiFi

Streaming Analytics Manager

Page 20: Managing Data in Motion with the Connected Data … © Hortonworks Inc. 2011 – 2016. All Rights Reserved March 23-24, 2017 Doing It Right SYMPOSIUM Managing Data in Motion with the

20 © Hortonworks Inc. 2011 – 2016. All Rights Reserved

Building a Real-Time Insights ApplicationExample: Company X provides alerting services when users’ resting heart rate higher than a threshold

Acquire Data

Company X Cloud Instance 1

Acquire Data

Company X Cloud Instance 2

Acquire Data

Company X Cloud Instance 3

Acquire Data Across Cloud

Instances

Parse, Filter, Validate, Enrich

and Route

Core Data Center

Analytics/Pattern Match

Data Store

Alerts

Dashboards/Visualization

Flow Management Stream ProcessingLegend:

Page 21: Managing Data in Motion with the Connected Data … © Hortonworks Inc. 2011 – 2016. All Rights Reserved March 23-24, 2017 Doing It Right SYMPOSIUM Managing Data in Motion with the

21 © Hortonworks Inc. 2011 – 2016. All Rights Reserved

Data in Motion Needs Dataflow Management and Stream Processing

Acquire data from various Wearable Device’s Cloud Instances

Move Data from Customer Cloud Instances to on-premise instance

Intelligent Routing & Filtering of data. The routing and filtering rules will be often changed at run-time.

Deliver the data to various downstream systems. New downstream apps will be added over time

Parse the device data to standardized format that downstream system can understand

Enrich the data with contextual information including patient/customer info (age, gender, etc..)

Recognize the Pattern when the resting heart rate exceeds a certain threshold (the insight), and then create an alert/notification.

Run a Outlier detection model on streaming heart rate that comes in. If the score is above a certain threshold, alert on the heart rate.

Flow Management

StreamProcessing

Page 22: Managing Data in Motion with the Connected Data … © Hortonworks Inc. 2011 – 2016. All Rights Reserved March 23-24, 2017 Doing It Right SYMPOSIUM Managing Data in Motion with the

22 © Hortonworks Inc. 2011 – 2016. All Rights Reserved

Thank You

Page 23: Managing Data in Motion with the Connected Data … © Hortonworks Inc. 2011 – 2016. All Rights Reserved March 23-24, 2017 Doing It Right SYMPOSIUM Managing Data in Motion with the

23 © Hortonworks Inc. 2011 – 2017. All Rights Reserved

Capturestreaming data

Deliverperishable insights

Combinenew & old data

Storedata forever

Accessa multi-tenant data lake

Modelwith artificial intelligence

DATA AT REST(Hor tonwor ks Data P l at for m)

DATA IN MOTION(Hor tonwor ks DataF l ow)

ACTIONABLEINTELLIGENCE

Perishable Insights Historical Insights

Managing Data in Motion with the Connected Data Architecture

Page 24: Managing Data in Motion with the Connected Data … © Hortonworks Inc. 2011 – 2016. All Rights Reserved March 23-24, 2017 Doing It Right SYMPOSIUM Managing Data in Motion with the

24 © Hortonworks Inc. 2011 – 2016. All Rights Reserved

Hortonworks DataFlow: Data in Motion PlatformHDF provides the flow management, stream processing, and enterprise services needed

to collect, curate, analyze and act on data-in-motion across the data center and cloud.

Page 25: Managing Data in Motion with the Connected Data … © Hortonworks Inc. 2011 – 2016. All Rights Reserved March 23-24, 2017 Doing It Right SYMPOSIUM Managing Data in Motion with the

25 © Hortonworks Inc. 2011 – 2016. All Rights Reserved

Real-Time, Visual Control of Data Flows between Datacenter and the Cloud

Add and Adjust Data Sourcesto maximize the opportunity that you capture from perishable insights

Visually Trace the Data Pathto manage the what, who, where and how around data in motion

Dynamically Adjust the Pipelineto match the dataflow withyour bandwidth

H O R T O N W O R K S D ATA F L O W Add and adjust

data sources

Visually tracethe data path

Dynamically adjust the pipeline

Page 26: Managing Data in Motion with the Connected Data … © Hortonworks Inc. 2011 – 2016. All Rights Reserved March 23-24, 2017 Doing It Right SYMPOSIUM Managing Data in Motion with the

26 © Hortonworks Inc. 2011 – 2016. All Rights Reserved

Integrated Processes and Control for Data Flows and Streaming Analytics

C O M M O N A R C H I T E C T U R EW I T H O U T H O R T O N W O R K S D A T A F L O W

W I T H H O R T O N W O R K S D A T A F L O W

Ingest

Scripts

Messaging

Scripts

HORTONWORKS DATAFLOW

Optimize the ArchitectureReduce cost an complexity with the most efficient data collection technologies

Assure Efficient OperationsVia real-time control of data inputs, outputs, transportation and transformations

Rely on a Common FoundationEliminating dependence on multiple customized systems

Page 27: Managing Data in Motion with the Connected Data … © Hortonworks Inc. 2011 – 2016. All Rights Reserved March 23-24, 2017 Doing It Right SYMPOSIUM Managing Data in Motion with the

27 © Hortonworks Inc. 2011 – 2016. All Rights Reserved

Secure Flows with Chain of Custody and Provenance

H O R T O N W O R K S D ATA F L O W

End-to-End SecurityApply security rules to encrypt,decrypt, filter and replace data fromthe point of collection at the jaggededge to its final destination

Granular Control and SharingMove beyond role-based access and dynamically share an entire dataflow

Real-Time TraceabilityRich metadata and contextual detailhelps troubleshoot security issues and informs timely decisions

Page 28: Managing Data in Motion with the Connected Data … © Hortonworks Inc. 2011 – 2016. All Rights Reserved March 23-24, 2017 Doing It Right SYMPOSIUM Managing Data in Motion with the

28 © Hortonworks Inc. 2011 – 2016. All Rights Reserved

Streaming Data Flows with Adaptive Control

AutomatedBi-directional communication between source and destination adapts data flows automatically, according to current priorities

On-Demand Operational control to adapt to changing conditions and requirements

ScalableBy incorporating data from any device—small machine sensors to enterprise data centers—HDF connects you to the broadest set of disparate data sources

Page 29: Managing Data in Motion with the Connected Data … © Hortonworks Inc. 2011 – 2016. All Rights Reserved March 23-24, 2017 Doing It Right SYMPOSIUM Managing Data in Motion with the

29 © Hortonworks Inc. 2011 – 2016. All Rights Reserved

Data Acquisition & Delivery: First Mile Problem - Architecture

Server Cluster

DVR NiFi NiFi NiFi

Home Security

Client Libraries

Client Libraries

MiNiFi

MiNiFi

Regional Data CenterConsumer Devices

Server Cluster

NiFi NiFi NiFi

Core Data Center

Page 30: Managing Data in Motion with the Connected Data … © Hortonworks Inc. 2011 – 2016. All Rights Reserved March 23-24, 2017 Doing It Right SYMPOSIUM Managing Data in Motion with the

30 © Hortonworks Inc. 2011 – 2016. All Rights Reserved

T i e r 1 : R e g i o n a l D a t a C e n t e r( ~ 1 0 0 0 )

T i e r 2 : D i s t r i b u t i o n D a t a C e n t e r( ~ 1 0 0 )

T i e r 3 : N a t i o n a l D a t a C e n t e r( 3 - 6 )

Gateway

Server Cluster

DVR Kafka

Server Cluster

Splunk

Kafka

Storm/Spark/Flink/Apex

NiFi NiFi NiFi

MiNiFi

NiFi NiFi NiFi

Home Security

Client Libraries

Client Libraries

Router

Switch

Gateway Networking device

Storage Storage

MiNiFi

End to End Data in Motion Architecture

Streaming Analytics Manager