Guide To New Features of Hortonworks DataFlow 2.0

Haimo Liu, Product Manager

Bryan Bende, Sr. Software Engineer

Webinar Series Part 5 New Features of HDF 5

Page 1: Webinar Series Part 5 New Features of HDF 5

Guide To New Features of Hortonworks DataFlow 2.0

Haimo Liu, Product Manager

Bryan Bende, Sr. Software Engineer

Page 2: Webinar Series Part 5 New Features of HDF 5

2 © Hortonworks Inc. 2011 – 2016. All Rights Reserved

Connected Data Platforms

Page 3: Webinar Series Part 5 New Features of HDF 5

Stream Processing

Flow Management

Enterprise Services

At the edge

Security

Visualization

On premises In the cloud

Registries/Catalogs Governance (Security/Compliance) Operations

HDF 2.0 – Data in Motion Platform

Page 4: Webinar Series Part 5 New Features of HDF 5

HDF 2.0 – Data in Motion Platform

[Diagram labels: IoT data sources; MiNiFi; NiFi; Kafka; Storm; Flow Management; Flow Management + Stream Processing; Data in Motion; Data at Rest (AWS, Azure, Google Cloud, Hadoop, others); Enterprise Services (Ambari, Ranger, other services).]

Page 5: Webinar Series Part 5 New Features of HDF 5

Dataflow Management

Page 6: Webinar Series Part 5 New Features of HDF 5

Problems Today: Timely Access to Data and Decisions

http://diginomica.com/2016/04/22/royal-mail-starts-to-deliver-on-hortonworks-data-in-motion-promise

“HDF helps us to streamline the flow of data and build models and visualisations quickly, so that my team can work iteratively with business colleagues on building solutions that work for the business.”

Royal Mail

Page 7: Webinar Series Part 5 New Features of HDF 5

HDF Makes Big Data Ingest Easy

Complicated, messy, and takes weeks to months to move the right data into Hadoop

Streamlined, efficient, easy

HDP: Hortonworks Data Platform, powered by Apache Hadoop

Page 8: Webinar Series Part 5 New Features of HDF 5

Create a live dataflow in minutes

How would that change your business?

Page 9: Webinar Series Part 5 New Features of HDF 5

Add processor for data intake. Time: 1 minute

1. Drag and drop processor from top menu

Page 10: Webinar Series Part 5 New Features of HDF 5

Choose the specific processor

2. Choose one of the processors (currently 170+ available)

Page 11: Webinar Series Part 5 New Features of HDF 5

Example: Pick Twitter Processor

Page 12: Webinar Series Part 5 New Features of HDF 5

Configure the processor. Time: 2 minutes

3. Select processor and choose option to Configure

4. Adjust parameters as required

Page 13: Webinar Series Part 5 New Features of HDF 5

Add another processor for data output. Time: 1 minute

5. Drag and drop processor from top menu

6. Filter for and select a “Put” processor

Page 14: Webinar Series Part 5 New Features of HDF 5

Configure second processor. Time: 1 minute

7. Configure 2nd processor

Page 15: Webinar Series Part 5 New Features of HDF 5

Connect processors, configure connection. Time: 2 minutes

8. Configure connection

Note: This sample flow differs from the previous PutHDFS example; it uses PutFile. The same concepts apply.

Page 16: Webinar Series Part 5 New Features of HDF 5

Click Start to begin processing. Total time: 7 minutes

9. Click Start (“play”) to begin processing (the flow runs continuously until you select Stop)

Page 17: Webinar Series Part 5 New Features of HDF 5

HDF 2.0: what’s new?

Page 18: Webinar Series Part 5 New Features of HDF 5

Challenges

Different devices

Globally distributed organization

Intelligence on the edge

Time to delivery

Getting the right data to the right place at the right time is not trivial!

Page 19: Webinar Series Part 5 New Features of HDF 5

Challenges & NiFi

Different devices: different standards/protocols/formats

• Out of the box processors

• Intuitive GUI to combine processors and build ingestion pipeline

• Extensible framework, extremely easy to add a new source/protocol

Globally distributed organizations

Intelligence on the edge

Time to delivery

Support disparate, distributed systems with easy drag & drop

Page 20: Webinar Series Part 5 New Features of HDF 5

Challenges & NiFi & HDF 2.0

Different devices: different standards/protocols/formats

• Out of the box processors

• Intuitive GUI to combine processors and build ingestion pipeline

• Extensible framework, extremely easy to add a new source/protocol

• Deeper ecosystem integration, 170+ processors in total

Globally distributed organizations

Intelligence on the edge

Time to delivery

Expanded ecosystem

Page 21: Webinar Series Part 5 New Features of HDF 5

HDF 2.0 has 170+ Processors, 30% Increase from HDF 1.2

[Figure: processor capabilities shown include Hash, Extract, Merge, Duplicate, Scan, GeoEnrich, Replace, Convert, Split, Translate, Route Content, Route Context, Route Text, Control Rate, Distribute Load, Generate Table Fetch, Jolt Transform JSON, Prioritized Delivery, Encrypt, Tail, Evaluate, Execute, and Fetch, alongside protocols and formats such as HL7, FTP, UDP, XML, SFTP, HTTP, Syslog, Email, HTML, Image, AMQP, and MQTT.]

All Apache project logos are trademarks of the ASF and the respective projects.

Page 22: Webinar Series Part 5 New Features of HDF 5

Deeper Ecosystem Integration – New Processors

Processor             Description
Publish/ConsumeKafka  Two NARs, with Kafka 0.9 and 0.10 client libraries, respectively
JoltTransformJSON     Manipulate JSON data on the fly, with a preview function
GenerateTableFetch    Incremental fetch plus parallel fetch against source table partitions
PutHiveQL             Ingest to Hive tables
SelectHiveQL          Select from Hive tables
PutHiveStreaming      Ingest streaming data to Hive, using the Hive Streaming API
ConvertAvroToORC      Format conversion, Avro to ORC
Publish/ConsumeMQTT   MQTT is a popular protocol in the IoT world
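
The incremental, paged fetching attributed to GenerateTableFetch above can be illustrated in a few lines of Python. This is a minimal sketch, not NiFi's implementation: the function name, the table and column names, and the `LIMIT`/`OFFSET` paging syntax are all assumptions made for illustration.

```python
def generate_fetch_queries(table, max_value_column, last_seen, row_count, page_size):
    """Build one SELECT per page for rows newer than last_seen.

    Hypothetical helper: it only builds SQL strings and never touches a database.
    """
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    queries = []
    for offset in range(0, row_count, page_size):
        queries.append(
            f"SELECT * FROM {table} "
            f"WHERE {max_value_column} > {last_seen} "
            f"ORDER BY {max_value_column} "
            f"LIMIT {page_size} OFFSET {offset}"
        )
    return queries


# 2,500 new rows since id 1000, fetched in pages of 1,000 rows.
pages = generate_fetch_queries("orders", "id", 1000, 2500, 1000)
for sql in pages:
    print(sql)
```

Each generated statement is independent of the others, so the pages could be handed to separate workers and fetched in parallel.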

Page 23: Webinar Series Part 5 New Features of HDF 5

Challenges & NiFi & HDF 2.0

Different devices: different standards/protocols/formats

• Out of the box processors

• Intuitive GUI to combine processors and build ingestion pipeline

• Extensible framework, extremely easy to add a new source/protocol

• Deeper ecosystem integration, 170+ processors in total

• Redesigned UI, refreshed user experience

Globally distributed organizations

Intelligence on the edge

Time to delivery

More intuitive user interface

Page 24: Webinar Series Part 5 New Features of HDF 5

Modernized UI – Complete Interface Redesign

Page 25: Webinar Series Part 5 New Features of HDF 5

Challenges & NiFi

Different devices

Globally distributed organizations: dataflow across multiple data centers

• Internal Site-to-Site communication, secured by 2-way SSL

• Environment-neutral

Intelligence on the edge

Time to delivery

Secure communications across disparate, distributed systems

Page 26: Webinar Series Part 5 New Features of HDF 5

Challenges & NiFi & HDF 2.0

Different devices

Globally distributed organizations: dataflow across multiple data centers

• Internal Site-to-Site communication, secured by 2-way SSL

• Environment-neutral

• Variable registry

Intelligence on the edge

Time to delivery

Simplifies flow provisioning

Page 27: Webinar Series Part 5 New Features of HDF 5

Variable Registry

– Automatically resolves environment-specific values

• Example: connection string

• The same key referenced in a template can be mapped to different values in DEV vs. PROD

– In-memory variable registry
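
The idea of one key resolving to different values per environment can be sketched in Python as follows. This is an illustration only, not NiFi code: the `${key}` reference syntax, the registry contents, and the `resolve` helper are assumptions chosen for the example.

```python
import re

# Hypothetical per-environment registries; keys and values are illustrative only.
REGISTRIES = {
    "DEV": {"db.connection": "jdbc:mysql://dev-db:3306/sales"},
    "PROD": {"db.connection": "jdbc:mysql://prod-db:3306/sales"},
}

# Matches references of the form ${key} (syntax assumed for this sketch).
_REFERENCE = re.compile(r"\$\{([^}]+)\}")


def resolve(text, registry):
    """Replace every ${key} in text with its value from registry."""
    def lookup(match):
        key = match.group(1)
        if key not in registry:
            raise KeyError(f"no value registered for {key!r}")
        return registry[key]
    return _REFERENCE.sub(lookup, text)


# The same template property resolves differently in each environment.
template_property = "${db.connection}?useSSL=true"
dev_value = resolve(template_property, REGISTRIES["DEV"])
prod_value = resolve(template_property, REGISTRIES["PROD"])
print(dev_value)
print(prod_value)
```

Because the template stores only the key, the same flow definition can move from DEV to PROD without edits; only the registry behind it changes.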

Page 28: Webinar Series Part 5 New Features of HDF 5

Challenges & NiFi & HDF 2.0

Different devices

Globally distributed organizations: dataflow across multiple data centers

• Internal Site-to-Site communication, secured by 2-way SSL

• Environment-neutral

• Variable registry

• Better deployment management, Apache Ambari integration

Intelligence on the edge

Time to delivery

Simplified operations in distributed environments

Page 29: Webinar Series Part 5 New Features of HDF 5

Ambari Integration

NiFi cluster management

– Start/stop NiFi service

– Centralized place for managing config files

Ambari to display NiFi metrics

Ambari to manage Kerberos authentication

Ambari-NiFi Integration

HDF 2.0 Deployment Model

Automated deployment by Ambari

Manual RPM deployment

Tar.gz/zip deployment (NiFi/MiNiFi Java)

Tar.gz for most Linux/Mac, compile your own for other OS (MiNiFi C++)

Page 30: Webinar Series Part 5 New Features of HDF 5

Challenges & NiFi & HDF 2.0

Different devices

Globally distributed organizations: dataflow across multiple data centers

• Internal Site-to-Site communication, secured by 2-way SSL

• Environment-neutral

• Variable registry

• Better deployment management, Apache Ambari integration

• Enhanced Site-to-Site communication

Intelligence on the edge

Time to delivery

Modularized Site-to-Site (S2S) to support pluggable protocols

Page 31: Webinar Series Part 5 New Features of HDF 5

Challenges & NiFi

Different devices, Globally distributed organizations

Intelligence on the edge: analytics on resource-constrained devices

• Run single node on the edge, communicating back via Site-to-Site (S2S)

• Bi-directional communication

Time to delivery

Analytics at the Edge

Page 32: Webinar Series Part 5 New Features of HDF 5

Challenges & NiFi & HDF 2.0

Different devices, Globally distributed organizations

Intelligence on the edge: analytics on resource-constrained devices

• Run single node on the edge, communicating back via Site-to-Site protocol

• Bi-directional communication

• Apache MiNiFi, bi-directional command and control on the edge

Time to delivery

Edge Intelligence for the first mile

Page 33: Webinar Series Part 5 New Features of HDF 5

Edge Intelligence with Apache MiNiFi

Key Features

Guaranteed delivery

Data buffering

‒ Backpressure

‒ Pressure release

Prioritized queuing

Flow-specific QoS

‒ Latency vs. throughput

‒ Loss tolerance

Data provenance

Recovery / recording a rolling log of fine-grained history

Designed for extension

Different from Apache NiFi

‒ Design and deploy

‒ Warm re-deploys
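
To make the buffering terms above concrete, here is a toy Python queue that combines prioritized queuing, a backpressure threshold, and a pressure-release step. It is a sketch under stated assumptions, not MiNiFi's implementation: the class, its method names, and the choice to release pressure by dropping the least urgent items are all hypothetical.

```python
import heapq
import itertools


class BoundedPriorityQueue:
    """Toy connection queue with a backpressure threshold (illustrative only)."""

    def __init__(self, backpressure_threshold):
        self.backpressure_threshold = backpressure_threshold
        self._heap = []
        self._order = itertools.count()  # tie-breaker keeps equal priorities FIFO

    def is_full(self):
        return len(self._heap) >= self.backpressure_threshold

    def offer(self, priority, item):
        """Accept the item unless backpressure is engaged; lower number = more urgent."""
        if self.is_full():
            return False  # the upstream producer should pause and retry later
        heapq.heappush(self._heap, (priority, next(self._order), item))
        return True

    def poll(self):
        """Remove and return the most urgent item, or None when empty."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]

    def release_pressure(self, keep):
        """Drop the least urgent items until only `keep` remain; return what was dropped."""
        ordered = sorted(self._heap)
        kept, dropped = ordered[:keep], ordered[keep:]
        self._heap = kept  # a sorted list already satisfies the heap invariant
        return [entry[2] for entry in dropped]


queue = BoundedPriorityQueue(backpressure_threshold=3)
arrivals = [(5, "telemetry"), (1, "alarm"), (5, "log"), (1, "late alarm")]
accepted = [queue.offer(priority, name) for priority, name in arrivals]
first = queue.poll()
dropped = queue.release_pressure(keep=1)
print(accepted, first, dropped)
```

The fourth arrival is refused because the queue has reached its threshold, which is the signal an upstream producer would use to slow down.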

Page 34: Webinar Series Part 5 New Features of HDF 5

NiFi vs. MiNiFi Java Agent

MiNiFi: NiFi Framework, Components

NiFi: NiFi Framework, User Interface, Components

Page 35: Webinar Series Part 5 New Features of HDF 5

Challenges & NiFi

Different devices, Globally distributed organizations, Intelligence on the edge

Time to delivery: need an application, out of the box solution

• Data provenance, traceability and compliance issues

• Flow visibility, big picture of the enterprise dataflow

• Automatic failure handling

Fast and easy to get results, tune and change dataflows

Page 36: Webinar Series Part 5 New Features of HDF 5

Challenges & NiFi & HDF 2.0

Different devices, Globally distributed organizations, Intelligence on the edge

Time to delivery: need an application, out of the box solution

• Data provenance, traceability and compliance issues

• Flow visibility, big picture of the enterprise dataflow

• Automatic failure handling

• Control plane high availability, zero-master clustering

High availability

Page 37: Webinar Series Part 5 New Features of HDF 5

Zero-master Clustering

New clustering paradigm

Zero-master clustering

– Multiple entry points, no master node, no single point of failure

– Auto-elected cluster coordinator for cluster maintenance

– Automatic failover handling

HDF 2.0 (NiFi 1.0.0)

Page 38: Webinar Series Part 5 New Features of HDF 5

Zero-master Clustering

Page 39: Webinar Series Part 5 New Features of HDF 5

Zero-master Clustering

Heartbeat messages (every 5s by default)

Node status: connecting/connected/disconnecting/disconnected
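
How heartbeats translate into a node status can be sketched as below. This is a simplified illustration rather than NiFi's clustering code: the `Coordinator` class is hypothetical, the number of tolerated missed heartbeats is an assumed value, and the transitional `connecting` and `disconnecting` states are left out for brevity. Timestamps are passed in explicitly so the example is deterministic.

```python
HEARTBEAT_INTERVAL_S = 5   # default interval stated on the slide
MISSED_BEATS_ALLOWED = 8   # assumption for this sketch, not a documented value


class Coordinator:
    """Tracks the last heartbeat per node and derives a connection status."""

    def __init__(self):
        self.last_seen = {}

    def heartbeat(self, node_id, now):
        """Record that node_id reported in at time `now` (in seconds)."""
        self.last_seen[node_id] = now

    def status(self, node_id, now):
        """Return 'connected' if the node has reported recently enough."""
        if node_id not in self.last_seen:
            return "disconnected"
        silence = now - self.last_seen[node_id]
        limit = HEARTBEAT_INTERVAL_S * MISSED_BEATS_ALLOWED
        return "connected" if silence <= limit else "disconnected"


coordinator = Coordinator()
coordinator.heartbeat("node-1", now=0)
coordinator.heartbeat("node-2", now=0)
coordinator.heartbeat("node-1", now=45)  # node-2 stays silent from here on
statuses = {n: coordinator.status(n, now=50) for n in ("node-1", "node-2", "node-3")}
print(statuses)
```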

Page 40: Webinar Series Part 5 New Features of HDF 5

Zero-master Clustering

Page 41: Webinar Series Part 5 New Features of HDF 5

Challenges & NiFi & HDF 2.0

Different devices, Globally distributed organizations, Intelligence on the edge

Time to delivery: need an application, out of the box solution

• Data provenance, traceability and compliance issues

• Flow visibility, big picture of the enterprise dataflow

• Automatic failure handling

• Control plane high availability, zero-master clustering

• Multi-tenant flow editing and authorization

Secured enterprise wide collaboration

Page 42: Webinar Series Part 5 New Features of HDF 5

Multi-tenant Flow Editing

– Self-service collaborative model, Google Docs-style user experience

– Multiple teams making edits to different processors at the same time

– Only the component being modified is locked, not the entire flow

– Scalable model to speed up flow editing

HDF 2.0 (NiFi 1.0.0)
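
The difference between locking one component and locking the whole flow can be shown with a small Python model. This is only a sketch of the idea, not how NiFi implements it: the `Flow` class, its methods, and the `Languages` property are hypothetical names chosen for the example.

```python
import threading


class Flow:
    """Toy flow where each component is claimed separately (illustrative only)."""

    def __init__(self, component_ids):
        self._guard = threading.Lock()  # protects the small ownership table below
        self._editor = {cid: None for cid in component_ids}
        self.config = {cid: {} for cid in component_ids}

    def begin_edit(self, component_id, user):
        """Claim one component; fails only if that same component is already claimed."""
        with self._guard:
            if self._editor[component_id] is not None:
                return False
            self._editor[component_id] = user
            return True

    def finish_edit(self, component_id, user, changes):
        """Apply the changes and release the claim on the component."""
        with self._guard:
            if self._editor[component_id] != user:
                raise PermissionError(f"{user} does not hold {component_id}")
            self.config[component_id].update(changes)
            self._editor[component_id] = None


flow = Flow(["GetTwitter", "PutFile"])
team_a_ok = flow.begin_edit("GetTwitter", "team-a")
team_b_ok = flow.begin_edit("PutFile", "team-b")          # different component: allowed
team_b_blocked = flow.begin_edit("GetTwitter", "team-b")  # same component: refused
flow.finish_edit("GetTwitter", "team-a", {"Languages": "en"})
print(team_a_ok, team_b_ok, team_b_blocked)
```

Two teams can work on different processors at the same time, and a conflict arises only when both reach for the same component.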

Page 43: Webinar Series Part 5 New Features of HDF 5

Multi-tenant Authorization

Component-level authorization

– New authorizer API

– “Read” and “Write” permissions

– Protection against unauthorized usage without losing context

Authorization management

– Internal management (NiFi)

– External management (Ranger, etc.)

HDF 2.0 (NiFi 1.0.0)

Page 44: Webinar Series Part 5 New Features of HDF 5

Multi-tenant Authorization

Read Permission

Processor name visible

Processor configuration visible

Page 45: Webinar Series Part 5 New Features of HDF 5

Multi-tenant Authorization

No Read Permission

Processor name & configuration invisible (content)

Statistics visible (context)
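
The two views above (with and without read permission) can be summarized in a short Python function. It is a sketch of the behavior the slides describe, not NiFi's authorizer API: the function name, the component dictionary, and its field names are assumptions for illustration.

```python
def render_component(component, can_read):
    """Return the fields a user may see for one component.

    Statistics (context) are always shown; name and configuration (content)
    are shown only to users with read permission.
    """
    view = {"id": component["id"], "statistics": component["statistics"]}
    if can_read:
        view["name"] = component["name"]
        view["configuration"] = component["configuration"]
    return view


# Hypothetical processor used only to exercise the function.
processor = {
    "id": "proc-1",
    "name": "PutFile",
    "configuration": {"Directory": "/data/out"},
    "statistics": {"flowfiles_in": 120, "flowfiles_out": 118},
}

reader_view = render_component(processor, can_read=True)
restricted_view = render_component(processor, can_read=False)
print(reader_view)
print(restricted_view)
```

A user without read permission still sees that a component exists and how much data passes through it, so the overall flow keeps its context.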

Page 46: Webinar Series Part 5 New Features of HDF 5

Questions?

Hortonworks Community Connection: Data Ingestion and Streaming

https://community.hortonworks.com/