Guide to New Features of Hortonworks DataFlow 2.0
Haimo Liu, Product Manager
Bryan Bende, Sr. Software Engineer
2 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Connected Data Platforms
HDF 2.0 – Data in Motion Platform
[Figure labels: Flow Management; Stream Processing; Enterprise Services; At the edge; On premises; In the cloud; Security; Visualization; Registries/Catalogs; Governance (Security/Compliance); Operations]
HDF 2.0 – Data in Motion Platform
[Figure labels: Data in motion; Data at rest; Flow management (MiNiFi, NiFi); Flow management + stream processing (NiFi, Kafka, Storm); IoT data sources; AWS, Azure, Google Cloud, Hadoop, others; Enterprise Services (Ambari, Ranger, other services)]
Dataflow Management
Problems Today: Timely Access to Data and Decisions
http://diginomica.com/2016/04/22/royal-mail-starts-to-deliver-on-hortonworks-data-in-motion-promise
“HDF helps us to streamline the flow of data and build models and visualisations quickly, so that my team can work iteratively with business colleagues on building solutions that work for the business.”
Royal Mail
HDF Makes Big Data Ingest Easy
Before: complicated, messy, and takes weeks to months to move the right data into Hadoop
With HDF: streamlined, efficient, easy
[Figure: HDP, Hortonworks Data Platform, powered by Apache Hadoop]
Create a live dataflow in minutes
How would that change your business?
Add processor for data intake. Time: 1 minute
1. Drag and drop processor from top menu
Choose the specific processor
2. Choose one of the processors – currently 170+ available
Example: Pick Twitter Processor
Configure the processor. Time: 2 minutes
3. Select processor and choose option to Configure
4. Adjust parameters as required
Another processor for data output. Time: 1 minute
5. Drag and drop processor from top menu
6. Filter for and select a “Put” processor
Configure second processor. Time: 1 minute
7. Configure 2nd processor
Connect processors, configure connection. Time: 2 minutes
8. Configure connection
Note: The sample flow shown here uses PutFile rather than the PutHDFS of the previous example. The same concepts apply.
Click Start to begin processing. Total time: 7 minutes
9. Click Start (“play”) to begin processing (runs continuously until you select Stop)
HDF 2.0: what’s new?
Challenges
Different devices
Globally distributed organization
Intelligence on the edge
Time to delivery
Getting the right data to the right place at the right time is not trivial!
Challenges & NiFi
Different devices: different standards/protocols/formats
• Out of the box processors
• Intuitive GUI to combine processors and build ingestion pipeline
• Extensible framework, extremely easy to add a new source/protocol
Globally distributed organizations
Intelligence on the edge
Time to delivery
Support disparate, distributed systems with easy drag & drop
Challenges & NiFi & HDF 2.0
Different devices: different standards/protocols/formats
• Out of the box processors
• Intuitive GUI to combine processors and build ingestion pipeline
• Extensible framework, extremely easy to add a new source/protocol
• Deeper ecosystem integration, 170+ processors in total
Globally distributed organizations
Intelligence on the edge
Time to delivery
Expanded ecosystem
HDF 2.0 has 170+ Processors, 30% Increase from HDF 1.2
Processor capabilities shown: Hash, Extract, Merge, Duplicate, Scan, GeoEnrich, Replace, Convert, Split, Translate, Route Content, Route Context, Route Text, Control Rate, Distribute Load, Generate Table Fetch, Jolt Transform JSON, Prioritized Delivery, Encrypt, Tail, Evaluate, Execute, Fetch
Protocols and formats shown: HL7, FTP, UDP, XML, SFTP, HTTP, Syslog, HTML, Image, AMQP, MQTT
All Apache project logos are trademarks of the ASF and the respective projects.
Deeper Ecosystem Integration – New Processors
Processor               Description
Publish/ConsumeKafka    Two NARs, with Kafka 0.9 and 0.10 client libraries, respectively
JoltTransformJSON       Manipulate JSON data on the fly, with a preview function
GenerateTableFetch      Incremental fetch + parallel fetch against source table partitions
PutHiveQL               Ingest to Hive tables
SelectHiveQL            Select from Hive tables
PutHiveStreaming        Ingest streaming data to Hive, leveraging the Hive streaming API
ConvertAvroToORC        Format conversion, Avro to ORC
Publish/ConsumeMQTT     MQTT is a popular protocol in the IoT world
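The "incremental fetch + parallel fetch" idea behind GenerateTableFetch can be sketched as generating one paged query per slice of new rows, so that each query can be executed independently. This is only an illustration and not the processor's actual output: the function name, the table and column names, and the LIMIT/OFFSET SQL dialect are all assumptions.

```python
def generate_fetch_queries(table, max_value_column, last_max, row_count, page_size):
    """Build one SELECT per page of rows newer than last_max.

    Hypothetical helper: the incremental part is the WHERE clause on the
    max-value column, the parallel part is that each page is its own query.
    """
    queries = []
    for offset in range(0, row_count, page_size):
        queries.append(
            f"SELECT * FROM {table} WHERE {max_value_column} > {last_max} "
            f"ORDER BY {max_value_column} LIMIT {page_size} OFFSET {offset}"
        )
    return queries


# 25 new rows since id 100, fetched in pages of 10 (assumed numbers).
queries = generate_fetch_queries("orders", "id", 100, 25, 10)
for query in queries:
    print(query)
```

Each generated statement could then be handed to a separate worker, which is where the parallelism would come from.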
Challenges & NiFi & HDF 2.0
Different devices: different standards/protocols/formats
• Out of the box processors
• Intuitive GUI to combine processors and build ingestion pipeline
• Extensible framework, extremely easy to add a new source/protocol
• Deeper ecosystem integration, 170+ processors in total
• Redesigned UI, refreshed user experience
Globally distributed organizations
Intelligence on the edge
Time to delivery
More intuitive user interface
Modernized UI – Complete Interface Redesign
Challenges & NiFi
Different devices
Globally distributed organizations: dataflow across multiple data centers
• Internal Site-to-Site communication, secured by 2-way SSL
• Environment-neutral
Intelligence on the edge
Time to delivery
Secure communications across disparate, distributed systems
Challenges & NiFi & HDF 2.0
Different devices
Globally distributed organizations: dataflow across multiple data centers
• Internal Site-to-Site communication, secured by 2-way SSL
• Environment-neutral
• Variable registry
Intelligence on the edge
Time to delivery
Simplifies flow provisioning
Variable Registry
– To automatically resolve environment-specific values
• Example: connection string
• The same key referenced in a template can be mapped to different values in DEV vs PROD
– In-memory variable registry
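The mapping of one key to different values per environment can be illustrated with a small sketch. This is not NiFi's implementation: the key `db.connection.url`, the registry contents, and the `resolve` helper are hypothetical, chosen only to show how a single template property might resolve differently in DEV and PROD.

```python
import re

# Hypothetical per-environment registries (assumed values, not from the slides).
DEV_REGISTRY = {"db.connection.url": "jdbc:mysql://dev-db:3306/sales"}
PROD_REGISTRY = {"db.connection.url": "jdbc:mysql://prod-db:3306/sales"}

# Matches simple ${key} references only; this is a deliberate simplification.
_REFERENCE = re.compile(r"\$\{([^}]+)\}")


def resolve(template_value, registry):
    """Replace each ${key} with its value from the given registry.

    Unknown keys are left untouched so a missing mapping is easy to spot.
    """
    def lookup(match):
        return registry.get(match.group(1), match.group(0))

    return _REFERENCE.sub(lookup, template_value)


# The same template property, deployed to two environments.
template_property = "${db.connection.url}"
dev_value = resolve(template_property, DEV_REGISTRY)
prod_value = resolve(template_property, PROD_REGISTRY)
print(dev_value)
print(prod_value)
```

The template itself never changes between environments; only the registry it is resolved against does.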
Challenges & NiFi & HDF 2.0
Different devices
Globally distributed organizations: dataflow across multiple data centers
• Internal Site-to-Site communication, secured by 2-way SSL
• Environment-neutral
• Variable registry
• Better deployment management, Apache Ambari integration
Intelligence on the edge
Time to delivery
Simplified operations in distributed environments
Ambari Integration
Ambari-NiFi Integration
– NiFi cluster management
• Start/stop NiFi service
• Centralized place for managing config files
– Ambari to display NiFi metrics
– Ambari to manage Kerberos authentication
HDF 2.0 Deployment Model
– Automated deployment by Ambari
– Manual RPM deployment
– Tar.gz/zip deployment (NiFi/MiNiFi Java)
– Tar.gz for most Linux/Mac, compile your own for other OS (MiNiFi C++)
Challenges & NiFi & HDF 2.0
Different devices
Globally distributed organizations: dataflow across multiple data centers
• Internal Site-to-Site communication, secured by 2-way SSL
• Environment-neutral
• Variable registry
• Better deployment management, Apache Ambari integration
• Enhanced Site-to-Site communication
Intelligence on the edge
Time to delivery
Modularized Site-to-Site (S2S) to support pluggable protocols
Challenges & NiFi
Different devices, Globally distributed organizations
Intelligence on the edge: analytics on resource constrained devices
• Run single node on the edge, communicating back via S2S
• Bi-directional communication
Time to delivery
Analytics at the Edge
Challenges & NiFi & HDF 2.0
Different devices, Globally distributed organizations
Intelligence on the edge: analytics on resource constrained devices
• Run single node on the edge, communicating back via Site-to-Site protocol
• Bi-directional communication
• Apache MiNiFi, bi-directional command and control on the edge
Time to delivery
Edge Intelligence for the first mile
Edge Intelligence with Apache MiNiFi
Key Features
– Guaranteed delivery
– Data buffering
• Backpressure
• Pressure release
– Prioritized queuing
– Flow specific QoS
• Latency vs. throughput
• Loss tolerance
– Data provenance
– Recovery / recording a rolling log of fine-grained history
– Designed for extension
Different from Apache NiFi
– Design and Deploy
– Warm re-deploys
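Backpressure and prioritized queuing can be pictured with a toy bounded priority queue. This is a minimal sketch under stated assumptions, not MiNiFi's or NiFi's queue implementation: the class name, the threshold parameter, and the "lower number means higher priority" rule are all hypothetical.

```python
import heapq
import itertools


class BoundedPriorityQueue:
    """Toy connection queue: prioritized, with a backpressure threshold."""

    def __init__(self, backpressure_threshold):
        self.backpressure_threshold = backpressure_threshold
        self._heap = []
        self._counter = itertools.count()  # tie-breaker keeps FIFO order

    def is_full(self):
        return len(self._heap) >= self.backpressure_threshold

    def offer(self, priority, item):
        """Enqueue unless backpressure applies; lower number = higher priority."""
        if self.is_full():
            return False  # signal to the upstream producer to hold off
        heapq.heappush(self._heap, (priority, next(self._counter), item))
        return True

    def poll(self):
        """Remove and return the highest-priority item."""
        _, _, item = heapq.heappop(self._heap)
        return item


queue = BoundedPriorityQueue(backpressure_threshold=2)
accepted = [
    queue.offer(5, "routine reading"),
    queue.offer(1, "alarm"),
    queue.offer(5, "another reading"),  # refused: queue is at its threshold
]
first_out = queue.poll()
print(accepted, first_out)
```

The refused `offer` is the backpressure signal; what the producer does with it (wait, buffer, or drop) is where the latency, throughput, and loss-tolerance trade-offs come in.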
NiFi vs. MiNiFi Java Agent
– NiFi: NiFi Framework + User Interface + Components
– MiNiFi: NiFi Framework + Components
Challenges & NiFi
Different devices, Globally distributed organizations, Intelligence on the edge
Time to delivery: need an application, out of the box solution
• Data provenance, traceability and compliance issues
• Flow visibility, big picture of the enterprise dataflow
• Automatic failure handling
Fast and easy to get results, tune and change dataflows
Challenges & NiFi & HDF 2.0
Different devices, Globally distributed organizations, Intelligence on the edge
Time to delivery: need an application, out of the box solution
• Data provenance, traceability and compliance issues
• Flow visibility, big picture of the enterprise dataflow
• Automatic failure handling
• Control plane high availability, zero-master clustering
High availability
Zero-master Clustering
New clustering paradigm in HDF 2.0 (NiFi 1.0.0)
– Multiple entry points, no master node, no single point of failure
– Auto-elected cluster coordinator for cluster maintenance
– Automatic failover handling
Heartbeat messages (every 5s by default)
Node status: connecting/connected/disconnecting/disconnected
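How a coordinator might turn heartbeats into a node status can be sketched as follows. Only the 5-second interval comes from the slide; the `Coordinator` class, the number of missed heartbeats tolerated, and the reduction to two of the four states are assumptions made for this illustration.

```python
from dataclasses import dataclass, field

HEARTBEAT_INTERVAL_S = 5.0  # default interval stated on the slide
MISSED_BEATS_ALLOWED = 8    # assumption for this sketch only


@dataclass
class Coordinator:
    """Tracks the last heartbeat time per node (timestamps in seconds)."""

    last_seen: dict = field(default_factory=dict)

    def heartbeat(self, node_id, now):
        self.last_seen[node_id] = now

    def status(self, node_id, now):
        # Simplified: the transitional "connecting" and "disconnecting"
        # states are not modeled here.
        if node_id not in self.last_seen:
            return "disconnected"
        silence = now - self.last_seen[node_id]
        if silence > HEARTBEAT_INTERVAL_S * MISSED_BEATS_ALLOWED:
            return "disconnected"
        return "connected"


coordinator = Coordinator()
coordinator.heartbeat("node-1", now=0.0)
status_early = coordinator.status("node-1", now=10.0)
status_late = coordinator.status("node-1", now=41.0)
print(status_early, status_late)
```

Passing timestamps in explicitly keeps the sketch deterministic; a real coordinator would read a clock instead.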
Challenges & NiFi & HDF 2.0
Different devices, Globally distributed organizations, Intelligence on the edge
Time to delivery: need an application, out of the box solution
• Data provenance, traceability and compliance issues
• Flow visibility, big picture of the enterprise dataflow
• Automatic failure handling
• Control plane high availability, zero-master clustering
• Multi-tenancy flow editing, and authorization
Secured enterprise wide collaboration
Multi-tenant Flow Editing
HDF 2.0 (NiFi 1.0.0)
– Self-service collaborative model, Google Docs-style user experience
– Multiple teams making edits to different processors at the same time
– Only the component being modified is locked, not the entire flow
– Scalable model to speed up flow editing
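Component-level isolation of edits can be illustrated with a per-component revision check (optimistic concurrency). The slides do not describe the mechanism, so this is a stand-in technique rather than NiFi's actual locking; the `Flow` class, the error type, and the property names are hypothetical.

```python
class StaleRevisionError(Exception):
    """Raised when an edit is based on an outdated component revision."""


class Flow:
    """Toy flow where each component carries its own revision number."""

    def __init__(self):
        self._components = {}  # component id -> (revision, config dict)

    def add(self, component_id, config):
        self._components[component_id] = (0, dict(config))

    def read(self, component_id):
        revision, config = self._components[component_id]
        return revision, dict(config)

    def update(self, component_id, expected_revision, changes):
        revision, config = self._components[component_id]
        if revision != expected_revision:
            raise StaleRevisionError(component_id)
        config.update(changes)
        self._components[component_id] = (revision + 1, config)
        return revision + 1


flow = Flow()
flow.add("GetTwitter", {"terms": "nifi"})       # placeholder property names
flow.add("PutFile", {"directory": "/tmp/out"})

# Two teams read different processors, then both save: no conflict.
rev_a, _ = flow.read("GetTwitter")
rev_b, _ = flow.read("PutFile")
flow.update("GetTwitter", rev_a, {"terms": "hdf"})
flow.update("PutFile", rev_b, {"directory": "/data/out"})

# A second edit to the same processor from the old revision is rejected.
try:
    flow.update("GetTwitter", rev_a, {"terms": "stale edit"})
    conflict_detected = False
except StaleRevisionError:
    conflict_detected = True
print(conflict_detected)
```

Because the check is scoped to one component, edits to other parts of the flow never block each other.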
Multi-tenant Authorization
HDF 2.0 (NiFi 1.0.0)
Component level authorization
– New authorizer API
– “Read” and “Write” permissions
– Protection against unauthorized usage without losing context
Authorization management
– Internal management (NiFi)
– External management (Ranger, etc.)
Multi-tenant Authorization: with Read permission
– Processor name visible
– Processor configuration visible
Multi-tenant Authorization: without Read permission
– Processor name and configuration invisible (content)
– Statistics visible (context)
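The effect of the Read permission on what a user sees can be sketched as a view filter. This is an assumption-laden illustration, not the NiFi authorizer API: the `render_processor` function and the dictionary fields are hypothetical.

```python
def render_processor(processor, can_read):
    """Return the view of a processor that a given user is allowed to see.

    Statistics (context) are always included; name and configuration
    (content) are included only with Read permission.
    """
    view = {"id": processor["id"], "stats": processor["stats"]}
    if can_read:
        view["name"] = processor["name"]
        view["config"] = processor["config"]
    return view


processor = {
    "id": "proc-1",
    "name": "PutFile",
    "config": {"directory": "/data/out"},  # placeholder property
    "stats": {"flowfiles_in": 120, "flowfiles_out": 118},
}

with_read = render_processor(processor, can_read=True)
without_read = render_processor(processor, can_read=False)
print(sorted(with_read))
print(sorted(without_read))
```

Keeping the statistics visible is what lets an unauthorized user still see that the component exists and how busy it is, without exposing what it does.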
Questions?
Hortonworks Community Connection: Data Ingestion and Streaming
https://community.hortonworks.com/