Scaling real time streaming architectures with HDF and Dell EMC Isilon

Page 1: Scaling real time streaming architectures with HDF and Dell EMC Isilon

1 © Hortonworks Inc. 2011 – 2016. All Rights Reserved

Hortonworks DataFlow: Overview and Scaling Architectures

Keith Manthey, CTO, Dell EMC Isilon

Anna Yong, Product Marketing, Hortonworks

Page 2: Scaling real time streaming architectures with HDF and Dell EMC Isilon

Harnessing Data in Motion

Sources / Regional Infrastructure / Core Infrastructure

Constrained, High-latency, Localized context

Hybrid (cloud / on-premises), Low-latency, Global context

Page 3: Scaling real time streaming architectures with HDF and Dell EMC Isilon

Hortonworks DataFlow Manages Data in Motion

Sources / Regional Infrastructure / Core Infrastructure

Constrained, High-latency, Localized context

Hybrid (cloud / on-premises), Low-latency, Global context

Page 4: Scaling real time streaming architectures with HDF and Dell EMC Isilon

Requirements for Data in Motion

Perishable Insights

Connectivity

Prioritization

Security

Adaptability

Extensibility

Scalability

Provenance

Real-Time

Page 5: Scaling real time streaming architectures with HDF and Dell EMC Isilon

Connecting Data Between Ecosystems Without Coding: 170+ Processors

Hash

Extract

Merge

Duplicate

Scan

GeoEnrich

Replace

Convert

Split

Translate

Route Content

Route Context

Route Text

Control Rate

Distribute Load

Generate Table Fetch

Jolt Transform JSON

Prioritized Delivery

Encrypt

Tail

Evaluate

Execute

HL7

FTP

UDP

XML

SFTP

HTTP

Syslog

Email

HTML

Image

AMQP

MQTT

All Apache project logos are trademarks of the ASF and the respective projects.

Fetch

Page 6: Scaling real time streaming architectures with HDF and Dell EMC Isilon

Apache NiFi: Key Features

• Guaranteed delivery
• Data buffering
‒ Backpressure
‒ Pressure release
• Prioritized queuing
• Flow specific QoS
‒ Latency vs. throughput
‒ Loss tolerance
• Data provenance
• Recovery / recording a rolling log of fine-grained history
• Designed for extension
• Visual command and control
• Flow templates
• Policy based security
• Clustering
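
To make the data buffering and prioritized queuing features concrete, here is a minimal sketch of a bounded queue that refuses new items once a threshold is reached (back pressure) and hands out the most urgent item first. It is a toy illustration, not NiFi's implementation: the class name `BoundedPriorityQueue`, the `backpressure_threshold` parameter, and the convention that a lower number means higher priority are all assumptions made for this example.

```python
import heapq
import itertools
from typing import Any, List, Optional, Tuple


class BoundedPriorityQueue:
    """Toy connection queue: priority ordering plus a back-pressure threshold.

    Hypothetical illustration only; names and semantics are assumptions.
    """

    def __init__(self, backpressure_threshold: int) -> None:
        if backpressure_threshold < 1:
            raise ValueError("backpressure_threshold must be at least 1")
        self._threshold = backpressure_threshold
        self._heap: List[Tuple[int, int, Any]] = []
        # Monotonic counter keeps FIFO order among items of equal priority.
        self._sequence = itertools.count()

    def backpressure_applied(self) -> bool:
        """True once the queue holds as many items as the threshold allows."""
        return len(self._heap) >= self._threshold

    def offer(self, item: Any, priority: int) -> bool:
        """Enqueue an item; return False when the producer must slow down."""
        if self.backpressure_applied():
            return False
        heapq.heappush(self._heap, (priority, next(self._sequence), item))
        return True

    def poll(self) -> Optional[Any]:
        """Dequeue the item with the lowest priority number, or None if empty."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]


if __name__ == "__main__":
    queue = BoundedPriorityQueue(backpressure_threshold=2)
    print(queue.offer("bulk log batch", priority=5))
    print(queue.offer("security alert", priority=1))
    print(queue.offer("late arrival", priority=0))  # rejected: queue is full
    print(queue.poll())
```

In this sketch the producer sees a rejected `offer` and has to retry later, which is one simple way to model an upstream component being slowed until the consumer catches up.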

Page 7: Scaling real time streaming architectures with HDF and Dell EMC Isilon

Scaling Architectures

Keith Manthey

Page 8: Scaling real time streaming architectures with HDF and Dell EMC Isilon

ISILON SCALE-OUT ARCHITECTURE

Gig-e / 40 Gig-e Network

Web

Apps

Cloud

Analytics

Archive

Linux/Unix

Mac/iOS

Windows

Page 9: Scaling real time streaming architectures with HDF and Dell EMC Isilon

Isilon Scale-Out NAS

• Simple to manage: Single file system, single volume, global namespace
• Massively scalable: Scales from 16 TB to over 50 PB in a single cluster; 200GB/s throughput, 3.75M IOPS
• Unmatched efficiency: Over 80% storage utilization, automated tiering and SmartDedupe
• Enterprise data protection: Efficient backup and disaster recovery, and N+1 through N+4 redundancy
• Robust security and compliance options: RBAC, Access Zones, WORM data security, File System Auditing, Data At Rest Encryption with SEDs, STIG hardening, CAC/PIV Smartcard authentication, FIPS OpenSSL support
• Operational flexibility: Multi-protocol support including NFS, SMB, HTTP, FTP and HDFS; Object and Cloud computing including OpenStack Swift

Page 10: Scaling real time streaming architectures with HDF and Dell EMC Isilon

Scalable Architecture

Speed Layer

Serving Layer

Batch Layer

Events

Patterns

tshark

tshark
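
The slide names a speed layer, a batch layer and a serving layer without further detail. One common reading of that layout is sketched below: a batch view is recomputed over all events up to a cutoff, a speed view covers only the events after it, and the serving layer answers queries by combining the two. The event shape `(timestamp, key)`, the function names, and the use of simple counts are assumptions made for illustration and are not taken from the presentation.

```python
from collections import Counter
from typing import Iterable, Tuple

# Hypothetical event shape: (timestamp in seconds, key such as an operation name).
Event = Tuple[float, str]


def batch_view(events: Iterable[Event], cutoff: float) -> Counter:
    """Batch layer: count every event up to and including the cutoff."""
    return Counter(key for timestamp, key in events if timestamp <= cutoff)


def speed_view(events: Iterable[Event], cutoff: float) -> Counter:
    """Speed layer: count only the recent events the batch view has not seen."""
    return Counter(key for timestamp, key in events if timestamp > cutoff)


def serve(batch: Counter, speed: Counter, key: str) -> int:
    """Serving layer: merge both views to answer a query for one key."""
    return batch[key] + speed[key]


if __name__ == "__main__":
    events = [(1.0, "GETATTR"), (2.0, "OPEN"), (3.0, "GETATTR"), (4.0, "CLOSE")]
    cutoff = 2.0  # time of the last completed batch run (assumed)
    batch = batch_view(events, cutoff)
    speed = speed_view(events, cutoff)
    print(serve(batch, speed, "GETATTR"))
```

Splitting on a cutoff keeps the two views disjoint, so the merged answer never counts an event twice.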

Page 11: Scaling real time streaming architectures with HDF and Dell EMC Isilon

EMC ISILON ALL-FLASH STRATEGY

EMC WORLD 2016 TECHNOLOGY PREVIEW – PROJECT NITRO

• A new bladed architecture for Isilon with all-Flash storage
– Supporting very high density flash modules
– Higher throughput up to 15GBps per chassis
– Higher transactions up to 250,000 OPS per chassis
– Lower latency – 10X improvement on current platform
– Up to 8 x 40GbE ports for front-end IO per chassis

Page 12: Scaling real time streaming architectures with HDF and Dell EMC Isilon

EMC ISILON ALL-FLASH STRATEGY

EMC WORLD 2016 TECHNOLOGY PREVIEW – PROJECT NITRO

• Huge scale large capacity clusters supporting 400+ nodes

• Built on a mature & trusted scale-out clustered file system

• Rich snapshots, replication, cloud-tiering

• Multi-protocol – NFS 3/4, SMB 2/3, HDFS, FTP, object, HTTP

Page 13: Scaling real time streaming architectures with HDF and Dell EMC Isilon

COMPONENTS

HDP 2.5

Page 14: Scaling real time streaming architectures with HDF and Dell EMC Isilon

COMPONENTS

ONEFS

Page 15: Scaling real time streaming architectures with HDF and Dell EMC Isilon

COMPONENTS

ONEFS

Page 16: Scaling real time streaming architectures with HDF and Dell EMC Isilon

NETWORK CAPTURE

TSHARK

0.672623546 10.111.156.223 -> 10.111.158.181 NFS 254 V4 Call GETATTR FH:0x1005762e
0.672884404 10.111.158.181 -> 10.111.156.223 NFS 290 V4 Reply (Call In 5) GETATTR
0.679893456 10.111.156.223 -> 10.111.158.181 NFS 342 V4 Call OPEN DH:0x1005762e/netflow.pcap
0.681128189 10.111.158.181 -> 10.111.156.223 NFS 430 V4 Reply (Call In 7) OPEN StateID:0x6b18
0.681373159 10.111.156.223 -> 10.111.158.181 NFS 278 V4 Call CLOSE StateID:0x6b18
0.682575147 10.111.158.181 -> 10.111.156.223 NFS 202 V4 Reply (Call In 9) CLOSE
0.700039578 10.111.156.223 -> 10.111.158.181 NFS 282 V4 Call LOOKUP DH:0x5e873906/netflow1.pcap
0.700935083 10.111.158.181 -> 10.111.156.223 NFS 122 V4 Reply (Call In 11) LOOKUP Status: NFS4ERR_NOENT
0.701083442 10.111.156.223 -> 10.111.158.181 NFS 378 V4 Call OPEN DH:0x5e873906/.netflow1.pcap
0.704327778 10.111.158.181 -> 10.111.156.223 NFS 430 V4 Reply (Call In 13) OPEN StateID:0xe1c8
0.704404916 10.111.156.223 -> 10.111.158.181 NFS 278 V4 Call CLOSE StateID:0xe1c8
0.705417643 10.111.158.181 -> 10.111.156.223 NFS 202 V4 Reply (Call In 15) CLOSE
0.705540934 10.111.156.223 -> 10.111.158.181 NFS 254 V4 Call GETATTR FH:0x5e873906
0.705853927 10.111.158.181 -> 10.111.156.223 NFS 290 V4 Reply (Call In 17) GETATTR
0.705884552 10.111.156.223 -> 10.111.158.181 NFS 282 V4 Call LOOKUP DH:0x5e873906/.netflow1.pcap
0.706171855 10.111.158.181 -> 10.111.156.223 NFS 366 V4 Reply (Call In 19) LOOKUP
0.706204441 10.111.156.223 -> 10.111.158.181 NFS 262 V4 Call ACCESS FH:0x5e873906, [Check: RD LU MD XT DL]
0.706412166 10.111.158.181 -> 10.111.156.223 NFS 306 V4 Reply (Call In 21) ACCESS, [Allowed: RD LU MD XT DL]
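
Capture lines in this shape can be turned into structured events before they enter a flow. The sketch below parses the summary format shown on this slide (relative time, source, destination, protocol, frame length, info) with a regular expression and counts NFS v4 calls per operation. The field layout is inferred from the sample output only; the names `NfsEvent`, `parse_line` and `call_counts` are hypothetical, and other tshark column configurations would need a different pattern.

```python
import re
from collections import Counter
from typing import Iterable, NamedTuple, Optional

# Layout assumed from the slide: time, src, "->", dst, protocol, length, info.
LINE_RE = re.compile(
    r"^\s*(?P<time>\d+\.\d+)\s+"
    r"(?P<src>\S+)\s+->\s+(?P<dst>\S+)\s+"
    r"(?P<proto>\S+)\s+(?P<length>\d+)\s+"
    r"(?P<info>.*)$"
)
# Info column, e.g. "V4 Call OPEN ..." or "V4 Reply (Call In 7) OPEN ...".
INFO_RE = re.compile(
    r"^V4 (?P<kind>Call|Reply)(?: \(Call In \d+\))? (?P<op>[A-Z]+)"
)


class NfsEvent(NamedTuple):
    time: float
    src: str
    dst: str
    length: int
    kind: str  # "Call" or "Reply"
    op: str    # e.g. "GETATTR", "OPEN", "CLOSE"
    info: str  # full info column, kept for later inspection


def parse_line(line: str) -> Optional[NfsEvent]:
    """Parse one summary line; return None if it is not an NFS v4 line."""
    match = LINE_RE.match(line)
    if match is None or match.group("proto") != "NFS":
        return None
    info = match.group("info")
    info_match = INFO_RE.match(info)
    if info_match is None:
        return None
    return NfsEvent(
        time=float(match.group("time")),
        src=match.group("src"),
        dst=match.group("dst"),
        length=int(match.group("length")),
        kind=info_match.group("kind"),
        op=info_match.group("op"),
        info=info,
    )


def call_counts(lines: Iterable[str]) -> Counter:
    """Count NFS calls (not replies) per operation name."""
    events = (parse_line(line) for line in lines)
    return Counter(e.op for e in events if e is not None and e.kind == "Call")


SAMPLE = [
    "0.672623546 10.111.156.223 -> 10.111.158.181 NFS 254 V4 Call GETATTR FH:0x1005762e",
    "0.672884404 10.111.158.181 -> 10.111.156.223 NFS 290 V4 Reply (Call In 5) GETATTR",
    "0.679893456 10.111.156.223 -> 10.111.158.181 NFS 342 V4 Call OPEN DH:0x1005762e/netflow.pcap",
    "0.700935083 10.111.158.181 -> 10.111.156.223 NFS 122 V4 Reply (Call In 11) LOOKUP Status: NFS4ERR_NOENT",
]

if __name__ == "__main__":
    for operation, count in sorted(call_counts(SAMPLE).items()):
        print(operation, count)
```

Keeping the full info column on each event leaves room for later steps, such as matching on a status string, without re-parsing the raw line.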

Page 17: Scaling real time streaming architectures with HDF and Dell EMC Isilon

NIFI

Page 18: Scaling real time streaming architectures with HDF and Dell EMC Isilon

Questions?

Hortonworks Community Connection: Data Ingestion and Streaming

https://community.hortonworks.com/

Page 19: Scaling real time streaming architectures with HDF and Dell EMC Isilon
Page 20: Scaling real time streaming architectures with HDF and Dell EMC Isilon

Thank you!