© Hortonworks Inc. 2014 Hortonworks: We Do Hadoop “State of the Union” Webinar Shaun Connolly, VP Strategy @shaunconnolly, @hortonworks January 22, 2014 Page 1

Enterprise Apache Hadoop: State of the Union

DESCRIPTION

So what's in store for 2014? This deck is from the State of the Union webinar given by Shaun Connolly (VP of Strategy, Hortonworks). In this deck, you'll find:
- Reflection on the Enterprise Hadoop market in 2013
- The latest releases and innovations within the open source community
- Highlights of what's in store for Apache Hadoop and Big Data in 2014

Page 1: Enterprise Apache Hadoop: State of the Union

Hortonworks: We Do Hadoop “State of the Union” Webinar

Shaun Connolly, VP Strategy @shaunconnolly, @hortonworks January 22, 2014

Page 2: Enterprise Apache Hadoop: State of the Union

Today’s Webinar

•  Apache Hadoop & Hortonworks Overview
•  Hadoop’s Role
•  Hadoop Adoption: From Apps to Lake
•  Enterprise Hadoop Technology Directions

Page 3: Enterprise Apache Hadoop: State of the Union

Our Mission: Enable your Modern Data Architecture by Delivering Enterprise Apache Hadoop

Our Vision: More than Half the World's Data Will Be Processed by Apache Hadoop

Our Commitment

•  Open Leadership: Drive innovation in the open exclusively via the Apache community-driven open source process
•  Enterprise Rigor: Engineer, test and certify Apache Hadoop with the enterprise in mind
•  Ecosystem Endorsement: Focus on deep integration with existing data center technologies and skills

Headquarters: Palo Alto, CA
Employees: 300+ and growing

Reseller Partners

Page 4: Enterprise Apache Hadoop: State of the Union

Apache Software Foundation Guiding Principles
•  Release early & often
•  Transparency, respect, meritocracy

Key Roles
•  PMC Members
–  Managing community projects
–  Mentoring new incubator projects
•  Committers
–  Authoring, reviewing & editing code
•  Release Managers
–  Testing & releasing projects

Apache Community Process: Design & Develop, Test & Patch, Release

Apache Community Projects: Apache Hadoop, Apache HBase, Apache Hive, Apache Falcon, Apache Pig, Apache Ambari, Apache Storm

Page 5: Enterprise Apache Hadoop: State of the Union

Hortonworks Process for Enterprise Hadoop

Upstream Community Projects: Apache Hadoop, Apache HBase, Apache Hive, Apache Falcon, Apache Pig, Apache Ambari, Apache Storm
•  Design & Develop
•  Test & Patch
•  Release

Downstream Enterprise Product: HDP 2.0
•  Integrate & Test
•  Package & Certify
•  Distribute

A virtuous cycle results when development and issue fixes are done upstream and stable project releases flow downstream.
•  Stable Project Releases
•  Fixed Issues

Certified at scale using the most advanced Hadoop test bed on the planet
•  1000’s of production nodes at Yahoo!
•  Over 1500 unit & system tests

Page 6: Enterprise Apache Hadoop: State of the Union

Hadoop’s Role…

“Hadoop is becoming a more ‘normal’ software market” and the “Hadoop vendor ecosystem [is] gaining critical mass”

Tony Baer, Ovum

Page 7: Enterprise Apache Hadoop: State of the Union

A Traditional Approach Under Pressure

APPLICATIONS: Business Analytics, Custom Applications, Packaged Applications
DATA SYSTEM (REPOSITORIES): RDBMS, EDW, MPP
SOURCES:
•  Existing Sources (CRM, ERP, Clickstream, Logs)
•  Emerging Sources (Sensor, Sentiment, Geo, Unstructured)

Source: IDC
•  2.8 ZB in 2012
•  85% from New Data Types
•  15x Machine Data by 2020
•  40 ZB by 2020
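The two volume figures cited from IDC on this slide (2.8 ZB in 2012, 40 ZB by 2020) imply a growth rate that can be worked out directly. The sketch below is only arithmetic on those two numbers; the function name is illustrative, and it assumes smooth compound growth over the eight years, which the slide does not claim.

```python
# Data volumes quoted on the slide (zettabytes), attributed to IDC.
VOLUME_2012_ZB = 2.8
VOLUME_2020_ZB = 40.0
YEARS = 2020 - 2012


def compound_annual_growth(start: float, end: float, years: int) -> float:
    """Return the constant yearly growth rate that turns `start` into `end`.

    Assumes smooth compound growth, which is a simplification.
    """
    if start <= 0 or years <= 0:
        raise ValueError("start and years must be positive")
    return (end / start) ** (1.0 / years) - 1.0


growth_factor = VOLUME_2020_ZB / VOLUME_2012_ZB
annual_rate = compound_annual_growth(VOLUME_2012_ZB, VOLUME_2020_ZB, YEARS)

print(f"Overall growth: {growth_factor:.1f}x over {YEARS} years")
print(f"Implied compound annual growth: {annual_rate:.0%}")
```

Under that assumption the projection works out to roughly a 14x increase, or a little under 40% growth per year.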

Page 8: Enterprise Apache Hadoop: State of the Union

Unlock Value in New Types of Data

1.  Social: Understand how people are feeling and interacting – right now
2.  Clickstream: Capture and analyze website visitors’ data trails and optimize your website
3.  Sensor/Machine: Discover patterns in data streaming from remote sensors and machines
4.  Geographic: Analyze location-based data to manage operations where they occur
5.  Server Logs: Diagnose process failures and prevent security breaches
6.  Unstructured (txt, video, pictures, etc.): Understand patterns in files across millions of web pages, emails, and documents

+ Online archive: Data that was once purged or moved to tape can be stored in Hadoop to discover long term trends and previously hidden value

Page 9: Enterprise Apache Hadoop: State of the Union

A Modern Data Architecture Enabled

•  Complement Data Systems
•  Right Workload, Right Place

APPLICATIONS: Business Analytics, Custom Applications, Packaged Applications
DATA SYSTEM (REPOSITORIES): RDBMS, EDW, MPP
SOURCES:
•  Existing Sources (CRM, ERP, Clickstream, Logs)
•  Emerging Sources (Sensor, Sentiment, Geo, Unstructured)

Page 10: Enterprise Apache Hadoop: State of the Union

A Modern Data Architecture Applied

APPLICATIONS: BusinessObjects BI
DATA SYSTEM: RDBMS, EDW, MPP, HANA
SOURCES:
•  Existing Sources (CRM, ERP, Clickstream, Logs)
•  Emerging Sources (Sensor, Sentiment, Geo, Unstructured)
OPERATIONAL TOOLS
DEV & DATA TOOLS
INFRASTRUCTURE

Page 11: Enterprise Apache Hadoop: State of the Union

Major Vendors Have Embraced Hadoop

Teradata Portfolio for Hadoop
•  Seamless data access between Teradata and Hadoop (SQL-H)
•  Simple management & monitoring with Viewpoint integration
•  Flexible deployment options

HDInsight & HDP for Windows
•  Only Hadoop Distribution for Windows Azure & Windows Server
•  Native integration with SQL Server, Excel, and System Center
•  Extends Hadoop to .NET community

Instant Access + Infinite Scale
•  SAP can assure their customers they are deploying an SAP HANA + Hadoop architecture fully supported by SAP
•  Enables analytics apps (BOBJ) to interact with Hadoop

Figure labels: UDA Diagram, Complete Portfolio for Hadoop, Appliances

Page 12: Enterprise Apache Hadoop: State of the Union

Hadoop Adoption

“Hadoop’s momentum is unstoppable as its open source roots grow wildly into enterprises. Its refreshingly unique approach to data management is transforming how companies store, process, analyze, and share big data”

--Mike Gualtieri, Forrester

Page 13: Enterprise Apache Hadoop: State of the Union

Drivers of Hadoop Adoption

Chart axes: SCALE vs. SCOPE

New Analytic Apps | New Types of Data | LOB Driven

Page 14: Enterprise Apache Hadoop: State of the Union

20 Common Business Applications

Industry | Use Case | Type of Data
Financial Services | New Account Risk Screens | Text, Server Logs
Financial Services | Trading Risk | Server Logs
Financial Services | Insurance Underwriting | Geographic, Sensor, Text
Telecom | Call Detail Records (CDRs) | Machine, Geographic
Telecom | Infrastructure Investment | Machine, Server Logs
Telecom | Real-time Bandwidth Allocation | Server Logs, Text, Social
Retail | 360° View of the Customer | Clickstream, Text
Retail | Localized, Personalized Promotions | Geographic
Retail | Website Optimization | Clickstream
Manufacturing | Supply Chain and Logistics | Sensor
Manufacturing | Assembly Line Quality Assurance | Sensor
Manufacturing | Crowdsourced Quality Assurance | Social
Healthcare | Use Genomic Data in Medical Trials | Structured
Healthcare | Monitor Patient Vitals in Real-Time | Sensor
Pharmaceuticals | Recruit and Retain Patients for Drug Trials | Social, Clickstream
Pharmaceuticals | Improve Prescription Adherence | Social, Unstructured, Geographic
Oil & Gas | Unify Exploration & Production Data | Sensor, Geographic & Unstructured
Oil & Gas | Monitor Rig Safety in Real-Time | Sensor, Unstructured
Government | ETL Offload in Response to Federal Budgetary Pressures | Structured
Government | Sentiment Analysis for Government Programs | Social

Page 15: Enterprise Apache Hadoop: State of the Union

Drivers of Hadoop Adoption

Chart axes: SCALE vs. SCOPE

New Analytic Apps | New Types of Data | LOB Driven
More data and analytic apps
MDA/Data Lake | Cost, Insight | IT Driven

Page 16: Enterprise Apache Hadoop: State of the Union

The Journey Towards a Data Lake

Chart axes: DATA (TB’s to PB’s) vs. VALUE

•  Risk Management, e.g., Fraud Reduction
•  Operational Excellence, e.g., Network Maintenance
•  New Business, e.g., Data as a Product
•  Customer Intimacy, e.g., 360 Degree View of the Customer

DATA LAKE: An architectural shift in the data center that uses Hadoop to deliver deep insight across a large, broad, diverse set of data at efficient scale

Page 17: Enterprise Apache Hadoop: State of the Union

Drivers of the Data Lake

DATA LAKE

+ Hadoop = SCALE
EFFICIENT SCALE (Data Management): Store and process all of your Corporate Data Assets
•  Acquire all data in original format and store in one place, cost effectively and for an unlimited time
•  Scale horizontally and to petabyte scale

+ Hadoop = INSIGHT
BROAD INSIGHT (Data Access): Access your data simultaneously in multiple ways, irrespective of the processing engine, analytical application or presentation
•  Allows simultaneous access by and timely insights for all your users across all your data
•  Enabled schema on read & enterprise-wide pool of data
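The "schema on read" idea mentioned on this slide can be illustrated with a small sketch: data is stored in its original format, and each consumer applies its own schema only when reading. The example below is a toy in plain Python rather than Hadoop code; the record layout, field names, and the `read_with_schema` helper are all hypothetical.

```python
import json
from typing import Callable, Dict, Iterable, Iterator

# Raw events are kept exactly as they arrived; nothing is validated on write.
RAW_EVENTS = [
    '{"user": "u1", "page": "/home", "ms": "120"}',
    '{"user": "u2", "page": "/cart", "ms": "340", "ref": "ad"}',
    "not-json garbage line",
]

Schema = Dict[str, Callable[[object], object]]


def read_with_schema(lines: Iterable[str], schema: Schema) -> Iterator[dict]:
    """Apply a schema at read time, skipping records that do not fit it."""
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # unparseable input stays in storage but is skipped here
        if not isinstance(record, dict):
            continue
        try:
            yield {field: cast(record[field]) for field, cast in schema.items()}
        except (KeyError, TypeError, ValueError):
            continue  # record lacks a field or a value cannot be cast


# Two consumers read the same stored data through different schemas.
latency_schema: Schema = {"user": str, "ms": int}
page_schema: Schema = {"page": str}

latencies = list(read_with_schema(RAW_EVENTS, latency_schema))
pages = list(read_with_schema(RAW_EVENTS, page_schema))

print(latencies)
print(pages)
```

The design point is that the stored lines never change: adding a third consumer with a different view of the data needs only a new schema dictionary, not a reload of the data.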

Page 18: Enterprise Apache Hadoop: State of the Union

Data Lake Transforms Your Architecture

SOURCES:
•  Existing Sources (CRM, ERP, Clickstream, Logs)
•  Emerging Sources (Sensor, Sentiment, Geo, Unstructured)

DATA LAKE:
•  BROAD INSIGHT (Data Access): Access your data simultaneously in multiple ways
•  EFFICIENT SCALE (Data Management): Store and process all of your Corporate Data Assets

APPLICATIONS: Business Analytics, Custom Applications, Packaged Applications

Page 19: Enterprise Apache Hadoop: State of the Union

Enterprise Hadoop Technology Directions

“With Hadoop 2.0 we expect this ecosystem to grow like bamboo in spring time.”

Robin Bloor, The Bloor Group

Page 20: Enterprise Apache Hadoop: State of the Union

What’s Needed for Enterprise Hadoop?

1.  Key Services: Platform, Operational and Data services essential for the enterprise
2.  Skills: Leverage your existing skills: development, analytics, operations
3.  Integration: Interoperable with existing data center investments

CORE SERVICES: HDFS, YARN, MapReduce, Tez
DATA SERVICES: Hive & HCatalog, Pig, HBase, Sqoop, Flume, NFS, WebHDFS
OPERATIONAL SERVICES: Oozie, Ambari, Falcon*, Knox*
Functions: Storage, Resource Management, Process, Data Access, Data Movement, Schedule, Cluster Mgmt, Dataset Mgmt, Data Security
Enterprise Readiness: High Availability, Disaster Recovery, Rolling Upgrades, Security and Snapshots
Deployment: OS/VM, Cloud, Appliance

Page 21: Enterprise Apache Hadoop: State of the Union

What’s Needed for Enterprise Hadoop?

1.  Key Services: Platform, Operational and Data services essential for the enterprise
2.  Skills: Leverage your existing skills: development, analytics, operations
3.  Integration: Interoperable with existing data center investments

HORTONWORKS DATA PLATFORM (HDP)
CORE SERVICES: HDFS, YARN, MapReduce, Tez
DATA SERVICES: Hive & HCatalog, Pig, HBase; Load & Extract: Sqoop, Flume, NFS, WebHDFS
OPERATIONAL SERVICES: Oozie, Ambari, Falcon*, Knox*
Functions: Storage, Resource Management, Process, Data Access, Data Movement, Schedule, Cluster Mgmt, Dataset Mgmt
Enterprise Readiness: High Availability, Disaster Recovery, Rolling Upgrades, Security and Snapshots
Deployment: OS/VM, Cloud, Appliance

Page 22: Enterprise Apache Hadoop: State of the Union

Hadoop 2 & Beyond

details: hortonworks.com/labs

Page 23: Enterprise Apache Hadoop: State of the Union

Hadoop 2: The Introduction of YARN

1st Gen of Hadoop: Single Use System (Batch Apps)
•  MapReduce (cluster resource management & data processing)
•  HDFS (redundant, reliable storage)

2nd Gen of Hadoop: Multi-Use Data Platform (Batch, Interactive, Online, Streaming, …)
•  Flexible Data Processing
–  Classic Hadoop Apps
–  Hive, Pig, others…
–  Batch: MapReduce
–  Batch & Interactive: Tez
–  Online Data Processing: HBase, Accumulo
–  Stream Processing: Storm
–  others…
•  Efficient Cluster Resource Management & Shared Services (YARN)
•  Redundant, Reliable Storage (HDFS)

Store all data in one place, interact in multiple ways

Page 24: Enterprise Apache Hadoop: State of the Union

Apache Hadoop YARN

The Data Operating System for Hadoop 2

•  Flexible: Enables other purpose-built data processing models beyond MapReduce (batch), such as interactive and streaming
•  Efficient: Double processing IN Hadoop on the same hardware while providing predictable performance & quality of service
•  Shared: Provides a stable, reliable, secure foundation and shared operational services across multiple workloads

Data Processing Engines Run Natively IN Hadoop
•  BATCH: MapReduce
•  INTERACTIVE: Tez
•  ONLINE: HBase, Accumulo
•  STREAMING: Storm
•  IN-MEMORY: Spark
•  OTHER: Open Source / Commercial

YARN: Cluster Resource Management
HDFS: Redundant, Reliable Storage

Page 25: Enterprise Apache Hadoop: State of the Union

Apache Tez: Modern Execution Engine

Apache Tez is a modern & more efficient alternative to MapReduce built on YARN

Supports BOTH Batch & Interactive workloads
–  Used for Stinger initiative to enable interactive SQL for Apache Hive
–  Hive and Pig will work on Tez
–  Other solutions are considering Tez

Stack (top to bottom):
•  Hive (SQL), Pig (data flow), OTHER (Open Source / Commercial)
•  Tez (execution engine), MR (batch)
•  YARN (cluster resource management)
•  HDFS (redundant, reliable storage)

Page 26: Enterprise Apache Hadoop: State of the Union

Batch AND Interactive SQL-IN-Hadoop

Apache Hive
•  The de facto standard for Hadoop SQL access
•  Used by your current data center partners
•  Built for batch AND interactive query

Stinger Initiative: Broad, community-based effort to deliver the next generation of Apache Hive
•  Speed: Improve Hive query performance by 100X to allow for interactive query times (seconds)
•  Scale: The only SQL interface to Hadoop designed for queries that scale from TB to PB
•  SQL: Support broadest range of SQL semantics for analytic applications against Hadoop

Value Delivered
•  Enables rapid insight over big data
•  Single engine for batch & interactive
•  Preserves and transparently enhances existing investments in use of Hive
–  Ex. Hive-based solutions get 100x faster
•  SQL compliance improves integration with other data systems & tools
•  New ORCFile reduces storage up to 70% while improving resource use, scale, and throughput

Page 27: Enterprise Apache Hadoop: State of the Union

Speed: Delivering Interactive Query

Query | Hive 10 | Hive 0.11 (Phase 1) | Trunk (Phase 3) | Improvement
TPC-DS Query 27 | 1400s | 39s | 7.2s | 190x
TPC-DS Query 82 | 3200s | 65s | 14.9s | 200x

Query 27: Pricing Analytics using Star Schema Join
Query 82: Inventory Analytics Joining 2 Large Fact Tables

All Results at Scale Factor 200 (Approximately 200GB Data)
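The improvement factors quoted on this slide can be checked from the slowest and fastest times reported for each query. The sketch below does only that division; the dictionary layout and the `speedup` helper are illustrative names, not part of any benchmark tooling.

```python
# Slowest and fastest query times (seconds) reported on the slide.
QUERY_TIMES_S = {
    "TPC-DS Query 27": (1400.0, 7.2),
    "TPC-DS Query 82": (3200.0, 14.9),
}


def speedup(baseline_s: float, improved_s: float) -> float:
    """Return how many times faster the improved run is than the baseline."""
    if improved_s <= 0:
        raise ValueError("improved time must be positive")
    return baseline_s / improved_s


for query, (baseline, improved) in QUERY_TIMES_S.items():
    print(f"{query}: {speedup(baseline, improved):.0f}x faster")
```

The ratios come out at about 194x and 215x, so the slide's 190x and 200x labels are rounded down.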

Page 28: Enterprise Apache Hadoop: State of the Union

SCALE: Interactive Query at Petabyte Scale

Sustained Query Times: Apache Hive 0.12 provides sustained acceptable query times even at petabyte scale

Smaller Footprint: Better encoding with ORCFile in Apache Hive 0.12 reduces resource requirements for your cluster
•  Larger Block Sizes
•  Columnar format arranges columns adjacent within the file for compression & fast access

File Size Comparison Across Encoding Methods (Dataset: TPC-DS Scale 500 Dataset)

Encoding | File Size | Reduction
Text | 585 GB | Original Size
RCFile | 505 GB | 14% Smaller
Parquet (Impala) | 221 GB | 62% Smaller
ORCFile (Hive 12) | 131 GB | 78% Smaller
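The "% Smaller" labels in the file size comparison follow directly from the sizes quoted on this slide. The sketch below recomputes them from the 585 GB original; the helper name is illustrative, and the sizes are listed without tying them to a particular encoding.

```python
# File sizes (GB) quoted on the slide; 585 GB is the original text encoding.
ORIGINAL_GB = 585
ENCODED_SIZES_GB = [505, 221, 131]


def percent_smaller(original_gb: float, encoded_gb: float) -> float:
    """Return the size reduction relative to the original, in percent."""
    if original_gb <= 0:
        raise ValueError("original size must be positive")
    return 100.0 * (original_gb - encoded_gb) / original_gb


for size in ENCODED_SIZES_GB:
    print(f"{size} GB is {percent_smaller(ORIGINAL_GB, size):.0f}% smaller than {ORIGINAL_GB} GB")
```

Rounded to whole percentages, the results match the slide's 14%, 62%, and 78% figures.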

Page 29: Enterprise Apache Hadoop: State of the Union

SQL: Enhancing SQL Semantics

Hive SQL Datatypes
•  INT
•  TINYINT/SMALLINT/BIGINT
•  BOOLEAN
•  FLOAT
•  DOUBLE
•  STRING
•  TIMESTAMP
•  BINARY
•  DECIMAL
•  ARRAY, MAP, STRUCT, UNION
•  DATE
•  VARCHAR
•  CHAR

Hive SQL Semantics
•  SELECT, INSERT
•  GROUP BY, ORDER BY, SORT BY
•  JOIN on explicit join key
•  Inner, outer, cross and semi joins
•  Sub-queries in FROM clause
•  ROLLUP and CUBE
•  UNION
•  Windowing Functions (OVER, RANK, etc.)
•  Custom Java UDFs
•  Standard Aggregation (SUM, AVG, etc.)
•  Advanced UDFs (ngram, Xpath, URL)
•  Sub-queries for IN/NOT IN, HAVING
•  Expanded JOIN Syntax
•  INTERSECT / EXCEPT

Legend: Available, Hive 0.12 (HDP 2.0), Hive 13

SQL Compliance: Hive 12 provides a wide array of SQL datatypes and semantics so your existing tools integrate more seamlessly with Hadoop

Page 30: Enterprise Apache Hadoop: State of the Union

Real-Time Streaming-IN-Hadoop

Apache Storm: A community-based effort to bring real-time processing to Hadoop

Goals:
•  HADOOP INTEGRATION: Making streaming a first-class component of a modern data architecture
•  ENTERPRISE CONNECTIVITY: Connecting Storm to the important streaming sources within the enterprise
•  IMPROVED MULTI-TENANCY: Increasing operations usability and enabling simple programming of new flows

Project Phases

Storm: Streaming in Hadoop
•  Storm-on-YARN
•  Installation with Ambari
•  Ganglia & Nagios based monitoring
•  Kafka, HBase, HDFS & Cassandra connectors

Storm: Enterprise Connectivity
•  Notification and data persistence bolts: EDWs, RDBMS, JMS etc.
•  Data Ingest Spouts
•  AD/LDAP plugin for authentication
•  High Availability management w/Ambari

Storm: Improved Multi-Tenancy
•  Declarative “wiring”
•  Hive update support
•  Advanced scheduler

Coming Soon

Page 31: Enterprise Apache Hadoop: State of the Union

Simplified Data Processing for Hadoop

Apache Falcon: Create and implement reusable workflows for datasets to orchestrate movement and track lineage

Goals:
•  Acquisition & Processing Data
–  Direct data to processing engines or formats
–  Obfuscate or transform data
•  Replication & Retention Policy
–  Replicate datasets
–  Establish retention policies for datasets
•  Redirection & Extensions of Hadoop
–  Redirect data to encrypt or decrypt
–  Extract segments of data and redirect to other tools

Hortonworks Investment in Apache Falcon

Phase 1 (Q4 2013):
•  Incubate Apache Falcon
•  Dataset Replication
•  Dataset Retention
•  Falcon Tech Preview

Phase 2 (Coming Soon):
•  Hive / HCatalog integration
•  Basic Dashboard for Entity Viewing
•  Kerberos security support
•  Ambari integration for management

Phase 3 (Coming Soon):
•  Advanced Dashboard for pipeline building
•  Dataset lineage

Page 32: Enterprise Apache Hadoop: State of the Union

Enterprise Hadoop Security Today

Authentication: Who am I/prove it? Control access to cluster.
•  Kerberos in native Apache Hadoop
•  Perimeter Security with Apache Knox Gateway

Authorization: Restrict access to explicit data
Audit: Understand who did what
•  Native in Apache Hadoop
–  MapReduce Access Control Lists
–  HDFS Permissions
–  Process Execution audit trail
•  Cell level access control in Apache Accumulo

Data Protection: Encrypt data at rest & motion
•  Wire encryption in native Apache Hadoop
•  Orchestrated encryption with 3rd party tools

Page 33: Enterprise Apache Hadoop: State of the Union

Hadoop Security – What’s Next?

Security in Enterprise Hadoop: Driving the next generation of Hadoop security

Goals:
•  Flexible Authentication & Authorization: Improve authentication choices and provide more granular access controls for the Hadoop platform, services and data.
•  Improve Data Protection: Enhance Hadoop’s audit and data protection capabilities to support broader enterprise governance and compliance needs.
•  Work with Existing Systems: Integrate with existing enterprise security and identity management systems in a consistent way.

Security Investments

Security Phase 1:
•  Strong AuthN with Kerberos
•  HBase, Hive, HDFS basic AuthZ
•  Encryption with SSL for NN, JT, etc.
•  Wire encryption with Shuffle, HDFS, JDBC

Security Phase 2:
•  Knox: Hadoop Perimeter Security
•  SQL-style Hive AuthZ (GRANT, REVOKE)
•  ACLs for HDFS
•  SSL support for Hive Server 2
•  PAM support for Hive

Security Phase 3:
•  Audit event correlation and Audit viewer
•  NotOnlyKerberos – Support other Token-Based Authentication
•  Data Encryption in HDFS, Hive & HBase

Delivered in HDP 2.0

Coming Soon

Page 34: Enterprise Apache Hadoop: State of the Union

Operating Enterprise Hadoop at Scale

Apache Ambari is the only 100% open source framework for provisioning, managing and monitoring Apache Hadoop clusters

AMBARI WEB
AMBARI SERVER: PROVISION | MANAGE | MONITOR (clusters of compute & storage nodes)
REST APIs: Integration With Existing Operations Tools (Viewpoint, Others)

COMING SOON!
•  Ambari Stacks: AMBARI-2714
•  Ambari Views: AMBARI-4234
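As a rough illustration of how an existing operations tool might talk to Ambari over REST, the sketch below prepares (but does not send) a request that lists clusters. It assumes the Ambari server exposes its API under `/api/v1/clusters` on port 8080 with HTTP basic authentication; the host name, credentials, and function name are hypothetical placeholders, not values from the slides.

```python
import base64
import urllib.request


def build_cluster_list_request(
    host: str, user: str, password: str, port: int = 8080
) -> urllib.request.Request:
    """Prepare a GET request for the cluster list of an Ambari server.

    The URL path and default port are assumptions; adjust them to the
    Ambari deployment actually in use.
    """
    url = f"http://{host}:{port}/api/v1/clusters"
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(url, method="GET")
    request.add_header("Authorization", f"Basic {token}")
    return request


# Hypothetical host and credentials, for illustration only.
request = build_cluster_list_request("ambari.example.com", "admin", "admin")
print(request.get_method(), request.full_url)
```

Passing the prepared request to `urllib.request.urlopen` would perform the call against a reachable server; the sketch stops short of that so it runs without network access.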

Page 35: Enterprise Apache Hadoop: State of the Union

Recap

•  Hadoop's role is becoming clear
•  Major vendors have recognized Hadoop’s role and are actively integrating it into their solutions
•  Adoption path is consistent: from apps to lake
•  Open source innovation continues unabated
–  YARN opens up the platform, and as adoption deepens, the community of committers is working to mature it even further

Page 36: Enterprise Apache Hadoop: State of the Union

Try Hadoop Today… Get Involved

Download the Hortonworks Sandbox
•  Learn Hadoop
•  Build Your Analytic App
•  Try Hadoop 2

San Jose, CA, June 3 - 5, 2014: CALL FOR PAPERS OPEN
Amsterdam, April 2 - 3, 2014: REGISTER NOW