Hadoop Present - Open Enterprise Hadoop
- 1. Hortonworks Inc. 2011-2015. All Rights Reserved. Hadoop Present:
Open Enterprise Hadoop. Yifeng Jiang, Solutions Engineer,
Hortonworks, Inc. July 26, 2015
- 2. Yifeng Jiang: Solutions Engineer @ Hortonworks Japan; HBase book
author; Twitter: @uprush
- 3. Agenda
  - Hadoop Core Updates
  - Data Access in Hadoop
  - Hadoop Security
  - Hadoop Management
- 4. Hadoop Present Enterprise Ready Hadoop
- 5. Hadoop. Charts: Number of Issues Resolved; Number of Lines of
Code Increased. Source: http://ajisakaa.blogspot.jp
- 6. Open Leadership Code Contributed in 2014 by Organization
Hortonworks
- 7. 2011/6: Yahoo! Hadoop, 24. 2014/12: 600, Hadoop.
  Apache Project   Committers   PMC Members
  Hadoop           27           21
  Pig              5            5
  Hive             18           6
  Tez              16           15
  HBase            6            4
  Phoenix          4            4
  Accumulo         2            2
  Storm            3            2
  Slider           11           11
  Falcon           5            3
  Flume            1            1
  Sqoop            1            1
  Ambari           36           28
  Oozie            3            2
  Zookeeper        2            1
  Knox             13           3
  Ranger           11           n/a
  TOTAL            164          109
- 8. Hortonworks Data Platform 2.2 Stack
- 9. Hadoop Core HDFS + YARN: Data Operating System
- 10. HDFS Scalable & Efficient Data Lake Storage
- 11. HDFS: more Efficient Data Lake Storage
  - HDFS NFS Gateway: mount an HDFS path
  - Erasure Coding (under development): reduce storage cost from 3x
    to 1.4x
  - Tiered Storage: a DataNode becomes a collection of tiered
    storages (DISK, SSD, RAM, ARCHIVAL)
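The "3x to 1.4x" figure can be checked with a little arithmetic. The sketch below is only an illustration: the slide does not name the coding scheme, so the Reed-Solomon layout with 10 data units and 4 parity units is an assumption, chosen because it yields exactly 1.4x. The function names are hypothetical.

```python
from fractions import Fraction


def replication_overhead(replicas: int) -> Fraction:
    """Raw bytes stored per logical byte under n-way replication."""
    if replicas < 1:
        raise ValueError("replicas must be >= 1")
    return Fraction(replicas)


def erasure_coding_overhead(data_units: int, parity_units: int) -> Fraction:
    """Raw bytes stored per logical byte when every group of
    `data_units` data blocks is protected by `parity_units` parity blocks."""
    if data_units < 1 or parity_units < 0:
        raise ValueError("need data_units >= 1 and parity_units >= 0")
    return Fraction(data_units + parity_units, data_units)


# HDFS default: 3 replicas of every block.
three_way = replication_overhead(3)

# ASSUMPTION: a 10 data + 4 parity layout; the slide only states "1.4x".
rs_10_4 = erasure_coding_overhead(10, 4)

# Fraction of raw capacity saved by moving from replication to this layout.
savings = 1 - rs_10_4 / three_way

print(f"replication: {float(three_way):.1f}x, erasure coding: {float(rs_10_4):.1f}x")
print(f"raw capacity saved: {float(savings):.1%}")
```

Under these assumptions the same logical data needs a little under half the raw capacity that 3-way replication requires.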
- 12. Storage Growth Challenges
  - Storage needs of some clusters grow very fast: high volumes of
    data, more users, and new use cases for Hadoop
  - The only way to grow storage is to add more nodes
  Chart: cluster storage and compute capacity vs. storage utilization
  and compute utilization
- 13. Archival Storage Scenario: Data Usage
  - Hot: less than 7 days old, with very high usage
  - Warm: less than 1 month old, used ~20 times per month
  - Cold: less than 3 months old, used 5 times per month
  - Frozen: 3 months to 7 years old, used approximately 2 times per
    year
  Chart (eBay): Temperature of Data; x-axis: time (data age); y-axis:
  frequency of data usage (per month); regions: hot, warm, cold data
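The age bands in this scenario can be written as a small classifier. This is a minimal sketch, not part of the original deck: the function name is hypothetical, one month is assumed to be 30 days and three months 90 days, and anything older than the cold band is treated as frozen without enforcing the 7-year upper bound.

```python
def data_temperature(age_days: float) -> str:
    """Map the age of a dataset (in days) to the temperature bands
    listed on the slide. Month lengths are ASSUMED to be 30 days."""
    if age_days < 0:
        raise ValueError("age_days must be non-negative")
    if age_days < 7:
        return "HOT"      # less than 7 days, very high usage
    if age_days < 30:
        return "WARM"     # less than 1 month, ~20 uses per month
    if age_days < 90:
        return "COLD"     # less than 3 months, ~5 uses per month
    return "FROZEN"       # 3 months and older, ~2 uses per year


if __name__ == "__main__":
    for age in (1, 10, 45, 400):
        print(age, data_temperature(age))
```

A scheduled job could use a mapping like this to decide which datasets are candidates for a colder storage policy; the actual policy assignment is done with the HDFS tools shown elsewhere in the deck.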
- 14. Archival Storage for Cost Efficiency: scale storage
independently from compute
  - Archival storage tier: deploy storage-dense hardware nodes
  - Utilize storage policies for datasets: Hot, Warm, Cold
  - Achieve ~4x lower price point per GB
  Chart: cluster storage capacity and utilization vs. cluster compute
  capacity and utilization
- 15. HDFS Storage Architecture - Before
- 16. HDFS Storage Architecture - Now
- 17. Storage Policy: SSD & Hot (HDP cluster with SSD nodes, i2.8x,
and DISK nodes, d2.8x)
  - DataSet A (e.g., HBase), policy SSD: all replicas on SSD
  - DataSet B (others), policy Hot: all replicas on DISK
- 18. Storage Policy: Ambari HDFS Configuration Groups
  Use Ambari configuration groups (I2, D2) to set the DataNode
  property dfs.datanode.data.dir per group:
  - I2 group: [SSD]/hadoop/hdfs/data1,[SSD]/hadoop/hdfs/data2,
  - D2 group: [DISK]/hadoop/hdfs/data1,[DISK]/hadoop/hdfs/data2,
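To make the tagged directory syntax concrete, here is a minimal sketch that splits a dfs.datanode.data.dir value into (storage type, path) pairs. The helper name is hypothetical and this is not how the DataNode itself parses the property; it only mirrors the format shown on the slide. Untagged entries are mapped to DISK, which is the documented HDFS default.

```python
import re
from typing import List, Tuple

# Matches an entry such as "[SSD]/hadoop/hdfs/data1".
_TAGGED_DIR = re.compile(r"^\[(?P<type>[A-Za-z_]+)\](?P<path>.+)$")


def parse_data_dirs(value: str) -> List[Tuple[str, str]]:
    """Split a comma-separated dfs.datanode.data.dir value into
    (storage_type, path) pairs. Entries without a [TYPE] tag are
    treated as DISK."""
    entries = []
    for raw in value.split(","):
        item = raw.strip()
        if not item:
            continue  # tolerate trailing commas, as on the slide
        match = _TAGGED_DIR.match(item)
        if match:
            entries.append((match.group("type").upper(), match.group("path")))
        else:
            entries.append(("DISK", item))
    return entries


if __name__ == "__main__":
    i2_group = "[SSD]/hadoop/hdfs/data1,[SSD]/hadoop/hdfs/data2,"
    d2_group = "[DISK]/hadoop/hdfs/data1,[DISK]/hadoop/hdfs/data2,"
    print(parse_data_dirs(i2_group))
    print(parse_data_dirs(d2_group))
```

A check like this can be handy before pushing a configuration group, for example to confirm that every directory in the I2 group really carries the SSD tag.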
- 19. Storage Policy
    $ hdfs dfs -mkdir /hbase
    $ hdfs dfsadmin -setStoragePolicy /hbase ALL_SSD
    Set storage policy ALL_SSD on /hbase
    $ hdfs dfsadmin -getStoragePolicy /ssd
    The storage policy of /ssd:
    BlockStoragePolicy{ALL_SSD:12, storageTypes=[SSD],
    creationFallbacks=[DISK], replicationFallbacks=[DISK]}
  HBase data on SSD (i2) nodes: /hbase uses ALL_SSD
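The policy printed above lists a preferred storage type plus fallbacks. The following sketch shows one way to read those fields: each replica gets the preferred type, and a creation fallback is used when the preferred type is unavailable. It is a simplified model under stated assumptions, not the NameNode's actual placement code; the class and function names are hypothetical, and replication fallbacks are left out.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple


@dataclass(frozen=True)
class StoragePolicy:
    name: str
    storage_types: Tuple[str, ...]       # preferred type per replica
    creation_fallbacks: Tuple[str, ...]  # used when a preferred type is unavailable


# Mirrors the fields printed by -getStoragePolicy on the slide.
ALL_SSD = StoragePolicy("ALL_SSD", ("SSD",), ("DISK",))


def choose_storage_types(
    policy: StoragePolicy,
    replication: int,
    unavailable: FrozenSet[str] = frozenset(),
) -> List[str]:
    """Pick a storage type for each replica of a new block.

    ASSUMPTION: when the policy lists fewer types than replicas, the
    last listed type is reused for the remaining replicas.
    """
    chosen = []
    for i in range(replication):
        wanted = policy.storage_types[min(i, len(policy.storage_types) - 1)]
        if wanted in unavailable:
            # First fallback that is still available, or None if there is none.
            wanted = next(
                (t for t in policy.creation_fallbacks if t not in unavailable),
                None,
            )
        if wanted is not None:
            chosen.append(wanted)
    return chosen


if __name__ == "__main__":
    print(choose_storage_types(ALL_SSD, 3))
    print(choose_storage_types(ALL_SSD, 3, frozenset({"SSD"})))
```

In this model a cluster that runs out of SSD capacity would still accept writes under /hbase, with new replicas landing on DISK.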
- 20. HDFS: Next Step
  - Erasure Code GA (HDFS-7285)
  - Ozone: an object store in HDFS (HDFS-7240)
- 21. YARN Extends Hadoop into Data OS
- 22. Recap: What's YARN? Cluster Resource Management
  - Resource sharing: Capacity Scheduler; fair sharing with pluggable
    queue policies (new)
  - Isolation: memory, CPU, node labels (new)
  - Workload types: batch, interactive, in-memory
- 23. Exclusive Node Labels enable Isolated Partitions (HDP 2.2)
  - Configure partitions: nodes labeled S run the Storm app (S)
  - Exclusive labels enforce isolation: another app (B) does not run
    on the S-labeled nodes
- 24. Non-Exclusive Node Labels (YARN-3214, HDP 2.3)
  - Configure non-exclusive labels: nodes labeled S run the Spark app
    (S)
  - Another app (B) is scheduled on the S-labeled nodes if there is
    free capacity
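The difference between exclusive and non-exclusive labels can be expressed as a small placement rule. The sketch below is a simplified model of the behavior described on these two slides, not the Capacity Scheduler's implementation: it ignores queue capacities and queue-to-label access, and all names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Node:
    name: str
    label: Optional[str] = None     # None means the default (unlabeled) partition
    exclusive: bool = True          # only meaningful when label is set
    has_free_capacity: bool = True


def can_schedule(app_label: Optional[str], node: Node) -> bool:
    """Return True if an app requesting `app_label` may get a container
    on `node` under the simplified rules below."""
    if not node.has_free_capacity:
        return False
    if app_label == node.label:
        # Same partition: a labeled app on its own nodes, or an
        # unlabeled app on unlabeled nodes.
        return True
    if app_label is None and node.label is not None:
        # Unlabeled app on a labeled node: allowed only when the
        # label is non-exclusive and capacity is idle.
        return not node.exclusive
    # A labeled app never runs outside its own partition.
    return False


if __name__ == "__main__":
    storm_node = Node("node1", label="storm", exclusive=True)
    spark_node = Node("node5", label="spark", exclusive=False)
    print(can_schedule(None, storm_node))   # batch app on an exclusive Storm node
    print(can_schedule(None, spark_node))   # batch app on a non-exclusive Spark node
```

The trade-off is the usual one between isolation and utilization: exclusive labels keep the partition reserved even when it is idle, while non-exclusive labels let other work borrow idle capacity.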
- 25. Working with Labels
  - Ambari YARN Guided Configuration: enable node labels
  - YARN CLI: create and assign labels
  - ResourceManager UI: view node labels in the cluster
  - Capacity Scheduler View: define workload management policy with
    labels
    $ yarn rmadmin -addToClusterNodeLabels "spark(exclusive=false)"
    $ yarn cluster --list-node-labels
    $ yarn rmadmin -replaceLabelsOnNode "node5=spark"
- 26. YARN: Next Step: Disk & network isolation
  - Isolation only: enforce equal sharing of disk and network I/O
    across containers running on a node
  - Currently in technical preview in HDP 2.3
  - Disk resource: local disk IOPS, not HDFS reads/writes
  - Network resource: outbound bandwidth only (mbits/sec)
  - YARN-2619, YARN-2140
- 27. Data Access Innovation SQL, Spark, Stream Processing, Search
- 28. Hive: Enterprise SQL at Hadoop Scale
  - Native transactions. Delivered: Insert, Update, Delete
  - Performance, 100x faster: ORC File, Hive on Tez, Cost Based
    Optimizer, Vectorized SQL engine
- 29. Hive: Next Step
  - SQL enhancement: transactions (BEGIN, COMMIT, ROLLBACK), SQL 2011
    Analytics
  - Performance: sub-second response (LLAP, HBase as metastore, etc.)
- 30. Spark Features: HDP 2.3.x & Spark 1.3.1
  - Supported: Spark Core, MLlib, Spark on YARN, Kerberos, Ambari
    support
  - Tech Preview: SparkSQL*, Spark Streaming, DataFrame, Spark ML
    Pipeline API
  - Unsupported: GraphX, BlinkDB, Spark Standalone/Mesos
- 31. Spark and Hadoop: How Can We Do Better?
  - Resource Management: YARN for multi-tenant, diverse workloads
    with predictable SLAs
  - Tiered Memory Storage: HDFS in-memory tier; External BlockStore
    for RDD Cache
  - SparkSQL & Hive for SQL: interop with modern Metastore/HS2,
    optimized ORC support, advanced analytics, e.g. Geospatial
  - Spark & NoSQL: deep integration with HBase via
    DataSources/Catalyst for Predicate/Aggregate Pushdown
  - Connect The Dots, Algorithms to Use-Cases: higher-level ML
    abstractions, e.g. OneVsRest; validation, tuning, pipeline
    assembly; e.g. GeoSpatial
  Diagram: Storage; YARN: Data Operating System; Governance;
  Security; Operations; Resource Management
- 32. Spark and Hadoop: How Can We Do Better?
  - Ease of Use: Apache Zeppelin for interactive notebooks
  - Metadata & Governance: Apache Atlas for metadata & Apache Falcon
    support for Spark pipelines
  - Security & Operations: Apache Ranger managed authorization and
    deployment/management via Apache Ambari
  - Deployable Anywhere: Linux, Windows, on-premises or cloud
  - Self-Service Spark in the Cloud: easy launch of Data Science
    clusters via Cloudbreak and Ambari for Azure, AWS, GCP,
    OpenStack, Docker
  Diagram: Storage; YARN: Data Operating System; Governance;
  Security; Operations; Resource Management
- 33. Platform Innovation for Data Access: an integrated, scalable
platform for data access powered by HDP (limitless storage, deep
analytics, real-time access)
- 34. Security End to End Security in Hadoop
- 35. Hadoop Security Overview: Five Security Requirements
  - Authentication: Kerberos
  - Authorization
  - Audit
  - Encryption
  HDP 2.3 security support: Ranger, HDFS
- 36. End to End Security: HDFS Fully Secure Flow
  Components: ODBC/JDBC client, Apache Knox, LDAP, KDC, HiveServer 2,
  Ranger, HDFS. Links are secured with SSL or SASL.
  1. Original request w/ user id/password
  2. Knox authenticates user/pass
  3. Knox gets service ticket for Hive
  4. Knox calls as proxy user
  5. Ranger AuthZ
  6. Hive creates map reduce using NN ST
  Additional labels (A, B, C): Use Hive ST, submit query; Hive gets
  NameNode (NN) service ticket; Client gets query result.
  Ranger syncs users/groups from LDAP.
- 37. Ranger: Central Security Administration
  - Table/column access control
  - Audit logging
  - Flexible definition
  - Control group/user permissions
- 38. Hadoop Management Ambari: Hadoop for Everyone, 100% Open Source
- 39. What's Apache Ambari? A 100% open source operational platform
to provision, manage and monitor Hadoop clusters
- 40. Apache Ambari Mission
  - Easy operation at scale: large-scale cluster install, manage and
    monitor; efficient and scale at scale
  - Easy to extend with community: innovate with community;
    integrate with enterprise software; accelerate new feature and
    adoption
  - Centralized management for the whole Hadoop stack: access point
    for all Hadoop users, not just cluster management; ease of use
- 41. Ambari 2.1: HDP Stack High Availability (Ambari 2.0 vs.
Ambari 2.1)
  Component                 HDP Stack   Mode
  HDFS: NameNode            HDP 2.0+    Active/Standby
  YARN: ResourceManager     HDP 2.1+    Active/Standby
  HBase: HBaseMaster        HDP 2.1+    Multi-master
  Hive: HiveServer2         HDP 2.1+    Multi-instance
  Hive: Hive Metastore      HDP 2.1+    Multi-instance
  Hive: WebHCat Server      HDP 2.1+    Multi-instance
  Oozie: Oozie Server       HDP 2.1+    Multi-instance
  Storm: Nimbus Server      HDP 2.3     Multi-instance
  Ranger: AdminServer       HDP 2.3     Multi-instance
- 42. Install Wizard
- 43. Guided Configs for HDFS
- 44. Guided Configs for YARN & MapReduce
- 45. Enable Features in YARN
- 46. Cluster Dashboard
- 47. Service Dashboard
- 48. Service Manage - HDFS
- 49. Host Manage
- 50. Monitor & Alert. Notifications: Email, SNMP, Script (new)
- 51. User Views: HDFS Files View
  - Files View: browse the HDFS file system
- 52. User Views: YARN Capacity Scheduler, Tez
  - Capacity Scheduler View: browse and manage YARN queues
  - Tez View: view information related to Tez jobs that are
    executing on the cluster
- 53. User Views: Pig, Hive
  - Pig View: author and execute Pig scripts
  - Hive View: author, execute and debug Hive queries
- 54. Summary
- 55. Open Enterprise Hadoop: Hadoop/YARN-powered data operating
system. 100% open source, multi-tenant data platform for any
application, any data set, anywhere. Built on a centralized
architecture of shared enterprise services:
  - Scalable tiered storage
  - Resource and workload management
  - Trusted data governance & metadata management
  - Consistent operations
  - Comprehensive security
  - Developer APIs and tools
  Diagram: YARN: data operating system (governance, security,
  operations, resource management); data access: batch, interactive,
  real-time; storage; deployment: commodity, appliance, cloud
- 56. Thank you Yifeng Jiang, Solutions Engineer, Hortonworks
@uprush