Empower Data-Driven Organizations with HPE and Hadoop
Gilles Noisette – HPE EMEA Big Data CoE
04/13/2016
Agenda
• A data-driven world
• HPE contribution to Spark
• HPE innovations for Hadoop
• Enterprise-grade SQL analytics for Hadoop
• Data-centric security for Hadoop
• HPE Data Discovery service to help you pull together these innovations
Transform to a hybrid infrastructure
Enable workplace productivity
Protect your digital enterprise
Empower the data-driven organization
Empower the data-driven organization
Harness 100% of your relevant data to empower people with actionable insights that drive superior business outcomes.
Enterprise Spark at scale
HP Labs is helping make Apache Spark better
HPE and Hortonworks joint announcement
Hortonworks announcement event on March 1st
HPE CTO Martin Fink on stage
HPE Contribution to Apache Spark
Martin Fink announcement
Hortonworks and HP Labs join forces to boost Spark
Hewlett Packard Labs is working with Hortonworks to enhance the efficiency and scale of memory for the enterprise and to dramatically improve memory utilization:
– Enhanced shuffle engine technologies: faster sorting and in-memory computations, which have the potential to dramatically improve Spark performance
– Better memory utilization: improved performance and usage for broader scalability, which will help enable new large-scale use cases
“We're hoping to enable the Spark community to derive insight more rapidly, from much larger data sets, without having to change a single line of code”
Martin Fink, CTO & Director, HP Labs
Tested with customers from the financial services industry: 3x to 15x performance increases
HPE Innovations for Hadoop
Optimized Infrastructure and Architecture
HPE Servers and Architectures for Hadoop
Traditional
• Tried-and-true platform
• Corp standard: “I buy DL380s”
• Small to large deployments (very often ~20 nodes)
• Linear growth of balanced workloads
Optimized
• Purpose-built for Big Data
• Mid-size to large deployments
• Single, resource-intensive workload
• Workload optimized
• Multi-temperate storage
• “Optimized traditional”
• Higher density, lower TCO
Converged
• MPP DBMS approach + open source
• Mid-size to large deployments
• Non-linear storage and compute/memory growth
• Multiple workloads, latency demands
• Isolate workload hot spots
• Scale compute and storage separately, elastically
• Innovative, TCO-driven approach
[Figure: DL380 Gen9 and Apollo 4xxx servers (symmetric architectures, conventional wisdom) next to Moonshot 1500 and Apollo servers (asymmetric architecture, forward-thinking)]
HPE Reference Architecture(s) for Hadoop
• Scaling from 4 to thousands of HPE servers
• Sized to the customer’s workload and storage needs
• Impressive processor and storage density
A set of pre-tested hardware components
• Processor, drives, network, 1 TB/8 TB disk sizes, etc.
Breakthrough economics, density, simplicity
Flexible, pre-approved & optimized configurations
HPE Apollo 4000 example:
– 24 x HPE ProLiant Apollo 4530 worker nodes
– 3 x DL360 Gen9 head nodes
– Network switches: HPE 5900 10GbE and 2 x HPE 5930 10GbE
Full-rack figures:
– Apollo 4510: 3.5 PB raw storage, 900 TB Hadoop usable, 960 Xeon E5 cores
– DL380: 2.46 PB raw storage, 630 TB Hadoop usable, 756 Xeon E5 cores
– Apollo 4200: 4.6 PB raw storage, 1 PB Hadoop usable, 756 Xeon E5 cores
– ProLiant SL4540 Gen8: 5.3 PB raw storage, 1.3 PB Hadoop usable, 320 Xeon E3 cores
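The full-rack figures above quote raw and Hadoop-usable capacity side by side. The Python sketch below computes the raw-to-usable ratio each rack implies. It assumes the HDFS default replication factor of 3 (`dfs.replication`) and 1 PB = 1,000 TB; attributing the remainder to intermediate data, file-system overhead and headroom is an assumption, since the slide does not say how usable capacity was derived.

```python
# Full-rack figures as quoted on the slide: (raw TB, Hadoop-usable TB).
# 1 PB is taken as 1000 TB here (an assumption; the slide does not say).
RACK_FIGURES = [
    (3500, 900),
    (2460, 630),
    (4600, 1000),
    (5300, 1300),
]

HDFS_REPLICATION = 3  # HDFS default value of dfs.replication


def raw_to_usable_ratio(raw_tb: float, usable_tb: float) -> float:
    """Raw TB provisioned per usable TB."""
    return raw_tb / usable_tb


def share_left_after_replication(raw_tb: float, usable_tb: float,
                                 replication: int = HDFS_REPLICATION) -> float:
    """Fraction of post-replication capacity that is counted as usable.

    The rest is assumed (not stated on the slide) to be reserved for
    intermediate data, file-system overhead and headroom.
    """
    return usable_tb * replication / raw_tb


if __name__ == "__main__":
    for raw, usable in RACK_FIGURES:
        print(f"raw {raw:>5} TB / usable {usable:>5} TB: "
              f"ratio {raw_to_usable_ratio(raw, usable):.2f}, "
              f"{share_left_after_replication(raw, usable):.0%} "
              f"of capacity after {HDFS_REPLICATION}x replication")
```

With the slide's figures the ratios fall between about 3.9 and 4.6 raw TB per usable TB, so roughly 65% to 77% of the post-replication capacity is counted as usable.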
HPE Apollo 4200: bringing big data storage server density to the enterprise
Used as a standard Hadoop worker node and as the BDRA asymmetric storage node
Storage density
28 LFF data drives
Data center plug and play
Performance and efficiency
• Halve the number of servers
• Halve the number of network ports
• Halve the floor space needed
• Lower the number of licenses/subscriptions needed
Highest storage density in a traditional 2U rack server: 224 TB per server, up to 4.6 PB per rack
Core/spindle ratio of 1: 28 cores (2 x 14) for 28 drive spindles
Enterprise bridge
Fits traditional enterprise/SME rack server data centers
Lower electric power needs
Configuration flexibility
Balanced capacity, performance and throughput with flexible options: disks, CPUs, I/O and interconnects
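The density and core/spindle claims for the Apollo 4200 are simple products and ratios of the node's parts. This Python sketch models a worker node with hypothetical field names; the 8 TB drive size is inferred from 224 TB divided by 28 drives and matches the 8 TB disk option listed earlier.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkerNode:
    """Illustrative model of a Hadoop worker node; field names are hypothetical."""
    data_drives: int        # LFF data drive spindles
    drive_tb: float         # raw capacity per drive, TB
    sockets: int            # CPU sockets
    cores_per_socket: int

    @property
    def raw_tb(self) -> float:
        return self.data_drives * self.drive_tb

    @property
    def cores(self) -> int:
        return self.sockets * self.cores_per_socket

    @property
    def cores_per_spindle(self) -> float:
        return self.cores / self.data_drives


# Apollo 4200 as described on the slide: 28 drives, 2 x 14 cores, 224 TB raw.
# 8 TB per drive is inferred from 224 TB / 28 drives.
APOLLO_4200 = WorkerNode(data_drives=28, drive_tb=8.0, sockets=2, cores_per_socket=14)

if __name__ == "__main__":
    print(f"raw capacity: {APOLLO_4200.raw_tb:.0f} TB")
    print(f"cores: {APOLLO_4200.cores}")
    print(f"cores per spindle: {APOLLO_4200.cores_per_spindle:.2f}")
```

One common reading of a ratio near 1 is that each spindle has roughly one core available to process the data it streams, which is the balance the slide highlights.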
Hadoop on HPE Moonshot
What would be a good server cartridge for Hadoop?
Processing: 8 Xeon cores, very efficient I/O
Memory: 128 GB
Storage: 2 TB M.2 SSD for data
Network: fast network (2 x 10GbE), low-latency chassis interconnect
Impala: SQL on Hadoop
45 x 128 GB = 5,760 GB (about 5.6 TiB) of RAM and 45 x 2 TB = 90 TB of fast data storage in 4U
45 servers per enclosure
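The per-enclosure totals are straightforward multiplications. This Python sketch reproduces them from the cartridge figures above; the core total is derived here rather than quoted on the slide. It also shows that 45 x 128 GB is 5,760 GB, which reads as 5.76 TB in decimal units or about 5.6 TiB in binary units, so the slide's "5.6TB" appears to use binary units.

```python
# Per-cartridge figures from the slide; the enclosure holds 45 cartridges.
CARTRIDGES_PER_ENCLOSURE = 45
CORES_PER_CARTRIDGE = 8
RAM_GB_PER_CARTRIDGE = 128
SSD_TB_PER_CARTRIDGE = 2


def enclosure_totals(cartridges: int = CARTRIDGES_PER_ENCLOSURE) -> dict[str, float]:
    """Aggregate cores, memory and SSD capacity for one 4U enclosure."""
    ram_gb = cartridges * RAM_GB_PER_CARTRIDGE
    return {
        "cores": cartridges * CORES_PER_CARTRIDGE,
        "ram_gb": ram_gb,
        "ram_tb_decimal": ram_gb / 1000,   # 1 TB = 1000 GB
        "ram_tib_binary": ram_gb / 1024,   # treating GB as GiB: 1 TiB = 1024 GiB
        "ssd_tb": cartridges * SSD_TB_PER_CARTRIDGE,
    }


if __name__ == "__main__":
    for name, value in enclosure_totals().items():
        print(f"{name}: {value}")
```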
HPE Asymmetric Architecture for Hadoop
HPE Vertica SQL on Hadoop
Enterprise-Grade Hadoop
HPE Big Data Reference Architecture
HPE Brings Enterprise Data Center Architecture to Hadoop
Traditional Hadoop Cluster Architecture
– Compute and storage are always co-located
– All servers are identical
– Data is partitioned across servers on direct attached storage
HPE Big Data Reference Architecture
– Separate, optimized compute and storage tiers connected by high-speed networking
– Standard Hadoop installed with storage components on the storage servers and applications on the compute servers
– Enabled and optimized by purpose-selected HPE Moonshot and Apollo servers and HPE/Hortonworks workload management software (contributed to the community)
[Figure: Symmetric architecture, where each server holds applications and data files, vs. asymmetric architecture, where compute servers hold applications and intermediate data and storage servers hold data files]
Benefits of HPE Big Data Reference Architecture for Hadoop
Delivering value to the business
• High-speed network
• Data consolidation: hosting multiple workloads
• Maximum elasticity and workload isolation
• Balance and scale compute and storage independently
• Breakthrough density and TCO
Compute tier: HPE Moonshot or HPE Apollo; storage tier: HPE Apollo 4xx0
Advantages* of HPE Big Data Reference Architecture
Room to grow: the same performance in half the space
HPE Big Data Reference Architecture compared with a traditional Big Data architecture:
– Hadoop performance: equivalent
– Density: more than 2x denser
– Network bandwidth: 40 Gbit versus 10 Gbit
– HDFS storage performance: 2x greater
– Power (watts): half the power
* Normalized on performance, based on TeraSort testing
Independent scaling of compute and storage
Grow to match your workload and data sources
HPE Big Data Reference Architecture configurations compared with a traditional architecture, from hot (compute) to cold (storage):
– Hot (compute) configuration: 2.8x the compute, 97% of the storage capacity, 4x the memory
– Intermediate configuration: 1.6x the compute, 1.5x the storage capacity, 2.5x the memory
– Cold (storage) configuration: 90% of the compute, 2.1x the storage capacity, 1.5x the memory
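A rough way to see why decoupling compute from storage helps is to size the same workload both ways. The Python sketch below uses hypothetical node specs and workloads (not HPE figures): with identical nodes, you keep adding nodes until the scarcer resource is satisfied and strand the other, whereas separate compute and storage tiers are each sized only by the resource they provide.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeSpec:
    cores: int
    storage_tb: float


# Hypothetical node specs, for illustration only (not HPE figures).
BALANCED = NodeSpec(cores=24, storage_tb=48.0)   # symmetric worker: compute + storage
COMPUTE = NodeSpec(cores=32, storage_tb=0.0)     # asymmetric compute-tier node
STORAGE = NodeSpec(cores=28, storage_tb=224.0)   # asymmetric storage-tier node


def symmetric_nodes(need_cores: int, need_tb: float, node: NodeSpec) -> int:
    """Identical nodes carry both resources, so the larger requirement decides."""
    return max(math.ceil(need_cores / node.cores), math.ceil(need_tb / node.storage_tb))


def asymmetric_nodes(need_cores: int, need_tb: float,
                     compute: NodeSpec, storage: NodeSpec) -> tuple[int, int]:
    """Each tier is sized only by the resource it provides."""
    return math.ceil(need_cores / compute.cores), math.ceil(need_tb / storage.storage_tb)


def idle_in_symmetric(need_cores: int, need_tb: float, node: NodeSpec) -> tuple[int, float]:
    """Cores and TB bought but not needed when scaling identical nodes."""
    n = symmetric_nodes(need_cores, need_tb, node)
    return n * node.cores - need_cores, n * node.storage_tb - need_tb


if __name__ == "__main__":
    for label, cores, tb in [("hot", 2000, 500.0), ("cold", 400, 5000.0)]:
        print(label,
              "symmetric:", symmetric_nodes(cores, tb, BALANCED),
              "idle (cores, TB):", idle_in_symmetric(cores, tb, BALANCED),
              "asymmetric (compute, storage):", asymmetric_nodes(cores, tb, COMPUTE, STORAGE))
```

For the hypothetical compute-heavy workload, 84 identical nodes are needed and about 3.5 PB of their storage sits unused; the storage-heavy workload strands about 2,100 cores instead.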
HPE Big Data Reference Architecture
Hadoop and its ecosystem take advantage of the BDRA
[Figure: Impala and other ecosystem components on the compute tier, network switches with east-west networking over a high-speed network, and SSD-based, hard disk-based and archive storage]
Enterprise Grade SQL Analytics for Hadoop
• Develop your own analytical applications with full-functionality ANSI SQL
• Vertica Inside - Powerful and Proven SQL Query Engine
• Installs in Hadoop cluster, supporting Ambari, YARN-ready
• Enterprise-Ready, Stable with full ANSI SQL capabilities, Predictive analytics
HPE Vertica SQL on Hadoop
[Figure: Vertica SQL on Hadoop and YARN apps on compute-optimized servers, with HDFS, ORC and Parquet data on storage-optimized servers]
SQL on Hadoop
First commercially available columnar database
Native Advanced Analytics to deliver insight at the speed of business
Native Hadoop Integration
SaaS and AMI Cloud options
Support for new open source architectures, including Kafka and Spark.
Core Vertica SQL Engine
Advanced Analytics
Open ANSI SQL Standards ++
R, Python, Java, Scala
Core is Key
The same core Vertica engine delivers advanced analytics wherever your enterprise needs it, today and tomorrow.
HP Vertica for SQL on Hadoop
– Native support for ORC, Parquet
– Supports all distributions
– No helper node or single point of failure
HP Vertica Enterprise Edition
– Columnar storage and advanced compression
– Industry-leading performance and scalability
Vertica Community Edition: free up to 1 TB
Build a data-centric foundation
HPE Vertica Advanced Analytics Family, with enterprise-grade reliability and scalability
HP Vertica OnDemand
– Get up and running in < 1 hour
– Pay by the TB or by query
HP Vertica AMI
– Hundreds of TB deployed
– Bring your own license to Amazon Web Services
HPE Big Data Architecture long-term view
Evolve to support multiple compute and storage blocks
Low-cost nodes
SSD nodes, disk nodes, archive nodes: multi-temperate storage using HDFS tiering and object stores
GPU nodes, FPGA nodes, big-memory nodes: workload-optimized compute nodes to accelerate various big data software
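Multi-temperate storage on HDFS can be driven by HDFS storage policies, which place block replicas on SSD, DISK or ARCHIVE storage types. The Python sketch below picks a built-in HDFS policy from the age of a dataset and builds the matching `hdfs storagepolicies -setStoragePolicy` command. The age thresholds and the example path are hypothetical, and the slide does not say which tiering mechanism HPE uses.

```python
from datetime import date

# Built-in HDFS storage policy names, mapped loosely to the slide's node types:
# ALL_SSD -> SSD nodes, HOT -> disk nodes, WARM/COLD -> archive nodes
# (WARM keeps one replica on disk). The age thresholds are hypothetical.
AGE_TIERS = [
    (7, "ALL_SSD"),   # all replicas on SSD
    (90, "HOT"),      # all replicas on DISK
    (365, "WARM"),    # one replica on DISK, the rest on ARCHIVE
]
OLDEST_TIER = "COLD"  # all replicas on ARCHIVE


def choose_policy(last_access: date, today: date) -> str:
    """Pick a storage policy from the number of days since last access."""
    age_days = (today - last_access).days
    for max_age_days, policy in AGE_TIERS:
        if age_days < max_age_days:
            return policy
    return OLDEST_TIER


def set_policy_argv(path: str, policy: str) -> list[str]:
    """Arguments for the `hdfs storagepolicies` CLI (built here, not executed)."""
    return ["hdfs", "storagepolicies", "-setStoragePolicy", "-path", path, "-policy", policy]


if __name__ == "__main__":
    today = date(2016, 4, 13)
    policy = choose_policy(date(2016, 1, 1), today)
    print(" ".join(set_policy_argv("/data/example/2015", policy)))  # hypothetical path
```

Setting a policy only governs where new blocks are written; existing blocks are migrated to match it by running the HDFS `mover` tool afterwards.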
Data-centric security for Hadoop
Enterprise-Grade Hadoop
HPE SecureData provides the missing data protection
[Figure: Traditional IT infrastructure security (disk encryption, database encryption, SSL/TLS/firewalls, authentication management) protects individual layers of the data ecosystem (storage, file systems, databases, middleware/network, data & applications) but leaves security gaps against threats to data such as malware, insiders, SQL injection, traffic interceptors and credential compromise. HPE SecureData data-centric security provides end-to-end protection across the whole data ecosystem.]
HPE SecureData
Protecting sensitive and regulated data in Hadoop
– Stateless key management
  – No key database to store or manage
  – High performance, unlimited scalability
– Both encryption and tokenization technologies
  – Customize the solution to meet exact requirements
– Broad platform support
  – On-premise / Cloud / Big Data
  – Structured / Unstructured
  – Hadoop, HPE Vertica, Linux, Windows, AWS, HPE NonStop, Teradata, IBM z/OS, etc.
– Quick time-to-value
  – Complete end-to-end protection within a common platform
  – Format preservation dramatically reduces implementation effort
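The “no key database” point can be illustrated with deterministic key derivation: each key is recomputed from a master secret and an identity string whenever it is needed, instead of being generated once and stored. The Python sketch below is a generic illustration using HMAC-SHA256, not HPE SecureData's key-management algorithm; the identity naming scheme is hypothetical, and a production design would use a standard KDF such as HKDF (RFC 5869).

```python
import hashlib
import hmac


def derive_key(master_secret: bytes, identity: str, length: int = 32) -> bytes:
    """Re-derive a per-identity key on demand instead of storing it.

    HMAC-SHA256 acts as a pseudorandom function here. Any key server that
    holds the same master secret derives the same key for the same identity,
    so there is no per-key database to replicate or back up.
    """
    if not 1 <= length <= hashlib.sha256().digest_size:
        raise ValueError("this sketch derives between 1 and 32 bytes")
    mac = hmac.new(master_secret, identity.encode("utf-8"), hashlib.sha256)
    return mac.digest()[:length]


if __name__ == "__main__":
    master = b"example master secret"            # hypothetical
    k1 = derive_key(master, "hadoop/ssn-field")  # hypothetical identity string
    k2 = derive_key(master, "hadoop/ssn-field")
    print(k1 == k2, k1.hex())
```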
HPE SecureData Management Console
HPE SecureData Web Services API
HPE SecureData Native APIs (C, Java, C#/.NET)
HPE SecureData Command Lines
HPE SecureData Key Servers
HPE SecureData File Processor
Field-level, format-preserving, reversible data de-identification
Customizable to granular requirements, addressed by encryption & tokenization
Mode       Credit card          SSN/ID        Email                DOB
Original   1234 5678 8765 4321  934-72-2356   [email protected]   31-07-1966
Full       8736 5533 4678 9453  347-98-8309   [email protected]   20-05-1972
Partial    1234 5681 5310 4321  634-34-2356   [email protected]   20-05-1972
Obvious    1234 56AZ UYTZ 4321  AZS-UD-2356   [email protected]   20-05-1972
Techniques: FPE** and SST*
* Secure Stateless Tokenization (SST)
** Format-Preserving Encryption (FPE)
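In the “Partial” row, the first and last four card digits and the last four SSN digits stay visible, and every protected value keeps its length and separators. The Python sketch below imitates only those format properties with a keyed, reversible digit shift. It is a toy for illustration and is not secure: it is neither NIST FF1/FF3-1 format-preserving encryption nor HPE's Secure Stateless Tokenization, and the key, tweak and function names are hypothetical.

```python
import hashlib
import hmac

DIGITS = "0123456789"
TWEAK = b"demo-tweak"  # hypothetical fixed tweak; real FPE modes take a per-use tweak


def _digit_offsets(key: bytes, count: int) -> list[int]:
    """Toy keystream: HMAC-SHA256 output bytes reduced mod 10 (slightly biased)."""
    offsets: list[int] = []
    block = 0
    while len(offsets) < count:
        digest = hmac.new(key, TWEAK + block.to_bytes(4, "big"), hashlib.sha256).digest()
        offsets.extend(b % 10 for b in digest)
        block += 1
    return offsets[:count]


def _shift_digits(value: str, key: bytes, keep_head: int, keep_tail: int, sign: int) -> str:
    """Shift every digit except the first keep_head and last keep_tail digits."""
    positions = [i for i, ch in enumerate(value) if ch in DIGITS]
    end = max(keep_head, len(positions) - keep_tail)  # avoid negative slice ends
    middle = positions[keep_head:end]
    chars = list(value)
    for pos, offset in zip(middle, _digit_offsets(key, len(middle))):
        chars[pos] = DIGITS[(DIGITS.index(chars[pos]) + sign * offset) % 10]
    return "".join(chars)


def protect(value: str, key: bytes, keep_head: int = 0, keep_tail: int = 0) -> str:
    return _shift_digits(value, key, keep_head, keep_tail, +1)


def reveal(value: str, key: bytes, keep_head: int = 0, keep_tail: int = 0) -> str:
    return _shift_digits(value, key, keep_head, keep_tail, -1)


if __name__ == "__main__":
    key = b"demo-key"
    card = protect("1234 5678 8765 4321", key, keep_head=4, keep_tail=4)
    ssn = protect("934-72-2356", key, keep_tail=4)
    print(card, reveal(card, key, keep_head=4, keep_tail=4))
    print(ssn, reveal(ssn, key, keep_tail=4))
```

Because this toy reuses the same keystream for every value, patterns leak across records; standardized FPE modes such as FF1 instead run a keyed Feistel construction over the whole value.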
Data Discovery service
Discover the value of your data
Align business goals and challenges with the relevant data
How to discover the value of your data
Evaluate your data and quickly test, learn, and iterate ideas to discover value
Create a strategic roadmap based on learnings
Key HPE solutions
Data Discovery
Data Driven Transformation Planning
Business benefits
Agile execution to impactful projects
Maximize alignment to value
• To help you with your journey, HPE Data Discovery Solution provides an end-to-end approach to realizing the value of your data
• Includes experienced consultants, proven processes, modern big data analytics platforms and infrastructure, and convenient delivery options.
• Empowers you to realize:
– A clear path to business insights and value
– Rapid exploration and real-time access
– Lower risk
– Lower costs
Business value metrics
• Improve business processes
• Enable better operational performance
• Understand customers better
• Increase market share, margin, and/or revenue
HPE Data Discovery Solution Framework
[Figure: Discovery Workshop, Discovery Experience, Discovery Lab and Discovery Production Implementation leading to business value, using HPE Vertica, HPE IDOL, Hadoop and SAP HANA on HPE servers and storage, on premises or in the cloud]
Rapid, low-risk, securely designed path to big data value delivered as-a-service in the HPE Cloud or on Client premises
Expertise: HPE data scientists, technology experts, industry SMEs
Big data platforms: HPE Haven, Hadoop, SAP HANA, etc.
Platform flexibility: on-premise or cloud-based delivery models
Guided process: proven processes to accelerate time-to-value
Use case library: industry and business function examples
Discovery Production Implementation: operationalize and monetize the new insights by implementing them into your business processes
Discovery Workshop: one- to two-day workshop to align business and IT, discuss opportunities and determine priorities
Discovery Experience: a private, secure and low-risk big data “test-drive” functional and technical environment
HPE Data Discovery Service
Big data infrastructure: HPE Moonshot, HPE Apollo, HPE 3PAR, HPE ProLiant
Data discovery lab: rapid deployment of data discovery labs
Summary
HPE Solution for Hadoop
Big Data Analytics RA
HPE Vertica SQL for Hadoop, SAP HANA, HPE IDOL
Hadoop Reference Architectures for MapR, Hortonworks & Cloudera
HPE Information Governance
Hadoop: HPE Apollo + Moonshot + ProLiant
HPE Analytics Consulting Services for Hadoop, HPE Integration Services
On-Premise and Hybrid Cloud deployment options
Flexible, Purpose-built Infrastructure
High-Performing Analytics Engines
Consulting & Implementation Services
High performance computing
2x Hadoop performance or 50% less space
HPE Infrastructure Big Data Reference Architecture
Analyze at scale and speed
100% of your data, 10x to 1,000x faster
HPE Big Data platform
Powered by Vertica & IDOL
Secure and govern
Protect and manage your data and reputation
HPE Security and Governance
Solutions for Hadoop
Data management, data discovery and governance services
Build a Data-Centric Foundation
Hadoop for the Enterprise
Why Hewlett Packard Enterprise?
Enterprise Scale with Hadoop
Experience and expertise
– 3000+ global analytics and data management professionals
– Hundreds of data scientists
Solution leadership
– Proven analytics and compute platforms for all data, environments, and analytics
– Services to deliver value from discovery to achieving business outcomes
Market leadership
– Gartner Magic Quadrant leader for Enterprise Data Warehouse and Data Management Solutions for Analytics (2015)
– Gartner Magic Quadrant leader for eDiscovery (2015)
Flexible and open
– Solutions built on open standards, offering choice and flexibility
– Strong strategic alliances complementing HPE solutions
THANK YOU