Best Practices for Hadoop Performance with Hortonworks Data Platform on OpenPOWER Systems
Agenda
• Hortonworks and IBM Partnership
• Open Community Innovation
• Leading Time to Insights
• The Fast Lane to Machine Learning
Predict loan defaults
Personalize customer journeys
Prevent fraud
Prevent non-compliance
The AI Opportunity
Data Lake (relational and new data stores)
Cognitive
Human Intelligence Exhibited by Machines
Advanced Analytics
(Augmented with AI Methods)
Machine Learning
“Trained” using large amounts of data & ability to learn how to perform the task
Deep Learning
Break tasks into Deep Neural Networks
Hortonworks and IBM Collaboration
DATAWORKS SUMMIT/HADOOP SUMMIT MUNICH, Germany, April 4, 2017 — Hortonworks, Inc. (NASDAQ: HDP), a
leading innovator of open and connected data platforms, today announced the general availability of Hortonworks Data
Platform (HDP®) version 2.6.
HDP 2.6 is also available on IBM Power Systems. The work Hortonworks and IBM are doing together, including their support
for ODPi, gives customers increased choice when selecting a top-tier distribution for Hadoop and Spark and enables them
to fully exploit the performance, scalability and acceleration capabilities of the POWER8 platform.
Scott Gnau, CTO, Hortonworks at Edge.
Youtube: http://bit.ly/2dSOliW
James Wade, Director of Application Hosting, Florida Blue.
Youtube: http://bit.ly/2dxVHIY
© 2016 IBM Corporation
Customer Story – Guidewell Health - Florida Blue
• Business Problem
– Transformational journey resulting in rapid expansion of business models
– Technology innovation required to keep up with the business expansion while improving
client satisfaction, reducing costs and supporting the company’s green IT initiatives
o Existing x86 server sprawl not sustainable
• Solution with Hortonworks, IBM OpenPOWER servers and Sage Solutions Consulting
– Embraces the open software and hardware model adopted by Florida Blue
– Hortonworks supporting new fraud analytics initiative to reduce costs and client premiums
– OpenPOWER to enable smaller datacenter footprint with stronger reliability
See the full story in this Hortonworks Blog post.
- Design and cost optimized for deployments of multiples (cloud and cluster)
- Broad number of optimal solutions
- Co-Designed with the OpenPOWER Ecosystem
Supported by Canonical
IBM Support
Community / 3rd Party Support
running
The LC Line
The L Line
PurePower
Enterprise & IFLs
- Enterprise level RAS for single system deployments
- Solutions for Big Data & Analytics
- Converged infrastructure offering
- Rapid time to value and simplicity of management
- Enterprise level robustness and IFL capability
- Solution editions for in-memory databases (HANA, DB2 BLU)
- Hosted cloud and hybrid cloud solutions
- Rapid deployments and POCs
The IBM Power Systems Linux Portfolio
Pipeline of innovation
Broad Linux portfolio delivers all your Linux deployment needs
POWER8 is designed for the Big Data era and delivers price-performance leadership to the Linux Market!
The OpenPOWER Foundation
The OpenPOWER Foundation is an open ecosystem, using the POWER Architecture to serve the evolving needs of customers.
• Moore’s law no longer satisfies performance gain
• Growing workload demands
• Numerous IT consumption models
• Mature Open software ecosystem
Open Development
open software, open hardware
Collaboration of thought leaders
simultaneous innovation, multiple disciplines
• Rich software ecosystem
• Spectrum of power servers
• Multiple hardware options
• Derivative POWER chips
Market Shifts New Open Innovation
Performance of POWER architecture
amplified capability
[Figure: OpenPOWER ecosystem layers. Chip (SoC Dev, IP Dev, Technology, FAB); I/O, Networking, Storage, Acceleration; FW (Open Source); SYS (ODM, OEM); SW (Linux, ISV, Open Source)]
WEB 2.0 Data Center
MSP
Cloud
Members And growing ….
120+ 300+
Innovation Pervasive in the Design
Power Systems S822LC for Big Data
Not Just Another Intel Server
NVIDIA:
Tesla K80 GPU Accelerator
Linux by Red Hat:
Red Hat Enterprise Linux 7.2
Mellanox: InfiniBand/Ethernet
Connectivity in and out of server
HGST: Optional NVMe Adapters
Alpha Data with Xilinx FPGA:
Optional CAPI Accelerator
Broadcom: Optional PCIe Adapters
QLogic: Optional Fibre Channel PCIe
Samsung: SSDs & NVMe
Hynix, Samsung, Micron: DDR4
IBM: POWER8 CPU
4X threads per core
4X memory bandwidth (1)
6X more cache (2) at lower latency
SMT=Simultaneous Multi-Threading
OLTP = On-Line Transaction Processing
These design decisions result in best performance for data centric workloads like:
Spark, Hadoop, Database, NoSQL, Big Data Analytics, OLTP
POWER8: Designed for data to deliver breakthrough performance
[Figure: POWER8 vs. x86. Parallel processing (POWER8 SMT8 vs. x86 Hyperthread), data flow (POWER8 pipe vs. x86 pipe), and POWER8 + OpenPOWER vs. x86]
1. Up to 4X depending on specific x86 and POWER8 servers being compared
2. Up to 6X more cache comparing Intel E7-8890 servers to 12-core POWER8 servers. See speaker notes for more details.
Flexibility with HDP on Power Systems
• Scale Up or Out to Meet Evolving Workloads
– Scale up each node by exploiting the memory bandwidth and multi-threading
– 4X threads per core vs x86 allows you to optimize and drive more workload per node
– Offering 4X memory bandwidth vs x86, POWER8 gives you more options as your workloads expand and evolve
• Unmatched Range of Linux Servers
– From 1U, 16-core servers up to 16-socket, 192-core powerhouses with industry-leading reliability, all running standard Linux
– Virtualization options to host low-cost dev environments or rich, multi-tenant private clouds
– Wide range of OpenPOWER servers offered by OpenPOWER members for on-prem and the cloud
• Accelerated Analytics
– Add accelerators (flash, GPU, FPGA) with direct access to processor memory with OpenCAPI
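The "4X threads per core" figure above follows from simple arithmetic: POWER8 in SMT8 mode exposes 8 hardware threads per core, while x86 Hyper-Threading exposes 2. The sketch below works that through for a node. The 16-core count is only an illustrative value (borrowed from the smallest server mentioned above), the function name is hypothetical, and real nodes may run in a lower SMT mode.

```python
def hardware_threads(cores: int, threads_per_core: int) -> int:
    """Logical (hardware) threads a node exposes to the OS scheduler."""
    return cores * threads_per_core


POWER8_SMT8 = 8        # threads per core in SMT8 mode
X86_HYPERTHREAD = 2    # threads per core with Hyper-Threading

cores = 16  # illustrative core count, not a measured configuration

power8_threads = hardware_threads(cores, POWER8_SMT8)
x86_threads = hardware_threads(cores, X86_HYPERTHREAD)
thread_ratio = power8_threads / x86_threads

print(f"POWER8 SMT8: {power8_threads} threads")
print(f"x86 HT:      {x86_threads} threads")
print(f"ratio:       {thread_ratio:.0f}X")
```

More hardware threads per node only helps when the workload has enough parallel tasks to fill them, so the ratio is an upper bound on scheduling headroom rather than a speedup estimate.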
IBM Power S822LC for Big Data and Hortonworks
Combine to Deliver Leadership in Hadoop Environments
• Performance results are based on preliminary IBM internal testing of 10 queries (simple, medium, and complex) with varying runtimes running against a 10TB database. The tests were run on 10 x IBM Power System S822LC for Big Data
(20 cores / 40 threads, 2 x POWER8 2.92GHz, 256 GB memory, RHEL 7.2, HDP 2.5.3) compared to the published x86/Hortonworks results running on 10 x AWS d2.8xlarge EC2 nodes running HDP 2.5; details can be found at
https://hortonworks.com/blog/apache-hive-going-memory-computing/ . Conducted under laboratory conditions; individual results can vary based on workload size, use of storage subsystems,
and other conditions. Data as of February 28, 2017.
• POWER8 and Hortonworks deliver 1.70X the throughput compared to Hortonworks running on x86
– 70% more QpH (queries per hour) based on the average response time – complete the same amount of work with fewer system resources
– 41% reduction on average in query response time – faster responses enable faster business decisions
70% More Throughput
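The two headline numbers are two views of the same measurement. If QpH is derived from the average response time, as the slide states, then a 1.70X throughput ratio implies a response-time reduction of 1 - 1/1.70, or about 41%. The sketch below checks that arithmetic. It assumes the simple model QpH = 3600 / average response time in seconds; the 100-second baseline and the function names are hypothetical, not taken from the benchmark.

```python
SECONDS_PER_HOUR = 3600.0


def queries_per_hour(avg_response_s: float) -> float:
    """Throughput under the assumed model: queries run back to back."""
    return SECONDS_PER_HOUR / avg_response_s


def response_time_reduction(throughput_ratio: float) -> float:
    """Fractional drop in average response time implied by a throughput ratio."""
    return 1.0 - 1.0 / throughput_ratio


baseline_s = 100.0  # hypothetical x86 average response time, in seconds

reduction = response_time_reduction(1.70)
faster_s = baseline_s * (1.0 - reduction)
ratio = queries_per_hour(faster_s) / queries_per_hour(baseline_s)

print(f"response time reduction: {reduction:.1%}")
print(f"throughput ratio:        {ratio:.2f}x")
```

The baseline value cancels out of the ratio, so any positive response time gives the same result.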
TCO at Scale with HDP on Power Systems
• Up to 3X reduction of storage and compute infrastructure moving to Power Systems and Elastic
Storage Server vs commodity scale out x86
– Single version of your data to serve mixed analytics workloads without copying data
– Less infrastructure means reduced costs in many areas:
o Energy, cooling, server administration, floor space, SW licensing
• Faster, more flexible and scalable vs EMC Isilon using IBM Spectrum Scale
• Also Spectrum Scale software RAID requires only 30% extra storage vs 2X for Isilon
• Position for future growth, avoid hitting the data center wall with cluster sprawl
– Separating storage from compute enables the selection of the best compute node for the workload –
and Power has the greatest range of options
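The storage-overhead comparison above can be made concrete with a short calculation. The sketch below assumes one reading of the slide: "30% extra" means 1.3 units of raw capacity per unit of usable data, and "2X" means 2.0 units of raw capacity per unit of usable data. The slide does not define either figure precisely, so both factors, the 1,000 TB data set, and the function name are illustrative assumptions.

```python
def raw_capacity_tb(usable_tb: float, raw_per_usable: float) -> float:
    """Raw disk capacity needed to hold usable_tb of protected data."""
    return usable_tb * raw_per_usable


usable_tb = 1000.0            # hypothetical data set size
SOFTWARE_RAID_FACTOR = 1.30   # assumed meaning of "30% extra storage"
TWO_X_FACTOR = 2.00           # assumed meaning of "2X"

raid_raw = raw_capacity_tb(usable_tb, SOFTWARE_RAID_FACTOR)
two_x_raw = raw_capacity_tb(usable_tb, TWO_X_FACTOR)
saved_tb = two_x_raw - raid_raw

print(f"software RAID: {raid_raw:.0f} TB raw")
print(f"2X protection: {two_x_raw:.0f} TB raw")
print(f"difference:    {saved_tb:.0f} TB raw")
```

If "2X" is instead meant as 2X extra storage (3.0 units raw per usable unit), only the constant changes and the gap widens accordingly.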
Spectrum Scale: Unleash new storage economics
[Figure: Powered by IBM Spectrum Scale. Client workstations, users and applications, and a compute farm reach a shared filesystem through File (POSIX, NFS, SMB), Analytics (Transparent HDFS), OpenStack (Cinder, Glance, Manila), and Object (Swift, S3) interfaces, serving traditional and new-gen applications. Automated data placement and data migration span Flash, Disk, Tape, Shared Nothing Cluster, and Transparent Cloud Tier, with worldwide data distribution across Site A, Site B, and Site C.]
4000+ customers using Spectrum Scale as the data plane for HPC and analytics workloads
Designed for Cognitive/AI with HDP on Power Systems
• PowerAI is the only commercial offering containing all key deep learning frameworks
– Caffe, TensorFlow, Torch, Theano, OpenBLAS, NCCL, NVIDIA DIGITS*
• Add AI to your Hadoop and Spark environment without investing in new server architecture
– Power is ideally suited for Cognitive/AI with bandwidth differentiation & extreme accelerator innovation
o POWER8 processors are ready to meet the demands with 4X threads and memory bandwidth vs x86
– Easy to deploy package to add optimized open source AI technologies to your Hadoop and Spark
environment with PowerAI and enterprise support
• Deep learning training faster on optimized Power Systems with GPU acceleration
– Enabled by servers offering 2.8X throughput vs x86 from CPU to GPU with NVLink technology
– Build learning models from images, speech, or other media in less time than prior generations of
hardware and software
– Innovation at a faster pace, as developers can invent and try out many new models, parameter
settings, and data sets.
Modern Data Platform for the Cognitive Era
Augment
Stored in Elastic Storage Server: The software defined data layer
Big Data + Storage Administrators
Data Developers and Scientists
Data Lakes and Streams
Running on POWER8: The Platform Designed for Big Data
Personalize Customer Experience
Improve Call Center Response Time
Maintain AML Compliance
Automated Financial Advisors
Detect and Prevent Fraud
Connect
PowerAI
Learn
Machine learning is at your fingertips
AI/ML Tools: IBM PowerAI Deep Learning Platform
Analytics Platform: Hortonworks Data Platform
Compute: IBM Power Systems
Storage: IBM Elastic Storage Server
HDP on IBM Power Systems and Spectrum Scale Storage
How to Get Started with HDP on OpenPOWER Systems
• Join the Hortonworks Community: https://community.hortonworks.com/
• Learn more about the benefits of Hortonworks: http://hortonworks.com/training/
• Learn more about the benefits of IBM Power Systems and OpenPOWER:
https://www.ibm.com/systems/power
• Join the upcoming HDP and Power Webinar May 4, 2017:
https://www.ibm.com/systems/power/solutions/data-platform/hortonworks.html
• If you are interested in discussing an HDP on Power Systems option or proposal,
talk to your Hortonworks or Power reps