Best Practices for Hadoop Performance with Hortonworks Data Platform on OpenPOWER Systems




Agenda

• Hortonworks and IBM Partnership

• Open Community Innovation

• Leading Time to Insights

• The Fast Lane to Machine Learning


The AI Opportunity

• Predict loan defaults

• Personalize customer journeys

• Prevent fraud

• Prevent non-compliance

Data Lake (relational and new data stores)

Cognitive: Human Intelligence Exhibited by Machines

Advanced Analytics (Augmented with AI Methods)

Machine Learning: “Trained” using large amounts of data & ability to learn how to perform the task

Deep Learning: Break tasks into Deep Neural Networks


Hortonworks and IBM Collaboration


DATAWORKS SUMMIT/HADOOP SUMMIT MUNICH, Germany, April 4, 2017 - Hortonworks, Inc. (NASDAQ: HDP), a leading innovator of open and connected data platforms, today announced the general availability of Hortonworks Data Platform (HDP®) version 2.6.

HDP 2.6 is also available on IBM Power Systems. The work Hortonworks and IBM are doing together, including their support for ODPi, gives customers increased choice when selecting a top-tier distribution for Hadoop and Spark and enables them to fully exploit the performance, scalability and acceleration capabilities of the POWER8 platform.

Scott Gnau, CTO, Hortonworks at Edge.

Youtube: http://bit.ly/2dSOliW

James Wade, Director of Application Hosting, Florida Blue.

Youtube: http://bit.ly/2dxVHIY

© 2016 IBM Corporation


Customer Story – Guidewell Health - Florida Blue

• Business Problem

– Transformational journey resulting in rapid expansion of business models

– Technology innovation required to keep up with the business expansion while improving client satisfaction, reducing costs and supporting the company’s green IT initiatives

o Existing x86 server sprawl not sustainable

• Solution with Hortonworks, IBM OpenPOWER servers and Sage Solutions Consulting

– Embraces the open software and hardware model adopted by Florida Blue

– Hortonworks supporting new fraud analytics initiative to reduce costs and client premiums

– OpenPOWER to enable smaller datacenter footprint with stronger reliability


See the full story in this Hortonworks Blog post.


- Design and cost optimized for deployments of multiples (cloud and cluster)

- Broad number of optimal solutions

- Co-Designed with the OpenPOWER Ecosystem

Supported by Canonical

IBM Support

Community / 3rd Party Support

The LC Line

The L Line

PurePower

Enterprise & IFLs

- Enterprise level RAS for single system deployments

- Solutions for Big Data & Analytics

- Converged infrastructure offering

- Rapid time to value and simplicity of management

- Enterprise level robustness and IFL capability

- Solution editions for in memory databases (HANA, DB2 BLU)

- Hosted cloud and hybrid cloud solutions

- Rapid deployments and POCs

The IBM Power Systems Linux Portfolio

Pipeline of innovation

Broad Linux portfolio delivers all your Linux deployment needs

POWER8 is designed for the Big Data era and delivers price-performance leadership to the Linux Market!


The OpenPOWER Foundation


The OpenPOWER Foundation is an open ecosystem, using the POWER Architecture to serve the evolving needs of customers.

Market Shifts

• Moore’s law no longer satisfies performance gain

• Growing workload demands

• Numerous IT consumption models

• Mature Open software ecosystem

New Open Innovation

• Rich software ecosystem

• Spectrum of power servers

• Multiple hardware options

• Derivative POWER chips

Open Development: open software, open hardware

Collaboration of thought leaders: simultaneous innovation, multiple disciplines

Performance of POWER architecture: amplified capability

[Figure: OpenPOWER ecosystem diagram. Labels: Chip, SoC Dev, IP Dev, Technology, FAB, I/O, Networking, Storage, Acceleration, FW, Open Source, SYS, ODM, OEM, SW, Linux, ISV, Open Source, WEB 2.0 Data Center, MSP, Cloud]

Members And growing ….

120+ 300+


Innovation Pervasive in the Design

Power Systems S822LC for Big Data

Not Just Another Intel Server

NVIDIA: Tesla K80 GPU Accelerator

Linux by Red Hat: Red Hat 7.2 Linux OS

Mellanox: InfiniBand/Ethernet connectivity in and out of server

HGST: Optional NVMe Adapters

Alpha Data with Xilinx FPGA: Optional CAPI Accelerator

Broadcom: Optional PCIe Adapters

QLogic: Optional Fibre Channel PCIe

Samsung: SSDs & NVMe

Hynix, Samsung, Micron: DDR4

IBM: POWER8 CPU


POWER8: Designed for data to deliver breakthrough performance

• 4X threads per core

• 4X memory bandwidth (1)

• 6X more cache (2) at lower latency

These design decisions result in best performance for data centric workloads like: Spark, Hadoop, Database, NoSQL, Big Data Analytics, OLTP

SMT = Simultaneous Multi-Threading

OLTP = On-Line Transaction Processing

[Figure: POWER8 vs x86 comparison. Panels: Parallel Processing (POWER8 SMT8, x86 Hyperthread); Data flow (POWER8 pipe, x86 pipe); POWER8 + OpenPOWER vs x86]


1. Up to 4X depending on specific x86 and POWER8 servers being compared

2. Up to 6X more cache comparing Intel e7-8890 servers to 12 core POWER8 servers. See speaker notes for more details


Flexibility with HDP on Power Systems

• Scale Up or Out to Meet Evolving Workloads

– Scale up each node by exploiting the memory bandwidth and multi-threading

– 4X threads per core vs x86 allows you to optimize and drive more workload per node

– Offering 4X memory bandwidth vs x86, POWER8 gives you more options as your workloads expand and evolve

• Unmatched Range of Linux Servers

– From 1U, 16-core servers up to 16-socket, 192-core powerhouses with industry-leading reliability, all running standard Linux

– Virtualization options to host low cost dev environments or rich, multi-tenant private clouds

– Wide range of OpenPOWER servers offered by OpenPOWER members for on-prem and the cloud

• Accelerated Analytics

– Add accelerators (flash, GPU, FPGA) with direct access to processor memory with OpenCAPI
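The "4X threads per core" claim above can be made concrete with simple arithmetic. The sketch below is illustrative only: it assumes POWER8 runs in SMT8 mode (8 hardware threads per core) and that the x86 comparison point uses Hyper-Threading with 2 threads per core; the function name is hypothetical, and the core counts are the two server sizes named in the list above.

```python
# Illustrative arithmetic only, not a benchmark.
# Assumptions: POWER8 in SMT8 mode, x86 with 2-way Hyper-Threading.
POWER8_THREADS_PER_CORE = 8
X86_THREADS_PER_CORE = 2


def hardware_threads(cores: int, threads_per_core: int) -> int:
    """Return the number of hardware threads exposed by a server."""
    if cores <= 0 or threads_per_core <= 0:
        raise ValueError("cores and threads_per_core must be positive")
    return cores * threads_per_core


if __name__ == "__main__":
    # 16-core and 192-core are the server sizes mentioned on the slide.
    for cores in (16, 192):
        p8 = hardware_threads(cores, POWER8_THREADS_PER_CORE)
        x86 = hardware_threads(cores, X86_THREADS_PER_CORE)
        print(f"{cores} cores: POWER8 {p8} threads, x86 {x86} threads, "
              f"ratio {p8 / x86:.0f}x")
```

More hardware threads do not translate linearly into throughput; how many concurrent tasks a node can usefully run still depends on the workload and on memory and I/O capacity.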


IBM Power S822LC for Big Data and Hortonworks Combine to Deliver Leadership in Hadoop Environments

• Performance results are based on preliminary IBM internal testing of 10 queries (simple, medium, and complex) with varying runtimes running against a 10 TB database. The tests were run on 10 x IBM Power System S822LC for Big Data (20 cores / 40 threads, 2 x POWER8 2.92 GHz, 256 GB memory, RHEL 7.2, HDP 2.5.3) compared to the published x86/Hortonworks results running on 10 x AWS d2.8xlarge EC2 nodes running HDP 2.5; details can be found at https://hortonworks.com/blog/apache-hive-going-memory-computing/ . Conducted under laboratory conditions; individual results can vary based on workload size, use of storage subsystems, and other conditions. Data as of February 28, 2017.

• POWER8 and Hortonworks deliver 1.70X the throughput compared to Hortonworks running on x86

– 70% more QpH based on the average response time: complete the same amount of work with fewer system resources

– 41% reduction on average in query response time: reduced response time enables faster business decisions

70% More Throughput
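The two headline figures are consistent with each other if queries per hour (QpH) is taken to be inversely proportional to average response time. That proportionality is an assumption made here for illustration; the slide says only that QpH is "based on the average response time". A minimal sketch with hypothetical function names:

```python
def throughput_ratio(response_time_reduction: float) -> float:
    """Relative QpH after average response time drops by the given fraction.

    Assumes QpH is inversely proportional to average response time.
    """
    if not 0.0 <= response_time_reduction < 1.0:
        raise ValueError("reduction must be in the range [0, 1)")
    return 1.0 / (1.0 - response_time_reduction)


def queries_per_hour(avg_response_seconds: float) -> float:
    """QpH for queries run back to back at the given average response time."""
    if avg_response_seconds <= 0:
        raise ValueError("average response time must be positive")
    return 3600.0 / avg_response_seconds


if __name__ == "__main__":
    ratio = throughput_ratio(0.41)  # 41% reduction quoted on the slide
    print(f"Relative throughput: {ratio:.2f}x")
```

Under that assumption a 41% reduction gives 1 / 0.59, or about 1.69, in line with the quoted 1.70X.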


TCO at Scale with HDP on Power Systems


• Up to 3X reduction of storage and compute infrastructure moving to Power Systems and Elastic Storage Server vs commodity scale out x86

– Single version of your data to serve mixed analytics workloads without copying data

– Less infrastructure means reduced costs in many areas:

o Energy, cooling, server administration, floor space, SW licensing

• Faster, more flexible and scalable vs EMC Isilon using IBM Spectrum Scale

• Also Spectrum Scale software RAID requires only 30% extra storage vs 2X for Isilon

• Position for future growth, avoid hitting the data center wall with cluster sprawl

– Separating storage from compute enables the selection of the best compute node for the workload, and Power has the greatest range of options
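The storage overhead comparison above can be turned into a rough capacity estimate. The sketch below is arithmetic only, under two stated assumptions: "30% extra storage" is read as 1.3 TB raw per usable TB, and "2X for Isilon" is read as 2.0 TB raw per usable TB. The function names and the 1000 TB data set are hypothetical.

```python
# Overhead factors: raw TB required per usable TB.
SPECTRUM_SCALE_RAID_FACTOR = 1.3  # "30% extra storage" from the slide
ISILON_FACTOR = 2.0               # "2X" from the slide, read as 2x raw (assumption)


def raw_capacity_tb(usable_tb: float, overhead_factor: float) -> float:
    """Raw capacity needed to hold usable_tb of data at the given overhead."""
    if usable_tb < 0 or overhead_factor < 1.0:
        raise ValueError("usable_tb must be >= 0 and overhead_factor >= 1.0")
    return usable_tb * overhead_factor


def raw_savings_fraction(factor_a: float, factor_b: float) -> float:
    """Fraction of raw capacity saved by scheme A relative to scheme B."""
    return 1.0 - factor_a / factor_b


if __name__ == "__main__":
    usable = 1000.0  # hypothetical data set size in TB
    a = raw_capacity_tb(usable, SPECTRUM_SCALE_RAID_FACTOR)
    b = raw_capacity_tb(usable, ISILON_FACTOR)
    saved = raw_savings_fraction(SPECTRUM_SCALE_RAID_FACTOR, ISILON_FACTOR)
    print(f"Raw TB at 1.3x: {a:.0f}, at 2.0x: {b:.0f}, saving {saved:.0%}")
```

Actual overhead depends on the protection scheme and configuration chosen on each platform, so the factors should be replaced with measured or vendor-confirmed values before sizing a cluster.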


Spectrum Scale: Unleash new storage economics

[Figure: IBM Spectrum Scale overview. Labels: Client workstations; Users and applications; Compute farm; Traditional applications; Analytics; New Gen applications; Shared Filesystem; File (POSIX, NFS, SMB); Transparent HDFS; OpenStack (Cinder, Glance, Manila); Object (Swift, S3); Transparent Cloud; Powered by IBM Spectrum Scale; Automated data placement and data migration; Flash; Disk; Tape; Shared Nothing Cluster; Transparent Cloud Tier; Worldwide Data Distribution (Site A, Site B, Site C)]

4000+ customers using Spectrum Scale as data plane for HPC and analytics workload


Designed for Cognitive/AI with HDP on Power Systems


• PowerAI is the only commercial offering containing all key deep learning frameworks

– Caffe, TensorFlow, Torch, Theano, OpenBLAS, NCCL, NVIDIA DIGITS*

• Add AI to your Hadoop and Spark environment without investing in new server architecture

– Power is ideally suited for Cognitive/AI with bandwidth differentiation & extreme accelerator innovation

o POWER8 processors are ready to meet the demands with 4X threads and memory bandwidth vs x86

– Easy to deploy package to add optimized open source AI technologies to your Hadoop and Spark environment with PowerAI and enterprise support

• Deep learning training faster on optimized Power Systems with GPU acceleration

– Enabled by servers offering 2.8X throughput vs x86 from CPU to GPU with NVLink technology

– Build learning models from images, speech, or other media in less time than prior generations of hardware and software

– Innovation at a faster pace, as developers can invent and try out many new models, parameter settings, and data sets.


Modern Data Platform for the Cognitive Era

• Personalize Customer Experience

• Improve Call Center Response Time

• Maintain AML Compliance

• Automated Financial Advisors

• Detect and Prevent Fraud

Big Data + Storage Administrators

Data Developers and Scientists

Data Lakes and Streams

Stored in Elastic Storage Server: The software defined data layer

Running on POWER8: The Platform Designed for Big Data

[Figure labels: Connect, Augment, Learn, PowerAI]


Machine learning is at your fingertips

AI/ML Tools: IBM PowerAI Deep Learning Platform

Analytics Platform: Hortonworks Data Platform

Compute: IBM Power Systems

Storage: IBM Elastic Storage Server

HDP on IBM Power Systems and Spectrum Scale Storage


How to Get Started with HDP on OpenPOWER Systems

• Join the Hortonworks Community: https://community.hortonworks.com/

• Learn more about the benefits of Hortonworks: http://hortonworks.com/training/

• Learn more about the benefits of IBM Power Systems and OpenPOWER: https://www.ibm.com/systems/power

• Join the upcoming HDP and Power Webinar May 4, 2017: https://www.ibm.com/systems/power/solutions/data-platform/hortonworks.html

• If you are interested in discussing an HDP on Power Systems option or proposal, talk to your Hortonworks or Power reps
