Accelerating Big Data with RDMA solutions

HPC Advisory Council, June 2013

© 2012 Mellanox Technologies - Mellanox Confidential



Leading Supplier of End-to-End Interconnect Solutions

Comprehensive End-to-End InfiniBand and Ethernet Portfolio

• Host/Fabric Software

• ICs

• Switches/Gateways

• Adapter Cards

• Cables

Virtual Protocol Interconnect

[Figure: Virtual Protocol Interconnect diagram. Labels: Storage Front / Back-End; Server / Compute; Switch / Gateway; 56G IB & FCoIB; 56G InfiniBand; 10/40/56GbE & FCoE; 10/40/56GbE; Fibre Channel.]


Three Areas for Accelerations

Data Analytics

• Explore inefficiencies in existing analytics frameworks and systems

• Accelerate data processing to deliver faster results

Storage

• Explore ways to refine the dominant file system

• Take advantage of direct-attached disks to accelerate data access

Distributed Storage

• Leverage popular distributed storage systems with Big Data applications

• Use existing systems for usage with Big Data frameworks


Motivation to Accelerate Data Analytics

Data Analysis Requires Faster Network

• The Hadoop Map Reduce framework is a network-intensive workload

- Mapped data is shuffled between nodes in the cluster

• Data Replication

- A high-availability event triggers multiple terabytes of data movement

Provide Higher Data Value

• Expose SSD’s low latency capabilities

• Better server/CPU utilization

* Data Source: Intersect360 Research, 2012, IT and Data scientists survey

Big Data Applications Require High Bandwidth and Low Latency Interconnect


A scalable fault-tolerant distributed system for data storage and processing

Hadoop has two main systems

• Hadoop Distributed File System: self-healing high-bandwidth clustered storage.

• MapReduce: distributed fault-tolerant resource management and scheduling coupled with a scalable data programming abstraction.
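The MapReduce programming abstraction mentioned above can be illustrated with a minimal single-process sketch. It is not Hadoop's API: the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `run_word_count`) are hypothetical, and everything runs in memory. The point is only to show the three steps, including the shuffle that, on a real cluster, moves mapped data between nodes.

```python
from collections import defaultdict


def map_phase(records, mapper):
    """Apply the mapper to every input record and collect (key, value) pairs."""
    pairs = []
    for record in records:
        pairs.extend(mapper(record))
    return pairs


def shuffle_phase(pairs):
    """Group values by key. On a cluster this is the step that crosses the network."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups, reducer):
    """Apply the reducer to each key and its grouped values."""
    return {key: reducer(key, values) for key, values in groups.items()}


def word_count_mapper(line):
    """Emit (word, 1) for every whitespace-separated word in the line."""
    return [(word.lower(), 1) for word in line.split()]


def word_count_reducer(_key, values):
    """Sum the counts emitted for one word."""
    return sum(values)


def run_word_count(lines):
    """Chain map, shuffle, and reduce for the word-count example."""
    pairs = map_phase(lines, word_count_mapper)
    return reduce_phase(shuffle_phase(pairs), word_count_reducer)
```

For example, `run_word_count(["big data", "Big cluster"])` counts "big" twice and the other two words once each.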

Key values

• Flexibility – Store any data, Run any analysis.

• Scalability – Start at 1TB/3-nodes grow to petabytes/1000s of nodes.

• Economics – Cost per TB at a fraction of traditional options.

Hadoop Framework

[Figure: Hadoop stack diagram. Labels: Hive, Pig, Map Reduce, HBase, HDFS™ (Hadoop Distributed File System), DISK (x6).]


Plug-in architecture

• Open-source, latest GA version 3.1 (6/10/2013)

• Google code repository at: https://code.google.com/p/uda-plugin/

Accelerates Map Reduce Jobs

• Accelerated merge sort

Efficient Shuffle Provider

• Data transfer over RDMA

• Supports InfiniBand and Ethernet

Supported Hadoop Distributions

• Apache 3.0 – In the main trunk!

• Apache 2.0.3 – In the main trunk*!

• Apache Hadoop 1.0.x ; 1.1.x

• Cloudera Distribution Hadoop 3 update 4 (CDH3u4)

• Cloudera Distribution Hadoop 4 (CDH4)

• Hortonworks HDP 1.1

Supported Hardware

• ConnectX®-3 VPI

• SwitchX-2 based systems

Unstructured Data Accelerator - UDA

[Figure: Hadoop stack diagram. Labels: Hive, Pig, Map Reduce, HBase, HDFS™ (Hadoop Distributed File System), DISK (x6).]


Map Reduce Serialization


New Pipelined Data Flow

[Figure: timeline beginning at "Time start". Labels: Map stage (multiple Map tasks); two Reduce tasks, each showing "Header fetch" followed by "shuffle merge"; callouts "Shuffle", "Merge", and "New Algorithm".]
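The pipelined data flow named on this slide, where merging proceeds while shuffled data is still arriving, can be sketched with a lazy k-way merge. This is an illustration only and not UDA's algorithm (UDA is a C++ plugin that moves data over RDMA): the sketch uses Python's standard `heapq.merge`, and the names `fetch_segment` and `pipelined_merge` are hypothetical. Each generator stands in for one sorted map output being fetched record by record.

```python
import heapq


def fetch_segment(sorted_pairs, log):
    """Yield (key, value) pairs one at a time, standing in for a network fetch."""
    for pair in sorted_pairs:
        log.append(("fetch", pair[0]))
        yield pair


def pipelined_merge(segments, log):
    """Lazily k-way merge sorted segments; records are pulled only when needed."""
    streams = [fetch_segment(segment, log) for segment in segments]
    # heapq.merge keeps one pending record per stream, so merged output
    # starts flowing before any stream has been read to the end.
    for pair in heapq.merge(*streams, key=lambda kv: kv[0]):
        log.append(("emit", pair[0]))
        yield pair
```

Inspecting `log` after consuming the generator shows fetch and emit events interleaved, which is the overlap the slide's timeline depicts. A non-pipelined design would log every fetch before the first emit.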


UDA - Software Architecture

[Figure: UDA software architecture. Labels: Hadoop (Java): JobTracker, TaskTracker with MapTask, TaskTracker with ReduceTask; UDA Plugin (C++): MOFSupplier, Data Engine, RDMA Server, NetMerger, RDMA Client, three Merging Threads; RDMA NIC / HCA.]


Double Map Reduce Performance with UDA

*TeraSort is a popular benchmark used to measure the performance of a Hadoop cluster

~50% Disk Access

CPU Efficiency 2.5X

**1TB Data Set, 16x dual X5670 (Westmere) Machines, 10x HDD Base; Vanilla GPHD1.2; UDA GPHD1.2+UDA

~2X Faster Job Completion! Increase the Value of Data!

FDR InfiniBand


HiBench is a combined test suite from Intel

• Tests: IO, Map Reduce, Machine Learning, Clustering and search applications

A faster network provides between 15% and 100% performance improvement!

• Some applications are more I/O bound than others

HiBench Benchmark Results


Linearly scalable, column index database

Enable 30% more queries

Cut latency gaps by 50%

Cassandra, Initial Results


Three Areas for Accelerations

Data Analytics

• Explore inefficiencies in existing analytics frameworks and systems

• Accelerate data processing to deliver faster results

Storage

• Explore ways to refine the dominant file system

• Take advantage of direct-attached disks to accelerate data access

Distributed Storage

• Leverage popular distributed storage systems with Big Data applications

• Use existing systems for usage with Big Data frameworks


The Great Things in Hadoop Distributed File System

• HDFS is a block storage solution

• Block size can be modified to provide efficient solutions for very large files

• Inherent reliability: no need for a high-end storage solution to make sure the data is there!

• Tuned for Hadoop workloads: write once, read many
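To make the block-size point concrete, the sketch below counts how many blocks a file occupies at a given block size and how much raw disk it consumes once replicated. The function name `hdfs_block_footprint` is hypothetical, the 64 MiB and 128 MiB block sizes are illustrative assumptions, and the replication factor of 3 is the default cited elsewhere in this deck. It assumes the last, partially filled block stores only the bytes actually written.

```python
MIB = 1024 ** 2
TIB = 1024 ** 4


def hdfs_block_footprint(file_size_bytes, block_size_bytes, replication=3):
    """Return block count, replica count, and raw bytes for one file.

    Assumes the final partial block consumes only its real data size,
    so raw usage is file size times the replication factor.
    """
    if block_size_bytes <= 0:
        raise ValueError("block size must be positive")
    # Ceiling division without floating point.
    blocks = -(-file_size_bytes // block_size_bytes)
    return {
        "blocks": blocks,
        "block_replicas": blocks * replication,
        "raw_bytes": file_size_bytes * replication,
    }
```

Under these assumptions, doubling the block size from 64 MiB to 128 MiB halves the number of blocks the metadata service has to track for a 1 TiB file, while raw disk usage stays the same.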


The Less Great Things in HDFS

• It’s hard to manage the different settings to give the right nodes the right capabilities.

• Ingress and extraction of data require additional tools.

• Small files or latency-sensitive workloads

• Default 3x replication

• Metadata server failure


Considerations When Planning Capacity

• Growth Rate

• Cost of Storage

• Data Retention

Do you need Real-Time Analytics?

Value Byte?

If it’s not hot, is it worth storing on high-performance storage?
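The planning factors on this slide (growth rate, data retention, and the replication overhead noted earlier) can be combined into a rough sizing formula. The sketch below is one simple model under stated assumptions, not a vendor sizing tool: the function name `raw_capacity_tb`, all parameter names, and the 25% free-space reserve are hypothetical choices made for illustration.

```python
def raw_capacity_tb(daily_ingest_tb, retention_days, replication=3,
                    annual_growth=0.0, years=0, free_space_fraction=0.25):
    """Estimate raw disk capacity in TB needed after `years` of growth.

    Model (assumed): ingest grows by `annual_growth` per year, data is kept
    for `retention_days`, every byte is stored `replication` times, and
    `free_space_fraction` of the disks is left empty as working headroom.
    """
    if not 0 <= free_space_fraction < 1:
        raise ValueError("free_space_fraction must be in [0, 1)")
    ingest = daily_ingest_tb * (1 + annual_growth) ** years
    logical_tb = ingest * retention_days
    return logical_tb * replication / (1 - free_space_fraction)
```

With 1 TB ingested per day, 30 days of retention, 3x replication, and a 25% reserve, the model asks for 120 TB of raw disk. If ingest doubles within a year, the same retention policy needs 240 TB.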


HDFS is the Hadoop File System

• The underlying file system for HBase and other NoSQL databases

More Drives, Higher Throughput is Needed

SSD Solutions Must Use Higher Throughput

• Bounded by 1GbE and 10GbE

HDFS Acceleration; Joint Project With Ohio State University

[Figure: Hadoop stack diagram. Labels: Hive, Pig, Map Reduce, HBase, HDFS™ (Hadoop Distributed File System), DISK (x6).]




SSDs become the de facto standard in HDFS deployments

• Read capability is a critical factor for application performance

E-DFSIO, part of Intel’s HiBench test suite, profiles aggregated throughput on the cluster

• A 1GbE network impedes any performance benefit from SSD deployment

Unlocking the Power of SSDs in a Hadoop Environment

E-DFSIO, Showing the Power of SSD @ HDFS
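The claim that a 1GbE network hides the benefit of SSDs follows from simple arithmetic: a remote read cannot go faster than the slower of the drives and the link. The sketch below is a back-of-the-envelope model, not a benchmark. The function name `node_read_ceiling_mb_s` is hypothetical, link speeds are nominal line rates with protocol overhead ignored, and the 500 MB/s per-SSD figure used in the example is an assumption rather than a number from this deck.

```python
# Nominal link speeds in Gb/s for the interconnects named in this deck.
LINK_GBPS = {"1GbE": 1, "10GbE": 10, "40GbE": 40, "FDR InfiniBand": 56}


def node_read_ceiling_mb_s(drives, drive_mb_s, link_gbps):
    """Return (ceiling in MB/s, limiting component) for remote reads from one node.

    Assumes reads are served over a single link and that drive throughput
    adds up linearly; both are simplifications.
    """
    disk_mb_s = drives * drive_mb_s
    link_mb_s = link_gbps * 1000 / 8  # Gb/s to MB/s at nominal line rate
    limiter = "network" if link_mb_s < disk_mb_s else "disk"
    return min(disk_mb_s, link_mb_s), limiter
```

With one assumed 500 MB/s SSD, `node_read_ceiling_mb_s(1, 500, LINK_GBPS["1GbE"])` gives a ceiling of 125 MB/s limited by the network, so three quarters of the drive's throughput is unreachable. At 10GbE the same drive becomes the limit.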


Three Areas for Accelerations

Data Analytics

• Explore inefficiencies in existing analytics frameworks and systems

• Accelerate data processing to deliver faster results

Storage

• Explore ways to refine the dominant file system

• Take advantage of direct-attached disks to accelerate data access

Distributed Storage

• Leverage popular distributed storage systems with Big Data applications

• Use existing systems for usage with Big Data frameworks


OrangeFS as Hadoop Storage Solution


Lustre as Hadoop Storage Solution

Source: Map/Reduce on Lustre, Hadoop Performance in HPC Environments, Nathan Rutman, Senior Architect, Networked Storage Solutions, Xyratex


CEPH as Hadoop Storage Solution

Generating a lot of interest since the Ceph kernel client was pulled into Linux kernel 2.6.34

• Object-based parallel file system

• Scalable metadata server

• Each file can specify its own striping strategy and object size

• Automatic rebalancing of data with minimal data movement

• Hadoop module for integrating Ceph has been in development since 0.12 release
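The per-file striping strategy and object size mentioned in the list above determine which object holds a given byte of a file. The sketch below shows the arithmetic for a RAID-0 style layout with three parameters (stripe unit, stripe count, object size). It is a minimal illustration under those assumptions: the function name `locate` and its parameter names are hypothetical, and it does not call any Ceph API.

```python
MIB = 1024 ** 2


def locate(offset, stripe_unit, stripe_count, object_size):
    """Map a file byte offset to (object number, offset inside that object).

    Stripe units are written round-robin across `stripe_count` objects;
    once those objects reach `object_size`, a new set of objects begins.
    """
    if object_size % stripe_unit != 0:
        raise ValueError("object_size must be a multiple of stripe_unit")
    stripes_per_object = object_size // stripe_unit
    block_no = offset // stripe_unit        # index of the stripe unit in the file
    stripe_no = block_no // stripe_count    # which full stripe (row) it falls in
    stripe_pos = block_no % stripe_count    # which object (column) within the set
    object_set = stripe_no // stripes_per_object
    object_no = object_set * stripe_count + stripe_pos
    object_offset = (stripe_no % stripes_per_object) * stripe_unit + offset % stripe_unit
    return object_no, object_offset
```

With a 1 MiB stripe unit, 2 objects per set, and 4 MiB objects, consecutive 1 MiB chunks alternate between objects 0 and 1 until both are full, after which the file continues in objects 2 and 3. Setting the stripe count to 1 and the stripe unit equal to the object size reduces this to simple sequential chunking.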

Benchmarks on Ceph are still WIP

• We are currently working on running benchmarks on Ceph – stay tuned!


Mellanox VPI Card

• MCX354A-FCBT

Mellanox Edge Switches

• MSX10xx; MSX60xx

Cloudera Certified – CDH3 and CDH4


E5-26x0 (Sandy Bridge) Machines

• Dual Socket

• 4+ cores each socket

• 32GB+ of DRAM

Disk Drives

• At least 5 x 1TB, SAS, 10K RPM

Hadoop Configuration

• At least one Name Node + Job Tracker

• At least 4 Data Nodes

Installation

• Your selection of Hadoop Distribution or other Big Data solution (such as Cassandra)

Networking

• ConnectX-3 VPI card, FDR, 40GbE and 10GbE

• SwitchX based systems: MSX6036F, MSX1036B and MSX1016

• Mellanox’s FDR, 40GbE and 10GbE Cable Solutions

http://www.mellanox.com/related-docs/whitepapers/WP_Deploying_Hadoop.pdf

Simple Building Block for Big Data Solution


EMC 1000-Node Analytic Platform

Accelerates Industry's Hadoop Development

24 petabytes of physical storage

• Half of every written word since inception of mankind

Mellanox VPI Solutions

Test Drive Your Big Data

2X Faster Hadoop Job Run-Time

Hadoop Acceleration

High Throughput, Low Latency, RDMA Critical for ROI


Thank You