March 2013
[email protected] Exar Corporation 48720 Kato Road Fremont, CA 510-668-7000 www.exar.com
Exar Optimizing Hadoop – Is Bigger Better??
• Section I: Exar Introduction – Exar Corporate Overview
• Section II: Big Data Pain-Points – Debunking Top 5 Hadoop Myths – 3 Main System Constraints
• Section III: Hadoop Optimization Solution – Exar Hadoop Acceleration Solutions
• Section IV: Benchmarking Results – OEM 1 Results – OEM 2 Results – OEM 3 Results
• Section V: Summary
Exar At-A-Glance Global Leader in Data Management Solutions and Mixed Signal Components
• Well Established Fabless IC Company – 42 years of history in Silicon Valley
– ~ 300 Employees Worldwide
– Healthy balance sheet - $229M in assets
• Broad-base Component and Solution Supplier – Specialty SoCs, FPGA/ASIC Boards and Software
• DCS (Data Compression & Security)
– Analog Mixed Signal Components • Interface
• Power
Is Bigger Always Better??
It is not about the size of the Big Data deployment. Return on investment is defined by optimal utilization of resources.
Debunking the Top 5 Hadoop Myths 1. More CPUs or More Storage Does Not Mean Better Analytics
Myth: Increasing the number of jobs per node, or improving job processing time, requires more powerful nodes.
No!!!
Maximizing rack density and using resources (CPU, storage, and memory) effectively is the solution.
Debunking the Top 5 Hadoop Myths 2. Operational Expenditure Is a Significant Component of the 3-5 Year TCO
Myth: Capital expenditure is the primary contributor to the 3- or 5-year TCO.
No!!!
Operational expenditure is a significant contributor to the TCO.
Debunking the Top 5 Hadoop Myths 3. Storage Scaling Is Significantly Constrained by Size and Space
Myth: Storage can scale easily.
No!!!
Size, space, and connectivity constrain scaling capacity.
Debunking the Top 5 Hadoop Myths 4. Data Node Costs Are Driven by Storage Rather Than CPUs
Myth: Compute defines the data node cost.
No!!!
Storage defines the node cost, and the ratio is often as high as 10:1 (storage to CPU).
Debunking the Top 5 Hadoop Myths 5. For Larger Hadoop Clusters, Network (Shuffle) Traffic Reduction Is Key
Myth: Network traffic reduction is not relevant to Hadoop TCO.
No!!!
10G WAN links are expensive. It is preferable to optimize traffic so that 1G WAN links suffice, and to avoid or minimize 10G links.
Summary of Hadoop Cluster Constraints Hadoop Clusters can be Optimized for Storage, Network Bandwidth & Compute Resources
Storage Capacity
Server OEMs are struggling to provide enough capacity to keep up with ever-growing data needs. E.g., a leading server OEM's latest configuration supports 30 disks per server!
Disk IOPs Bottleneck
The biggest bottleneck for data analytics is the disk IOPs limitation. E.g., even the most optimally configured Hadoop system struggles to exceed 80% CPU utilization, because disk I/O bandwidth cannot keep up, especially at high CPU-core-to-HDD ratios.
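The core-to-disk imbalance described above can be sanity-checked with back-of-envelope arithmetic. The sketch below compares the aggregate I/O a node's cores demand against what its spindles can deliver; all throughput figures are illustrative assumptions, not Exar measurements.

```python
# Rough check of whether a data node is disk-I/O bound.
# mb_per_core and mb_per_disk are assumed figures for illustration.

def io_bound(cores, disks, mb_per_core=100.0, mb_per_disk=120.0):
    """True if the disks cannot feed the cores at full utilization.

    mb_per_core: assumed MB/s each busy core consumes.
    mb_per_disk: assumed sustained MB/s per spinning disk.
    """
    demand = cores * mb_per_core   # MB/s the CPUs want
    supply = disks * mb_per_disk   # MB/s the disks can deliver
    return demand > supply

# A high core-to-disk ratio node starves on I/O (1600 vs 720 MB/s),
# while a 1:2 core-to-disk node does not (800 vs 1920 MB/s).
print(io_bound(16, 6))   # True
print(io_bound(8, 16))   # False
```

Under these assumptions, any node whose core count outpaces its spindle count by roughly 2:1 or more leaves CPU cycles idle waiting on disk, which matches the sub-80% utilization symptom noted above.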
Network Bandwidth
Data is often replicated 3 times, and large clusters are distributed globally. Minimizing bandwidth (across the WAN) and switch/HW cost (across the LAN) is key. E.g., a leading eCommerce company has 6 clusters distributed globally, with each cluster having 2,000-3,000 data nodes.
Can Hadoop Cluster TCO be reduced without impacting job execution time??
Exar Hadoop Acceleration Solutions can lower Cluster TCO by 20-40%!!
Exar Hadoop Optimization Solutions – By optimizing CPU, Storage, Memory, & Network Bandwidth, TCO can be reduced by up to 40%
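To see how disk reduction alone can land in the quoted 20-40% range, here is a minimal per-node TCO sketch. Every figure (disk cost, base server cost, disk power draw, electricity price) is an assumption chosen for illustration, not a quoted Exar number; the point is only the shape of the calculation.

```python
# Illustrative 3-year TCO for one data node: CapEx (server + disks)
# plus OpEx (disk power). All dollar and watt figures are assumptions.

def node_tco(disks, disk_cost=300.0, base_capex=4000.0,
             watts_per_disk=8.0, kwh_price=0.12, years=3):
    capex = base_capex + disks * disk_cost
    hours = years * 365 * 24
    opex = disks * watts_per_disk / 1000.0 * hours * kwh_price
    return capex + opex

baseline = node_tco(disks=12)    # native node with 12 disks
compressed = node_tco(disks=4)   # 3x compression -> 1/3 the disks
saving = 1 - compressed / baseline
print(f"TCO saving: {saving:.0%}")
```

With these assumed inputs the saving comes out around a third, inside the 20-40% band the deck claims; real savings depend on actual hardware pricing, compression ratios, and power costs.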
Exar Hadoop Acceleration Solution Overview Exar Solution optimizes all the Hadoop Cluster Constraints mentioned earlier
Exar Hadoop Acceleration Solution Highlights:
Storage Optimization – Exar Solution uses advanced data compression to compress input and output data, which drastically reduces the storage requirement in each Data Node
CPU Optimization – Data Compression/Decompression is Offloaded from CPU, which releases additional CPU Cycles for Enhanced Data Analytics
Memory Management – Exar Solution uses advanced Memory Management, which optimizes the System Memory Usage
Network Bandwidth Optimization – Exar Solution compresses intermediate (shuffle) traffic, which optimizes network bandwidth
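The storage and shuffle reductions listed above both follow from one quantity: the compression ratio. The sketch below shows the arithmetic for HDFS-style triple replication; the 3x ratio is an assumed example (the low end of the 3x-6x range quoted later), and actual ratios vary by workload.

```python
# How a compression ratio translates into on-disk footprint under
# HDFS-style replication. Ratio of 3.0 is an illustrative assumption.

def on_disk_tb(raw_tb, ratio=3.0, replication=3):
    """TB actually written to disk for raw_tb of logical data."""
    return raw_tb * replication / ratio

raw = 100.0                                # TB of logical input data
native = on_disk_tb(raw, ratio=1.0)        # no compression: 300.0 TB
accelerated = on_disk_tb(raw, ratio=3.0)   # 3x compression: 100.0 TB
print(native, accelerated)
```

The same ratio applies to shuffle traffic: intermediate data compressed 3x consumes a third of the link bandwidth, which is what makes the 1G-vs-10G WAN argument in Myth 5 work.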
Exar Hadoop Acceleration Solution Overview Exar offers a Certified Plug N Play Hadoop Acceleration solution
Plug N Play Solution:
No Code Change – Filter Layer SW sits below the HDFS. No APIs required. SW installs in minutes!
Standard HW – Offload card supports PCIe Gen 1 and Gen 2
Linux OS Compatible – Solution supports Linux 6.X, and works across RHEL, Ubuntu and SUSE
Certified by Cloudera:
Solution Certified on both CDH3 and CDH4
OEM Tested:
Solutions evaluated and benchmarked on leading OEM HW, including IBM, HP, Dell, SuperMicro, etc.
Big Data (Hadoop) Optimization Solution Exar Solutions Reduce Storage Requirement & Optimize System Resource Utilization
A Hadoop Cluster Accelerated with AltraSTAR consists of:
CeDeFS Filter Layer SW
Exar Hardware Accelerator
CeDeFS is a transparent filter-layer SW that sits below HDFS. No code changes are required and the workflow remains the same.
The Exar Accelerator is an FPGA-based PCIe HW accelerator:
3x-6x increase in storage capacity in each node
Enhanced CPU utilization and reduced runtime through I/O reduction and optimization
Significant benefits for I/O-bound tasks
Increased data density, which reduces shuffle traffic
Reduction in power, per node and per cluster
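"Transparent filter layer" means callers read and write data normally while compression happens underneath. A minimal standard-library analogy is sketched below, with software zlib standing in for the hardware offload; this is a conceptual illustration, not Exar's actual CeDeFS API.

```python
import zlib

class TransparentStore:
    """Toy store: compresses on write, decompresses on read.

    The caller never sees compressed bytes, mirroring how a filter
    layer below the filesystem keeps the workflow unchanged.
    """
    def __init__(self):
        self._blocks = {}

    def write(self, name, data: bytes):
        self._blocks[name] = zlib.compress(data)

    def read(self, name) -> bytes:
        return zlib.decompress(self._blocks[name])

store = TransparentStore()
payload = b"hadoop intermediate data " * 1000
store.write("part-00000", payload)
assert store.read("part-00000") == payload   # caller sees original bytes
ratio = len(payload) / len(store._blocks["part-00000"])
print(f"compression ratio: {ratio:.1f}x")
```

Because the interface is byte-in, byte-out, swapping software zlib for a PCIe offload engine changes where the cycles are spent, not how the application is written.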
[Architecture diagram: Hadoop Map/Reduce on a Linux system, with the Hadoop FS layered over the CeDeFS + CeDeFN filter layer, the storage volume, the Exar driver, and the Exar offload card.]
Test Procedure Validate Exar Acceleration Solutions on Typical Hadoop Clusters
1. Configure the system to the default Hadoop settings
2. Establish a benchmark for the native configuration (with LZO)
3. Rerun the tests with the Exar Acceleration Solution: disk reduction, network link optimization, large-file optimization
4. Quantify results; calculate ROI
Exar Hadoop Acceleration – OEM 1 Results Exar’s GX1745 based Acceleration Test Results
[Chart: cluster configuration and job execution & resource requirements for a 300 TB deployment.]
With the Exar Hadoop Accelerated Solution, end users could reduce their capital expenditure by up to 40%!!!
Exar Hadoop Acceleration – OEM 2 Results OEM Sorted 1 TB in an industry leading time; Exar reduced the cost by 30%
[Chart: cluster configuration and job execution & resource requirements. Native configuration: Servers = 10, Expansion Units = 10; Exar Solution: Servers = 10, Expansion Units = 5.]
Terasort Test on AppSystem Cluster

Workload                          Native LZO   AltraSTAR + LZO   Performance Gain
Single Job (512 GB), 12 disks     14m 15s      8m 9s             70%
Single Job (1 TB), 12 disks       33m 36s      16m 0s            101%
Multiple Job (Job 2), 12 disks    33m 32s      19m 3s            76%
Single Job (512 GB), 6 disks      21m 34s      12m 07s           77%

[Chart also reports a capacity gain for the accelerated configuration.]
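The performance-gain percentages above follow the usual speedup convention, gain = native runtime / accelerated runtime - 1. Checking the Multiple Job row (33m 32s native vs. 19m 3s accelerated):

```python
# Derive a "Performance Gain" percentage from two runtimes.

def gain_pct(native_s, accel_s):
    """Speedup expressed as a percentage reduction-equivalent gain."""
    return (native_s / accel_s - 1) * 100

native = 33 * 60 + 32   # 2012 s
accel = 19 * 60 + 3     # 1143 s
print(round(gain_pct(native, accel)))   # 76
```

This reproduces the 76% figure for that row; the other rows were presumably computed the same way, possibly with additional rounding or averaging across runs.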
The solution reduces cost and improves performance; customers can 1) improve job performance, 2) remove disks or use lower-capacity disks, or 3) increase capacity.
Exar Hadoop Acceleration – OEM 3 Results Solution gave the flexibility to increase Storage/CPU density per Rack
Exar Hadoop Acceleration – OEM 3 Results Exar Solution improved Analytics up to 70%, or, reduced Storage Cost up to 50%
[Chart: Performance-Maximized Configuration vs. Cost-Minimized Configuration.]
Exar Hadoop Accelerated Solutions Outperformed CPU Solutions – Implied or calculated results shed light on 4 of the 5 Hadoop implementation myths

Efficiency Parameter            Parameter Definition                   No Exar Acceleration   With Exar Acceleration   AltraSTAR Accel Gain
System Resource Optimization    Ratio of CPU cores to hard disks       1:2                    1:1                      100%
Cap-Ex Efficiency               $$ cap investment per 1 GB sort        N/A                    N/A                      27%
Op-Ex Efficiency                kWh consumed per 1 GB sort             N/A                    N/A                      20%
Storage Density                 Effective storage per 40U rack         261                    430                      61%
Exar Hadoop Acceleration Solution Exar Acceleration Solution optimizes all of the Hadoop Constraints
Significant ROI: Highest Rack Density
Lowest $$/GB Sort
Most Power Efficient
Optimized Network Bandwidth
Flexibility: Offers flexibility to cater to both disk-IO-bound
and CPU-bound solutions
Certified: Certified on all Cloudera Releases, and
tested on most of the major OEM HW
Conclusion
• Hardware accelerated compression provides meaningful acceleration as well as added capacity
• Acceleration plus added capacity means bigger jobs executed in less time
• Very significant savings in both CAPEX and OPEX
Ramana Jampala
Vice-President – Business Development [email protected]
(732) 440-1280 x238
www.exar.com