Big Data Analytics: The Network is the Bottleneck

1/31/2012 ©MapR Technologies - Confidential

Big Data Analytics The Network is the Bottleneck › us › Images › 11_Jack_Norris.pdf · 2014-12-18 · 1/31/2012 ©MapR Technologies - Confidential 8 MapR Performance Advantages

Data Volume Growing 44x

2010: 1.2 zettabytes

2020: 35.2 zettabytes

Data is Growing Faster than Moore’s Law

Business Analytics Requires a New Approach

Source: IDC Digital Universe Study, sponsored by EMC, May 2010

IDC Digital Universe Study 2011
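
The two data points on the chart can be turned into a total growth factor and a compound annual growth rate with a little arithmetic. The sketch below uses only the figures shown on this slide (1.2 ZB in 2010, 35.2 ZB in 2020). The helper names `growth_ratio` and `cagr` are illustrative. These two points give roughly 29x over the decade, so the 44x headline is presumably measured from an earlier baseline year than the 2010 figure.

```python
def growth_ratio(start: float, end: float) -> float:
    """Total growth factor between two measurements."""
    return end / start

def cagr(start: float, end: float, years: float) -> float:
    """Compound annual growth rate: (end / start) ** (1 / years) - 1."""
    return (end / start) ** (1.0 / years) - 1.0

# Data points read off the slide, in zettabytes.
zb_2010 = 1.2
zb_2020 = 35.2

ratio = growth_ratio(zb_2010, zb_2020)
rate = cagr(zb_2010, zb_2020, 2020 - 2010)
print(f"growth 2010->2020: {ratio:.1f}x, about {rate:.0%} per year")
# prints: growth 2010->2020: 29.3x, about 40% per year
```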

The Next Generation Distribution

• Complete Distribution for Apache Hadoop

• Integrated, tested, hardened

• Supported

• 100% Hadoop, HBase, HDFS API compatible

• Unique advanced features

• No changes required to Hadoop applications

• Runs on commodity hardware

Innovations of Next Generation Distribution

• High Availability Architecture

• Snapshots

• Mirroring

• NFS Access

• Graphical Management

• Speed jobs by more than 2X

• Save $$$ on hardware

Importance of File-based Access

• File browsers: access directly with "drag & drop"

• Applications: random read, random write, log directly

• Standard Linux commands & tools: grep, sed, sort, tar

All of these work directly against the Hadoop cluster.
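
File-based access means that applications can use ordinary POSIX calls (open, seek, read, write) with no Hadoop-specific client. The sketch below assumes the cluster is exposed as a mounted file system, for example through NFS. The mount point is hypothetical, and the sketch uses a temporary directory so that it runs anywhere. It shows the three application patterns named on this slide: logging directly, random write, and random read.

```python
import os
import tempfile

# Hypothetical mount point: replace with the path where the cluster is mounted
# (assumed to be a POSIX/NFS mount). A temp directory keeps the demo runnable.
MOUNT = tempfile.mkdtemp()

def log_line(path: str, message: str) -> None:
    """Append one line to a log file, as an application logging directly would."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(message + "\n")

def random_write(path: str, offset: int, data: bytes) -> None:
    """Overwrite bytes in place at an arbitrary offset (random write)."""
    with open(path, "r+b") as f:
        f.seek(offset)
        f.write(data)

def random_read(path: str, offset: int, length: int) -> bytes:
    """Read `length` bytes starting at `offset` (random read)."""
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(length)

log_path = os.path.join(MOUNT, "app.log")
log_line(log_path, "job started")
log_line(log_path, "job finished")
random_write(log_path, 0, b"JOB")      # patch the first three bytes in place
print(random_read(log_path, 0, 11))    # b'JOB started'
```

Because the mount behaves like a local file system, standard tools such as grep, sed, sort, and tar can work on the same files without modification.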

High Availability and Data Protection

MapR Distribution

• Hive, Pig, Oozie, Sqoop, Plume, HBase, Mahout, Cascading, Flume, and more

• Nagios integration, Ganglia integration

• MapReduce

• MapR’s Lockless Storage Services™

• Distributed NameNode HA™, JobTracker HA™

• High availability

• Stateful failover

• Unlimited number of files

[Figure: data blocks A, B, C, D, and D′, shared between active files and snapshots]

• Recover from app or user errors

• Zero performance loss on write

• Easy recovery with drag and drop
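
The block labels on this slide (A, B, C, D, and a modified D′) suggest a block-sharing snapshot design. The toy model below illustrates one common technique, redirect-on-write: a snapshot copies only block references, and a later write goes to a fresh block instead of copying old data first. This is a generic sketch, not MapR's implementation. The `Volume` class and its methods are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Volume:
    """Toy volume where files and snapshots share immutable data blocks."""
    blocks: dict = field(default_factory=dict)     # block id -> bytes
    files: dict = field(default_factory=dict)      # file name -> list of block ids
    snapshots: dict = field(default_factory=dict)  # snapshot name -> frozen file map
    next_id: int = 0

    def _new_block(self, data: bytes) -> int:
        bid = self.next_id
        self.next_id += 1
        self.blocks[bid] = data
        return bid

    def write_file(self, name: str, chunks: list) -> None:
        self.files[name] = [self._new_block(c) for c in chunks]

    def snapshot(self, snap_name: str) -> None:
        # Copy only the block *references*; no data blocks are duplicated.
        self.snapshots[snap_name] = {f: list(ids) for f, ids in self.files.items()}

    def overwrite_block(self, name: str, index: int, data: bytes) -> None:
        # Redirect-on-write: new data lands in a fresh block (D -> D'), and only
        # the active file's map changes. The old block remains for snapshots.
        self.files[name][index] = self._new_block(data)

    def read(self, name: str, snap: str = None) -> bytes:
        file_map = self.files if snap is None else self.snapshots[snap]
        return b"".join(self.blocks[b] for b in file_map[name])

vol = Volume()
vol.write_file("f", [b"A", b"B", b"C", b"D"])
vol.snapshot("s1")
vol.overwrite_block("f", 3, b"D'")
print(vol.read("f"))          # b"ABCD'"
print(vol.read("f", "s1"))    # b'ABCD'
```

Recovering from an application or user error then amounts to reading, or copying back, the file map held by the snapshot.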

File Create Benchmark

Testing completed on a 10-node cluster; each node: 2x quad-core CPUs, 24 GB DRAM, 12 x 1 TB SATA drives @ 7200 rpm

[Chart: total files (millions), MapR Distribution (out of box) vs. standard distributions (out of box and tuned)]
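
A file-create benchmark stresses the file system's metadata path, because every create is a namespace operation. The sketch below shows one simple way to measure single-client create throughput on any directory. It is not the harness used for the results on this slide. A cluster benchmark would run many clients in parallel and push the total file count into the millions. The `file_create_rate` helper is hypothetical.

```python
import os
import tempfile
import time

def file_create_rate(directory: str, count: int) -> float:
    """Create `count` empty files in `directory`; return files created per second."""
    start = time.perf_counter()
    for i in range(count):
        # Each create is a metadata operation on the file system namespace.
        with open(os.path.join(directory, f"f{i:08d}"), "wb"):
            pass
    elapsed = time.perf_counter() - start
    return count / elapsed if elapsed > 0 else float("inf")

with tempfile.TemporaryDirectory() as d:
    rate = file_create_rate(d, 1000)
    created = len(os.listdir(d))
    print(f"created {created} files at ~{rate:.0f} files/s")
```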

MapR Performance Advantages

YCSB on HBase (higher is better): [Chart: record inserts per second (000s), MapR vs. Other, with WAL off and WAL on]

Terasort (lower is better): [Chart: elapsed time in minutes for 3.5 TB, MapR vs. Other]

Test configuration: 10-node cluster; each node: 2x quad-core CPUs, 24 GB DRAM, 12 x 1 TB SATA drives @ 7200 rpm, quad NICs
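
A back-of-the-envelope calculation shows why the network, rather than the disks, tends to limit a node like the one in this test configuration. The drive count and NIC count come from the slide. The per-drive throughput (about 100 MB/s sequential for a 7200 rpm SATA drive) and the NIC speed (1 Gigabit Ethernet each) are assumptions, not figures from the deck.

```python
# From the test configuration on the slide:
DRIVES_PER_NODE = 12
NICS_PER_NODE = 4

# Assumed figures, NOT from the slides:
DISK_MB_S = 100      # rough sequential throughput of one 7200 rpm SATA drive
NIC_GBIT_S = 1.0     # assume each NIC is 1 Gigabit Ethernet

def node_bandwidth_mb_s():
    """Return (aggregate disk, aggregate network) bandwidth per node in MB/s."""
    disk = DRIVES_PER_NODE * DISK_MB_S
    network = NICS_PER_NODE * NIC_GBIT_S * 1000 / 8   # Gbit/s -> MB/s (decimal units)
    return disk, network

disk, net = node_bandwidth_mb_s()
print(f"per node: disks ~{disk:.0f} MB/s, network ~{net:.0f} MB/s "
      f"({disk / net:.1f}x more disk than network bandwidth)")
```

Under these assumptions each node can read from its local disks more than twice as fast as it can move data over all four NICs combined. Data that has to cross the network, for example during a shuffle or replication, is therefore the likely bottleneck.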
