IBM Research
Storage Systems Research © 2008 IBM Corporation
Analytics in the cloud
Do we really need to reinvent the storage stack?
Image courtesy NASA / ESA
R. Ananthanarayanan, Karan Gupta, Prashant Pandey,
Himabindu Pucha, Prasenjit Sarkar, Mansi Shah, Renu
Tewari
Data-Intensive Internet Scale Applications
Typical Applications
• Web-scale search, indexing, mining
• Genomic sequencing
• Brain-scale network simulations
Data-Intensive Internet Scale Applications
▪ Key Requirements
• Scale to very large data sets
• Platform needs to scale to thousands of nodes
• Built from commodity hardware for cost efficiency
• Tolerate failures during “every” job execution
• Support function shipping (moving computation to the data) to reduce network requirements
MapReduce for analytics
▪ MapReduce is emerging as a model for large-scale analytics applications
▪ Important design goals are extreme scalability and fault tolerance
▪ The storage layer is separate and has well-defined requirements
Image source: http://developer.yahoo.com/hadoop/tutorial/module1.html
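To make the programming model concrete, here is a minimal single-process sketch of MapReduce (a word count); it illustrates only the model, not any particular framework, and the function names are illustrative:

```python
from collections import defaultdict

# Minimal sketch of the MapReduce programming model (not a distributed
# implementation): a map function emits key/value pairs, the framework
# groups the pairs by key, and a reduce function folds each group.

def map_fn(document):
    for word in document.split():
        yield (word, 1)

def reduce_fn(word, counts):
    return (word, sum(counts))

def run_mapreduce(inputs, map_fn, reduce_fn):
    groups = defaultdict(list)
    for record in inputs:                  # "map" phase
        for key, value in map_fn(record):
            groups[key].append(value)      # "shuffle": group by key
    return dict(reduce_fn(k, vs) for k, vs in groups.items())  # "reduce" phase

counts = run_mapreduce(["a rose is a rose"], map_fn, reduce_fn)
# counts == {"a": 2, "rose": 2, "is": 1}
```

In a real deployment the map and reduce phases run as many parallel tasks across nodes, which is where the storage layer's scalability and fault-tolerance requirements come from.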
MapReduce Data-store requirements
▪ Provide a hierarchical namespace with directories and files
▪ Allow applications to read/write data to files
▪ Protect data availability and reliability in the face of node and disk failures
▪ Provide high-bandwidth access to reasonably sized chunks of data to all compute nodes (not necessarily all-to-all)
▪ Provide chunk access-affinity information to allow proper scheduling of tasks
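The requirements above can be read as an interface contract. The following is a hypothetical sketch of that interface; the class and method names (`MapReduceStore`, `block_locations`, etc.) are illustrative, not any real API:

```python
from abc import ABC, abstractmethod

# Hypothetical sketch of the data-store interface implied by the
# requirements above; names are illustrative, not a real API.

class MapReduceStore(ABC):
    @abstractmethod
    def create(self, path: str) -> None:
        """Create a file in a hierarchical namespace."""

    @abstractmethod
    def read(self, path: str, offset: int, length: int) -> bytes:
        """Read a chunk; data must survive node and disk failures."""

    @abstractmethod
    def write(self, path: str, offset: int, data: bytes) -> None:
        """Write data, replicated for availability and reliability."""

    @abstractmethod
    def block_locations(self, path: str, offset: int, length: int) -> list:
        """Return the hostnames holding each chunk, so the scheduler can
        place tasks near their input (access affinity)."""

# Trivial single-node implementation, for illustration only.
class InMemoryStore(MapReduceStore):
    def __init__(self):
        self.files = {}

    def create(self, path):
        self.files[path] = bytearray()

    def read(self, path, offset, length):
        return bytes(self.files[path][offset:offset + length])

    def write(self, path, offset, data):
        buf = self.files[path]
        if len(buf) < offset + len(data):
            buf.extend(b"\0" * (offset + len(data) - len(buf)))
        buf[offset:offset + len(data)] = data

    def block_locations(self, path, offset, length):
        return ["localhost"]
```

Both HDFS and a modified cluster filesystem can be judged against this same contract; the slides that follow compare the two options.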
Data store options: Cluster FS vs. Specialized FS

                                    Cluster FS   Specialized FS
  Scaling                           Yes          Yes
  Commodity hardware compliant      Yes          Yes
  Traditional application support   Yes          No
  Mature management tools           Yes          No
  Tuned for Hadoop                  No           Yes
Modifying a Cluster Filesystem for MapReduce
▪ GPFS
• Mature filesystem - many large production installations
• High performance, highly scalable
• Reliability features focused on SAN environments
  - Supports rack-aware 2-way replication
• POSIX interface
• Supports shared-disk (SAN) and shared-nothing setups
• Not optimized for MapReduce workloads
  - Does not expose data location information
  - Largest block size = 16 MB
▪ Changes for Hadoop:
• Make blocks bigger
• Let the platform know where the big blocks are
• Optimize replication and placement to reduce network usage
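Why "let the platform know where the big blocks are" matters: the scheduler can only avoid network transfers if the filesystem exposes block locations. A minimal sketch of locality-aware task placement (illustrative only, not Hadoop's actual scheduler):

```python
# Given the replica locations a filesystem exposes for a block, prefer
# running the task on a free node that already holds a replica; fall
# back to any free node (data then crosses the network).

def pick_node(block_replicas, free_nodes):
    """block_replicas: set of hostnames holding the block;
    free_nodes: list of idle hosts, in preference order."""
    for node in free_nodes:
        if node in block_replicas:
            return node, "node-local"   # no network transfer needed
    return free_nodes[0], "remote"      # input must be fetched remotely

node, locality = pick_node({"n1", "n4"}, ["n2", "n4", "n7"])
```

A filesystem that hides placement (as unmodified GPFS does) forces the scheduler into the "remote" branch for every task, which is exactly the network cost the changes above avoid.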
Key change: Metablocks
▪ Works for many workloads
• Small FS blocks (e.g. 512 KB)
• Large application blocks (e.g. 64 MB)
▪ New allocation scheme
• Metablock-size granularity for wide striping
▪ Block map operates on the large metablock size
▪ All FS operations operate on the small regular block size
▪ Additional changes provide block location information and “write affinity”
[Diagram: new allocation policy groups small FS blocks into one large application “metablock”]
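The metablock arithmetic can be sketched directly, using the example sizes above (512 KB FS blocks grouped into 64 MB metablocks; the sizes and function name are illustrative):

```python
# Sketch of the metablock idea: allocation and placement are decided
# per large metablock, while regular FS operations still address the
# small blocks inside it.

FS_BLOCK = 512 * 1024          # granularity of normal FS operations
METABLOCK = 64 * 1024 * 1024   # granularity of allocation/placement

def locate(offset):
    meta_index = offset // METABLOCK             # which metablock (placed as a unit)
    fs_index = (offset % METABLOCK) // FS_BLOCK  # which small block inside it
    return meta_index, fs_index

# With these sizes, 128 consecutive FS blocks share one metablock and
# therefore land on the same node, giving MapReduce a large contiguous,
# node-local chunk without changing the FS operation granularity.
blocks_per_metablock = METABLOCK // FS_BLOCK
```

The block map keys on `meta_index` (large granularity), while reads and writes resolve down to `fs_index` (small granularity), which is why traditional small-I/O workloads keep working.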
MapReduce performance
Test bed:
• iDataPlex: 42 nodes, 8 cores and 8 GB RAM each, 4+1 disks per node
• 16 nodes used, 160 GB of data (replication factor = 2)
• Hadoop version 0.18.1, GPFS version pre-3.3
[Chart: completion time (sec) for Grep, TeraGen, and TeraSort; GPFS vs. HDFS with rack=1 and rack=2]
Impact on traditional workloads
[Chart: normalized random and sequential read performance for 512K blocks, 512K with metablocks, and 16M blocks]
iDataPlex: 42 nodes
8 cores, 8GB RAM
4+1 disks per-node
GPFS: version pre3.3
Bonnie filesystem benchmark
Things that didn’t work
▪ Large filesystem block size
▪ Turning off prefetching
▪ Aligning records to block boundaries

[Chart: completion time (sec) for HDFS and for GPFS configurations 16M, 16ML, 16MLA, and 16MLA NP]
[Chart: normalized random and sequential read performance for 512K, 16M, and 16M no-prefetch]
Advantages of traditional filesystems
▪ Traditional filesystems have solved many hard problems like access control, quotas, snapshots, …
▪ Allow traditional and MapReduce applications to share the same input data.
▪ Exploit filesystem tools and scripts built for “regular” filesystems.
▪ Re-use backup/archive solutions built around particular filesystems.
▪ Mixed analytics pipelines.
Using a MapReduce-specific filesystem (e.g. HDFS):
Crawl → Load → Analyze → Output → Serve
(the crawler writes to a traditional filesystem, data is loaded into the MapReduce filesystem, and output goes back to the traditional filesystem)

Using a general-purpose filesystem (e.g. GPFS):
Crawl → Analyze → Serve
Conclusion
▪ MapReduce platforms can use traditional filesystems without loss of performance.
▪ There are important reasons why traditional filesystems are attractive to users of MapReduce.