IBM Research
Storage Systems Research © 2008 IBM Corporation
Analytics in the cloud
Do we really need to reinvent the storage stack?
Image courtesy NASA / ESA
R. Ananthanarayanan, Karan Gupta, Prashant Pandey,
Himabindu Pucha, Prasenjit Sarkar, Mansi Shah, Renu
Tewari
Data-Intensive Internet Scale Applications
Typical Applications
• Web-scale search, indexing, mining
• Genomic sequencing
• Brain-scale network simulations
Data-Intensive Internet Scale Applications
▪ Key Requirements
• Scale to very large data sets
• Platform needs to scale to thousands of nodes
• Built from commodity hardware for cost efficiency
• Tolerate failures during “every” job execution
• Support function shipping (moving computation to the data) to reduce network requirements
MapReduce for analytics
▪ MapReduce is emerging as a model for large-scale analytics applications
▪ Important design goals are extreme scalability and fault tolerance
▪ The storage layer is separate and has well-defined requirements
Image source: http://developer.yahoo.com/hadoop/tutorial/module1.html
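To make the programming model concrete, here is a minimal single-process sketch of MapReduce (a word count); it illustrates only the model, not any particular framework, and the function names are illustrative:

```python
from collections import defaultdict

# Minimal sketch of the MapReduce programming model (not a distributed
# implementation): a map function emits key/value pairs, the framework
# groups the pairs by key, and a reduce function folds each group.

def map_fn(document):
    for word in document.split():
        yield (word, 1)

def reduce_fn(word, counts):
    return (word, sum(counts))

def run_mapreduce(inputs, map_fn, reduce_fn):
    groups = defaultdict(list)
    for record in inputs:                  # "map" phase
        for key, value in map_fn(record):
            groups[key].append(value)      # "shuffle": group by key
    return dict(reduce_fn(k, vs) for k, vs in groups.items())  # "reduce" phase

counts = run_mapreduce(["a rose is a rose"], map_fn, reduce_fn)
# counts == {"a": 2, "rose": 2, "is": 1}
```

In a real deployment the map and reduce phases run as many parallel tasks across nodes, which is where the storage layer's scalability and fault-tolerance requirements come from.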
MapReduce Data-store requirements
▪ Provide a hierarchical namespace with directories and files
▪ Allow applications to read/write data to files
▪ Protect data availability and reliability in the face of node and disk failures
▪ Provide high-bandwidth access to reasonably sized chunks of data to all compute nodes (not necessarily all-to-all)
▪ Provide chunk access-affinity information to allow proper scheduling of tasks
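The requirements above can be read as an interface contract. The following is a hypothetical sketch of that interface; the class and method names (`MapReduceStore`, `block_locations`, etc.) are illustrative, not any real API:

```python
from abc import ABC, abstractmethod

# Hypothetical sketch of the data-store interface implied by the
# requirements above; names are illustrative, not a real API.

class MapReduceStore(ABC):
    @abstractmethod
    def create(self, path: str) -> None:
        """Create a file in a hierarchical namespace."""

    @abstractmethod
    def read(self, path: str, offset: int, length: int) -> bytes:
        """Read a chunk; data must survive node and disk failures."""

    @abstractmethod
    def write(self, path: str, offset: int, data: bytes) -> None:
        """Write data, replicated for availability and reliability."""

    @abstractmethod
    def block_locations(self, path: str, offset: int, length: int) -> list:
        """Return the hostnames holding each chunk, so the scheduler can
        place tasks near their input (access affinity)."""

# Trivial single-node implementation, for illustration only.
class InMemoryStore(MapReduceStore):
    def __init__(self):
        self.files = {}

    def create(self, path):
        self.files[path] = bytearray()

    def read(self, path, offset, length):
        return bytes(self.files[path][offset:offset + length])

    def write(self, path, offset, data):
        buf = self.files[path]
        if len(buf) < offset + len(data):
            buf.extend(b"\0" * (offset + len(data) - len(buf)))
        buf[offset:offset + len(data)] = data

    def block_locations(self, path, offset, length):
        return ["localhost"]
```

Both HDFS and a modified cluster filesystem can be judged against this same contract; the slides that follow compare the two options.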
Data store options: Cluster FS vs. Specialized FS

                                    Cluster FS   Specialized FS
  Scaling                           Yes          Yes
  Commodity hardware compliant      Yes          Yes
  Traditional application support   Yes          No
  Mature management tools           Yes          No
  Tuned for Hadoop                  No           Yes
Modifying a Cluster Filesystem for MapReduce
▪ GPFS
• Mature filesystem - many large production installations
• High performance, highly scalable
• Reliability features focused on SAN environments
  - Supports rack-aware 2-way replication
• POSIX interface
• Supports shared-disk (SAN) and shared-nothing setups
• Not optimized for MapReduce workloads
  - Does not expose data location information
  - Largest block size = 16 MB
▪ Changes for Hadoop:
• Make blocks bigger
• Let the platform know where the big blocks are
• Optimize replication and placement to reduce network usage
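Why "let the platform know where the big blocks are" matters: the scheduler can only avoid network transfers if the filesystem exposes block locations. A minimal sketch of locality-aware task placement (illustrative only, not Hadoop's actual scheduler):

```python
# Given the replica locations a filesystem exposes for a block, prefer
# running the task on a free node that already holds a replica; fall
# back to any free node (data then crosses the network).

def pick_node(block_replicas, free_nodes):
    """block_replicas: set of hostnames holding the block;
    free_nodes: list of idle hosts, in preference order."""
    for node in free_nodes:
        if node in block_replicas:
            return node, "node-local"   # no network transfer needed
    return free_nodes[0], "remote"      # input must be fetched remotely

node, locality = pick_node({"n1", "n4"}, ["n2", "n4", "n7"])
```

A filesystem that hides placement (as unmodified GPFS does) forces the scheduler into the "remote" branch for every task, which is exactly the network cost the changes above avoid.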
Key change: Metablocks
▪ Works for many workloads
• Small FS blocks (e.g. 512 KB)
• Large application blocks (e.g. 64 MB)
▪ New allocation scheme
• Metablock-size granularity for wide striping
▪ Block map operates on the large metablock size
▪ All FS operations operate on the small regular block size
▪ Additional changes provide block location information and “write affinity”
[Diagram: new allocation policy groups small FS blocks into one large application “metablock”]
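The metablock arithmetic can be sketched directly, using the example sizes above (512 KB FS blocks grouped into 64 MB metablocks; the sizes and function name are illustrative):

```python
# Sketch of the metablock idea: allocation and placement are decided
# per large metablock, while regular FS operations still address the
# small blocks inside it.

FS_BLOCK = 512 * 1024          # granularity of normal FS operations
METABLOCK = 64 * 1024 * 1024   # granularity of allocation/placement

def locate(offset):
    meta_index = offset // METABLOCK             # which metablock (placed as a unit)
    fs_index = (offset % METABLOCK) // FS_BLOCK  # which small block inside it
    return meta_index, fs_index

# With these sizes, 128 consecutive FS blocks share one metablock and
# therefore land on the same node, giving MapReduce a large contiguous,
# node-local chunk without changing the FS operation granularity.
blocks_per_metablock = METABLOCK // FS_BLOCK
```

The block map keys on `meta_index` (large granularity), while reads and writes resolve down to `fs_index` (small granularity), which is why traditional small-I/O workloads keep working.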
MapReduce performance
Test bed:
• iDataPlex: 42 nodes, 8 cores and 8 GB RAM each, 4+1 disks per node
• 16 nodes used, 160 GB of data (replication factor = 2)
• Hadoop version 0.18.1, GPFS version pre-3.3
[Chart: completion time (sec) for Grep, TeraGen, and TeraSort; GPFS vs. HDFS with rack=1 and rack=2]
Impact on traditional workloads
[Chart: normalized random and sequential read performance for 512K blocks, 512K with metablocks, and 16M blocks]
iDataPlex: 42 nodes
8 cores, 8GB RAM
4+1 disks per-node
GPFS: version pre3.3
Bonnie filesystem benchmark
Things that didn’t work
▪ Large filesystem block size
▪ Turning off prefetching
▪ Aligning records to block boundaries

[Chart: completion time (sec) for HDFS and for GPFS configurations 16M, 16ML, 16MLA, and 16MLA NP]
[Chart: normalized random and sequential read performance for 512K, 16M, and 16M no-prefetch]
Advantages of traditional filesystems
▪ Traditional filesystems have solved many hard problems like access control, quotas, snapshots, …
▪ Allow traditional and MapReduce applications to share the same input data.
▪ Exploit filesystem tools and scripts built for “regular” filesystems.
▪ Re-use backup/archive solutions built around particular filesystems.
▪ Mixed analytics pipelines.
Using a MapReduce-specific filesystem (e.g. HDFS):
Crawl → Load → Analyze → Output → Serve
(the crawler writes to a traditional filesystem, data is loaded into the MapReduce filesystem, and output goes back to the traditional filesystem)

Using a general-purpose filesystem (e.g. GPFS):
Crawl → Analyze → Serve
Conclusion
▪ MapReduce platforms can use traditional filesystems without loss of performance.
▪ There are important reasons why traditional filesystems are attractive to users of MapReduce.