

Datacenter Storage Architecture for MapReduce Applications

Jeffrey Shafer, Scott Rixner, Alan L. Cox

Rice University
{shafer, rixner, alc}@rice.edu

Abstract

Data-intensive computing systems running MapReduce-style applications are currently architected with storage local to computation in the same physical box. This poster argues that upcoming advances in converged datacenter networks will allow MapReduce applications to utilize and benefit from network-attached storage. This is made possible by properties of all MapReduce-style applications, such as streaming storage access patterns. By decoupling computation and storage, stateless compute nodes containing inexpensive, low-power processors can be deployed in large numbers to increase application performance, improve reliability, and decrease power consumption.

1 Introduction

Data-intensive problems exist today in the domains of commerce, science, and engineering. These problems are solved only by writing applications that leverage the power of thousands of processors to manipulate data sets ranging in size from hundreds of terabytes to dozens of petabytes. Such applications can only be run in the datacenter, where they can take advantage of computation, storage, power, and cooling resources on a large scale [2].

A new programming model, called MapReduce, has emerged to support these data-intensive applications. Programs are divided into a map function and a reduce function. The map function takes a key/value pair and produces one or more intermediate key/value pairs. The reduce function takes these intermediate pairs and merges all values corresponding to a single key. Both of these stages can run independently on each key/value pair, exposing enormous amounts of parallelism. In a MapReduce application, a middleware layer (such as Yahoo's Hadoop) is responsible for distributing the parallelized computation to a large pool of loosely-synchronized processors and aggregating the results.
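The two-stage model above can be illustrated with a minimal sketch. The function names, the word-count task, and the single-machine driver below are illustrative assumptions, not part of any particular framework; a real system such as Hadoop runs the two phases across many machines.

```python
from collections import defaultdict

# Hypothetical map function: emit an intermediate (word, 1) pair
# for each word in the input value.
def map_fn(key, value):
    for word in value.split():
        yield (word, 1)

# Hypothetical reduce function: merge all values for a single key.
def reduce_fn(key, values):
    return (key, sum(values))

def run_mapreduce(records, map_fn, reduce_fn):
    # Map phase: each record is processed independently (parallelizable).
    intermediate = defaultdict(list)
    for key, value in records:
        for k, v in map_fn(key, value):
            intermediate[k].append(v)
    # Reduce phase: each key's value list is merged independently.
    return dict(reduce_fn(k, vs) for k, vs in intermediate.items())

records = [(0, "the quick brown fox"), (1, "the lazy dog")]
print(run_mapreduce(records, map_fn, reduce_fn))
# {'the': 2, 'quick': 1, 'brown': 1, 'fox': 1, 'lazy': 1, 'dog': 1}
```

Because each map call and each reduce call touches only its own key/value pairs, every call can be scheduled on a different processor, which is the source of the parallelism described above.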

MapReduce-style applications use a different storage architecture than traditional datacenter applications such as transactional processing, which commonly use a centralized data store accessed across the network. A key premise for MapReduce-style computing systems is that there is insufficient network bandwidth to move the data to the computation, and thus computation must move to the data instead [2]. Based on the assumption that remote data can only be accessed with low bandwidth and high latency, current MapReduce architectures co-locate computation and storage in the same physical box, as shown in Figure 1. Storage is accessible across the network via a global file system that provides replication and load balancing. However, the MapReduce framework attempts to migrate computation to use data on locally-attached disks whenever possible.

Upcoming advances in converged network fabrics will enable a more flexible storage architecture for MapReduce applications without any loss of efficiency. Converged network technologies have previously focused on allowing applications to access existing storage-area networks without either transiting a front-end storage server or using a dedicated storage network adaptor. In contrast, we advocate using such converged networks to decouple computation and storage for MapReduce. The network properties provided by quality-of-service and lossless flow control protocols will allow disks to be removed from the compute nodes and instead attached to switches throughout the datacenter. No dedicated storage-area network will be required. Properties of MapReduce applications will allow this decoupling of computation from storage while maintaining equivalent performance, enabling a more flexible and efficient datacenter design.

2 Datacenter Architecture

We advocate a new storage architecture for MapReduce in the datacenter that incorporates remote storage, as shown in Figure 2 for a single rack. This design employs networked disks – simple unstructured block storage devices connected to the network [3] – along with the use of a converged network to decouple storage resources from computation resources. Compute nodes are equipped with processors but not disks, and storage nodes are equipped with disks but are not used for computation. Instead of co-locating storage and computation in the same box as in the traditional MapReduce storage architecture, this design co-locates storage and computation on the same Ethernet switch.

The advantages of using remote disks in a datacenter are numerous. First, both computation and storage resources can be adjusted to meet application requirements during both cluster construction and operation. Second, computation and storage failures are decoupled from each other. This eliminates wasted resources that would have been used to reconstruct lost storage when a computation node fails. Third, fine-grained power management techniques can be used, whereby compute and storage nodes are enabled and disabled to meet current application requirements. Finally, because computation is now an independent resource, a mix of both high- and low-power processors can be deployed. The runtime environment managing application execution can change the processors being used for a specific application in order to meet administrative power and performance goals.

3 MapReduce and Remote Storage

MapReduce frameworks such as Hadoop make an explicit assumption that storage is local to computation. However, a fresh look at the architecture reveals that these resources can be decoupled and connected merely to the same network switch. This is made possible for two major reasons:

First, MapReduce applications use storage in a manner that is different than ordinary applications. Data is typically stored in large blocks (e.g., 64 MB) and accessed using a simple write-once, read-many coherence model [1]. Application performance depends more on the bandwidth to access entire blocks than the latency to access any particular element in a block. Furthermore, data is accessed in a streaming pattern, rather than by random access. This allows storage requests to be pipelined and data to be streamed across the network, minimizing any effect from network latency.
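The pipelining idea above can be sketched as simple double buffering: a prefetch thread fetches the next block over the network while the consumer processes the current one, so network latency overlaps with computation. All names here (`stream_process`, `read_block`, the toy stand-ins) are illustrative assumptions, not part of Hadoop.

```python
import queue
import threading

def prefetch_blocks(read_block, block_ids, out_q):
    """Fetch blocks ahead of the consumer so that network latency
    overlaps with computation (read_block is a hypothetical
    remote-read call, e.g. one 64 MB block per request)."""
    for bid in block_ids:
        out_q.put(read_block(bid))   # blocks when the queue is full
    out_q.put(None)                  # end-of-stream sentinel

def stream_process(read_block, block_ids, process):
    q = queue.Queue(maxsize=2)       # depth-2 queue = double buffering
    t = threading.Thread(target=prefetch_blocks,
                         args=(read_block, block_ids, q))
    t.start()
    results = []
    while (block := q.get()) is not None:
        results.append(process(block))   # overlaps with the next fetch
    t.join()
    return results

# Toy stand-ins for a remote block read and a map task:
fake_read = lambda bid: bytes([bid]) * 4
print(stream_process(fake_read, [1, 2, 3], len))  # [4, 4, 4]
```

With large sequential blocks, the per-request latency is paid once per 64 MB rather than once per record, which is why throughput, not latency, dominates.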

Second, modern network switches offer extremely high performance. A typical 48- or 96-port Ethernet switch provides full bisection bandwidth across its switching fabric, allowing an entire rack of hosts to communicate with each other at full network speed. If more bandwidth is needed, link aggregation can be used to connect multiple Gigabit links to the same host. Modern switches also have low latency, under 2 µs for a minimum-sized Ethernet frame, and emerging cut-through fabrics promise to reduce latency further. Compared to hard disk seek latencies measured in milliseconds, the forwarding latency of modern switches is negligible¹. Thus, high-performance switches add minimal overhead to accesses to remote storage located in the same rack as the compute node. By combining modern network switches and a streaming access pattern, remote disks can potentially achieve throughputs approaching those of local disks.
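A back-of-envelope calculation makes the claim concrete. The link rate, seek time, and switch latency below are assumed round numbers in line with the figures quoted above, not measurements:

```python
# How much does one in-rack switch hop add to the time to fetch
# a 64 MB block? (All constants are assumed ballpark figures.)
BLOCK_BYTES  = 64 * 2**20   # 64 MB block
LINK_BPS     = 1e9          # 1 Gb/s Ethernet link
SWITCH_LAT_S = 2e-6         # ~2 us switch forwarding latency
SEEK_LAT_S   = 8e-3         # ~8 ms average disk seek (assumed)

transfer_s = BLOCK_BYTES * 8 / LINK_BPS        # time on the wire
overhead   = SWITCH_LAT_S / (SEEK_LAT_S + transfer_s)

print(f"block transfer time: {transfer_s:.3f} s")
print(f"switch-hop overhead: {overhead:.2%}")
```

Transferring one 64 MB block takes roughly half a second on a Gigabit link, so the 2 µs switch hop adds well under a thousandth of a percent; the disk seek itself is thousands of times more costly than the hop.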

We have tested this proposed architecture on a small scale using a cluster computer running the Hadoop MapReduce framework. This cluster was configured in two ways: first, using disks installed in the compute nodes, and second, moving those same disks to other non-compute nodes attached to the same Ethernet switch. In the second case, the lightweight ATA-over-Ethernet protocol was used to export each disk across the network as a low-level block storage device with minimal overhead. Preliminary experiments conducted using data-intensive benchmarks such as a random data generator showed that the decoupled storage architecture had equivalent performance to the local storage architecture.

4 Conclusions

We propose a new architecture for MapReduce applications that decouples computation from storage by using a converged network and networked disks. Removing the disks from compute nodes provides an opportunity for more flexible and efficient datacenter design. The ratio between computation and storage resources can be changed both during datacenter construction and dynamically during operation to meet application performance and power requirements. Clusters can also be equipped with a mix of computation nodes containing either high-performance (e.g., Intel Xeon) or high-efficiency (e.g., Intel Atom) processors, allowing applications to select the performance and power mix that is desired.

References

[1] HDFS (Hadoop Distributed File System) architecture. http://hadoop.apache.org/core/docs/current/hdfs_design.html, 2008.

[2] R. E. Bryant. Data-intensive supercomputing: The case for DISC. Technical Report CMU-CS-07-128, Carnegie Mellon University, May 2007.

[3] G. Gibson et al. A cost-effective, high-bandwidth storage architecture. In ASPLOS-VIII, 1998.

¹Even solid-state disk latencies are much higher than network switching latencies. Regardless, conventional disks are still the likely choice for MapReduce datacenter storage due to capacity and cost.


5 Appendix

Figure 1: Local-Storage Architecture (Single Rack)

Figure 1 shows the storage architecture for a traditional datacenter running a MapReduce framework such as Hadoop. In this architecture, data is stored on disks local to the processors. Although the data is accessible over the network, computation is migrated to the data whenever possible.

Figure 2: Remote-Storage Architecture (Single Rack)

Figure 2 shows the proposed remote storage architecture. In this datacenter, storage is provided by network-attached disks connected to the same network switch as the compute nodes.
