
Literature Review

Interconnection Architectures for Petabyte-Scale High-Performance Storage Systems

Andy D. Hospodor, Ethan L. Miller
IEEE/NASA Goddard Conference on Mass Storage Systems and Technologies

April 2004

Henry Chen
September 24, 2010

Introduction

High-performance storage systems
– Petabytes (2^50 bytes) of data storage
– Supply hundreds or thousands of compute nodes
– Aggregate system bandwidth >100GB/s

Performance should scale with capacity

Large individual storage systems
– Require high-speed network interface
– Concentration reduces fault tolerance

Proposal

Follow high-performance computing evolution
– Multi-processor networks

Network of commodity devices

Use disk + 4- to 12-port 1GbE switch as building block

Explore & simulate interconnect topologies

Commodity Hardware

Network
– 1Gb Ethernet: ~$20 per port
– 10Gb Ethernet: ~$5000 per port (25x cost per Gb per port)

Aside (2010): now ~$1000 per 10Gb port

Disk drive
– ATA/(SATA)
– FibreChannel/SCSI/(SAS)
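As a sanity check on the 25x figure, a minimal Python sketch using the list prices quoted above:

    # Cost per unit of bandwidth at the 2004 prices quoted above.
    port_cost_usd = {"1GbE": 20, "10GbE": 5000}   # per switch port
    port_gbps     = {"1GbE": 1,  "10GbE": 10}     # nominal line rate

    for tech in port_cost_usd:
        per_gb = port_cost_usd[tech] / port_gbps[tech]
        print(f"{tech}: ${port_cost_usd[tech]}/port = ${per_gb:.0f} per Gb/s")

    print("ratio:", (5000 / 10) / (20 / 1))       # -> 25.0, the 25x figure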

Setup

Target 100GB/s bandwidth

Build system using 250GB drives (2004)
– 4096 drives to reach 1PB
– Assume each drive has 25MB/s throughput

1Gb link supports 2–3 disks

10Gb link supports ~25 disks
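The sizing arithmetic can be reproduced directly. A small Python sketch, assuming (our assumption, not a figure from the paper) that roughly half the raw line rate survives Ethernet/TCP overhead as payload:

    # System sizing with the 2004 assumptions from this slide.
    drives     = 4096        # 2**12 drives
    drive_size = 250e9       # 250GB each
    drive_bw   = 25e6        # 25MB/s each

    print(f"capacity: {drives * drive_size / 1e15:.2f} PB")           # ~1.02 PB
    print(f"disk bw : {drives * drive_bw / 1e9:.1f} GB/s aggregate")  # ~102 GB/s

    # Disks per link, assuming ~55% of raw line rate is usable payload
    # (an illustrative overhead factor, not taken from the paper).
    for name, raw_bps in [("1Gb", 125e6), ("10Gb", 1250e6)]:
        print(f"{name} link: ~{0.55 * raw_bps / drive_bw:.1f} disks")  # ~2.7, ~27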

Basic Interconnection

32 disks/switch

Replicate system 128x
– 4096 1Gb ports
– 128 10Gb ports

~Networked RAID0

Data local to each server
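The port counts for this configuration follow directly; a minimal check:

    # Port counts when the 32-disk building block is replicated 128x.
    disks_per_switch = 32
    systems   = 4096 // disks_per_switch    # 128 replicas
    ports_1g  = systems * disks_per_switch  # one 1Gb port per disk -> 4096
    ports_10g = systems                     # one 10Gb uplink each  -> 128
    print(systems, ports_1g, ports_10g)     # 128 4096 128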

Fat Tree

4096 1Gb ports

2416 10Gb ports
– 2048 switch-to-router (128 switches × 8 routers × 2)
– 112 inter-router
– 256 server-to-router (128 × 2)

Need large, multi-stage routers

~$10M for 10Gb ports
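Tallying the 10Gb ports listed above and pricing them at the ~$5000/port figure from the Commodity Hardware slide lands in the same ballpark as that estimate:

    # 10Gb port tally for the fat-tree configuration above.
    switch_to_router = 128 * 8 * 2     # 2048
    inter_router     = 112
    server_to_router = 128 * 2         # 256
    total = switch_to_router + inter_router + server_to_router
    print(total)                           # 2416
    print(f"~${total * 5000 / 1e6:.1f}M")  # ~$12M, i.e. order of $10M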

Butterfly Network

Need “concentrator” switch layer

Each network level carries entire traffic load

Only one path between any server and any storage device

Mesh

Routers to servers at mesh edges

16384 1Gb links

Routers only at edges; mesh provides path redundancy

Torus

Mesh with edges wrapped around

Reduces average path length

No edges; dedicated connection breakout to servers

Hypercube

Special case of torus

Bandwidth scales better than mesh/torus

Connections per node increase with system size

Can group devices into smaller units and connect with torus
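A small simulation makes the path-length comparison concrete. The sketch below is not the paper's simulator; it just runs BFS over toy-sized 256-node networks to compare average hop counts:

    from collections import deque
    from itertools import product

    def avg_hops(nodes, neighbors):
        """Average shortest-path hop count over all node pairs (BFS per node)."""
        total, pairs = 0, 0
        for src in nodes:
            dist = {src: 0}
            queue = deque([src])
            while queue:
                u = queue.popleft()
                for v in neighbors(u):
                    if v not in dist:
                        dist[v] = dist[u] + 1
                        queue.append(v)
            total += sum(dist.values())
            pairs += len(dist) - 1
        return total / pairs

    n = 16                                    # 16x16 = 256 nodes (toy size)
    grid = list(product(range(n), repeat=2))

    def mesh_nb(u):                           # 2-D mesh: no wraparound
        x, y = u
        cand = [(x - 1, y), (x + 1, y), (x, y - 1), (x, y + 1)]
        return [(a, b) for a, b in cand if 0 <= a < n and 0 <= b < n]

    def torus_nb(u):                          # 2-D torus: edges wrapped
        x, y = u
        return [((x - 1) % n, y), ((x + 1) % n, y),
                (x, (y - 1) % n), (x, (y + 1) % n)]

    d = 8                                     # 2**8 = 256-node hypercube
    cube = list(range(2 ** d))

    def cube_nb(u):                           # flip each of the d address bits
        return [u ^ (1 << i) for i in range(d)]

    print(f"mesh      : {avg_hops(grid, mesh_nb):.2f}")   # ~10.7 hops
    print(f"torus     : {avg_hops(grid, torus_nb):.2f}")  # ~8.0 hops
    print(f"hypercube : {avg_hops(cube, cube_nb):.2f}")   # ~4.0 hops

The ordering matches the slides: wrapping the mesh into a torus cuts average hops by about a quarter, and the hypercube roughly halves them again, at the cost of more ports per node.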

Bandwidth

Not all topologies actually capable of 100GB/s

Maximum simultaneous bandwidth = (link speed × number of links) ÷ average hops
– Each transfer occupies one link on every hop it crosses, so aggregate capacity falls as paths get longer
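Applying this estimate to 4096-node versions of the example topologies, with 1Gb links throughout (the link and hop counts below are our back-of-envelope figures, not the paper's table):

    # Aggregate bandwidth estimate: total link capacity divided by average
    # hops, since a transfer occupies one link on every hop it crosses.
    def max_bandwidth_gbs(links, avg_hops, link_gbps=1):
        return links * link_gbps / 8 / avg_hops     # GB/s

    # Back-of-envelope link/hop counts for 4096 nodes (our figures):
    for name, links, hops in [("mesh 64x64",      8064, 42.7),
                              ("torus 64x64",     8192, 32.0),
                              ("hypercube 2^12", 24576,  6.0)]:
        print(f"{name:15s}: {max_bandwidth_gbs(links, hops):6.1f} GB/s")

On these rough numbers a 2-D layout falls well short of the 100GB/s target while the hypercube clears it easily, which is consistent with the conclusion that 4-D and 5-D tori are a workable middle ground.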

Analysis

Embedding switches in storage fabric uses fewer high-speed ports, but more low-speed ports

Router Placement in Cube-Style Topologies

Routers require nearly 100% of link bandwidth

Adjacent router placement overloads nearby links and leaves others underused

Use random placement; optimization possible?
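A toy version of that experiment, assuming a hypothetical 16x16 torus where each node simply homes on its nearest router ("load" here is nodes per router, a crude proxy for link load):

    import random
    from itertools import product

    N, R = 16, 8                       # 16x16 torus, 8 routers (toy sizes)

    def tdist(a, b):
        """Manhattan distance with wraparound on the N x N torus."""
        dx, dy = abs(a[0] - b[0]), abs(a[1] - b[1])
        return min(dx, N - dx) + min(dy, N - dy)

    def imbalance(routers, nodes):
        """Home each node on its nearest router; return max/mean router load."""
        load = {r: 0 for r in routers}
        for u in nodes:
            load[min(routers, key=lambda r: tdist(u, r))] += 1
        return max(load.values()) / (sum(load.values()) / len(load))

    nodes = list(product(range(N), repeat=2))
    adjacent = [(0, i) for i in range(R)]     # routers packed side by side
    random.seed(1)
    scattered = random.sample(nodes, R)       # random placement

    # Packed routers leave the two outermost ones serving most of the torus;
    # random placement spreads the load far more evenly.
    print(f"adjacent: {imbalance(adjacent, nodes):.2f}x mean load")
    print(f"random  : {imbalance(scattered, nodes):.2f}x mean load")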

Conclusions

Build multiprocessor-style network for storage

Commodity-based storage fabrics improve reliability and performance, and scale with capacity

Rely on large number of lower-speed links; limited number of high-speed links where necessary

Higher-dimension tori (4-D, 5-D) provide a reasonable solution for 100GB/s from 1PB
