Interposed Request Routing for Scalable Network Storage Darrell Anderson, Jeff Chase, and Amin Vahdat Department of Computer Science Duke University


Goals

Devise a highly scalable network storage architecture.
• Interpose on a standard file system protocol.
  – Prototype supports NFS version 3.
• Distribute responsibilities and data.
  – Divide functions (e.g., data vs. metadata).
  – Scale functions by aggregating servers.

This talk:
• Request routing to scale functions.


In the Beginning...

[Diagram: an NFS client and an NFS server connected by a network. The client sends and receives standard NFS packets; the server sends and receives standard NFS packets.]


Interposed Routing

[Diagram: the NFS client sends and receives standard NFS packets. The Slice µProxy intercepts and redirects NFS packets to specialized servers, drawn as several "*Server" nodes.]


Outline

Interposed routing
Slice architecture
• Functional decomposition
• Data decomposition
Functions
• Block-I/O
• Small-file
• Metadata
Request routing
Performance


Slice Architecture

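[Diagram: the client's µproxy sends requests over the network to three server classes: bulk I/O to the storage array (striping policy), small-file read/write to the small-file servers (file placement policy), and name space requests to the directory servers (name routing policy).]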


Functional Decomposition

[Diagram: the Slice architecture figure again: client µproxy, network, storage array (bulk I/O, striping policy), small-file servers (small-file read/write, file placement policy), and directory servers (name space requests, name routing policy).]


Data Decomposition

[Diagram: the Slice architecture figure again: client µproxy, network, storage array (bulk I/O, striping policy), small-file servers (small-file read/write, file placement policy), and directory servers (name space requests, name routing policy).]


Outline

Interposed routing
Slice architecture
• Functional decomposition
• Data decomposition
Functions
• Block-I/O Storage Nodes
• Small-file Servers
• Directory Servers
Request routing
Performance


Block-I/O Storage Nodes

Network storage nodes provide all storage in Slice.
• Prototype uses a simple object-based model.
  – Read, write, remove, truncate.
• Clients access storage nodes directly.
  – Static striping, or flexible block-maps.
  – Optional RAID “10” mirrored striping.

[Diagram: the client's µproxy sends bulk I/O over the network to the storage array, guided by the striping policy.]
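The static striping option can be illustrated with a minimal sketch. The stripe unit size, the node numbering, and the pairing of adjacent nodes as mirrors are assumptions made for illustration; the slides do not give the prototype's actual layout.

```python
# Hypothetical stripe unit; the slides do not state the prototype's value.
STRIPE_UNIT = 32 * 1024


def stripe_target(offset, num_nodes, stripe_unit=STRIPE_UNIT):
    """Map a byte offset in a file to (node index, offset on that node).

    Stripe units are assigned to nodes round-robin, and each node stores
    its units back to back.
    """
    if offset < 0 or num_nodes < 1:
        raise ValueError("offset must be >= 0 and num_nodes >= 1")
    stripe_index = offset // stripe_unit
    node = stripe_index % num_nodes
    local_offset = (stripe_index // num_nodes) * stripe_unit + offset % stripe_unit
    return node, local_offset


def mirrored_stripe_targets(offset, num_nodes, stripe_unit=STRIPE_UNIT):
    """RAID-10-style placement: stripe across mirrored pairs of nodes.

    Nodes (0, 1), (2, 3), ... are assumed to form the mirror pairs, so a
    write goes to both returned targets and a read may use either one.
    """
    if num_nodes < 2 or num_nodes % 2 != 0:
        raise ValueError("mirroring here assumes an even number of nodes")
    pair, local_offset = stripe_target(offset, num_nodes // 2, stripe_unit)
    return [(2 * pair, local_offset), (2 * pair + 1, local_offset)]
```

Because the mapping is a pure function of the offset and the node count, a client-side router can compute the target without consulting a server. That is the appeal of static striping over block maps, at the cost of flexibility.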


Small-File Servers

Handle read and write operations on small files.
• All I/O requests below threshold (e.g., 64 KB).
  – Also the initial “small” segments of large files.
• Absorb and aggregate I/O on small files.
  – Data backed by storage array.
• Storage nodes need not handle small files well.

[Diagram: the client's µproxy sends small-file read/write requests over the network to the small-file servers, guided by the file placement policy; the storage array sits behind them.]


Directory Servers

Handle name space operations.
• Associate name with attributes (lookup, getattr).
• Manage directory contents (create, readdir).
  – Preserve dependencies between objects.
• Create affects new object and its parent directory.

[Diagram: the client's µproxy sends name space requests over the network to the directory servers, guided by the name routing policy; the storage array is also attached to the network.]
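Taken together, the three server classes suggest a simple classification rule for the request stream. The sketch below is illustrative only: the operation sets list just the operations named on these slides, the 64 KB threshold is the example value given for small-file servers, and reading "below threshold" as "file offset below the threshold" is an assumption.

```python
# Example threshold from the small-file slide; the real value is configurable.
SMALL_FILE_THRESHOLD = 64 * 1024

# Only the operations named in the slides; a full NFS v3 router handles more.
NAME_SPACE_OPS = {"lookup", "getattr", "create", "readdir"}
IO_OPS = {"read", "write"}


def classify(op, offset=0, threshold=SMALL_FILE_THRESHOLD):
    """Return the server class that should receive a request.

    Name space operations go to a directory server. Reads and writes go to
    a small-file server when the offset is below the threshold, and to a
    storage node otherwise.
    """
    if op in NAME_SPACE_OPS:
        return "directory"
    if op in IO_OPS:
        return "small-file" if offset < threshold else "storage"
    raise ValueError(f"operation not covered by this sketch: {op}")
```

For example, `classify("read", offset=0)` selects a small-file server even for a large file, matching the note that the initial segments of large files are also served there.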




Request Routing Goals

Focus on name space.
• Spread name space across multiple servers.
  – Balance capacity and load.
• (Maybe) keep entries on same server as parent.
  – Some name space ops involve multiple sites.
• Create entry, update parent modify time.


Request Routing

Three policies for name space request routing:
• Volume Partitioning:
  – Divide the name space into volumes.
  – Volumes have well defined mount points.
• Mkdir Switching:
  – Items on same server as parent directory.
  – Some mkdirs redirect to another server.
• Name Hashing:
  – Name space is a distributed hash table.
  – Requests hash by name, parent dir.
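The last two policies can be sketched in a few lines. Everything here is a hedged illustration rather than the prototype's code: the function names, the use of MD5 as the hash, the byte-string form of the parent handle, and the random choice of redirect target are all assumptions. The `affinity` parameter models the probability that a new directory stays on its parent's server.

```python
import hashlib
import random


def name_hash_server(parent_handle: bytes, name: str, num_servers: int) -> int:
    """Name hashing: choose a directory server from (parent dir, name)."""
    digest = hashlib.md5(parent_handle + b"/" + name.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_servers


def mkdir_switch_server(parent_server: int, num_servers: int,
                        affinity: float, rng=random) -> int:
    """Mkdir switching: usually stay with the parent, sometimes redirect.

    With probability `affinity` the new directory is placed on the parent's
    server; otherwise it is redirected to a different server chosen
    uniformly at random (an assumed selection rule).
    """
    if num_servers == 1 or rng.random() < affinity:
        return parent_server
    other = rng.randrange(num_servers - 1)
    # Skip over the parent's index so the result is always a different server.
    return other if other < parent_server else other + 1
```

Name hashing spreads entries evenly but scatters a directory's contents across servers, so operations such as create touch more than one site. Mkdir switching keeps most entries with their parent and trades balance against locality through the affinity setting.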




Experiment Configuration

Hardware
• Client: 450 MHz P3 with 32 bit 33 MHz PCI.
• Server: 733 MHz P3 with 64 bit 66 MHz PCI.
• Server: 8x 18 GB Seagate Ultra-2 Cheetah disks.
• Gigabit Ethernet with 9 KB “jumbo” frames.

Software
• FreeBSD 4.0-release.
• Modified NFS stack and firmware for zero-copy.
• NFS uses UDP/IP with 32 KB MTU.
• Slice kernel modules; µProxy is IP filter on client.


Block-I/O Scaling

[Figure, two plots. First: single-client bandwidth (MB/s) vs. number of storage nodes. Second: aggregate bandwidth (MB/s) vs. number of storage nodes. Series in both: read, write, mirror-read, mirror-write.]


Name Space Scaling

[Figure: average time (s) vs. number of clients. Series: N-UFS, Slice-1, N-MFS, Slice-2, Slice-4, Slice-8.]


Mkdir Switching Affinity

[Figure: average time (s) vs. directory affinity (%). Series: 16 clients, 8 clients, 4 clients, 1 client.]


SPECsfs97 Throughput

[Figure: delivered load (IOPS) vs. offered load (IOPS). Series: Slice-8, Slice-6, Slice-4, Slice-2, Slice-1, NFS, Ideal.]


SPECsfs97 Latency

[Figure: average latency (msec/op) vs. delivered load (IOPS). Series: NFS, Slice-1, Slice-2, Slice-4, Slice-6, Slice-8, Celerra 506.]


Summary

Slice interposes between NFS client and server.
• Simple redirection of NFS version 3 packets.
  – Slice µProxy inspects and rewrites packets.
• Separates functions normally combined in a central server.
  – Functional decomposition for request stream.
  – Data decomposition to scale each function.
• Prototype shows performance and scalability.

http://www.cs.duke.edu/ari/slice


Handling Failures

Approach: write-ahead logging.
• µProxy logs intentions for “dangerous” operations to coordinator.
  – Also logs when finished.
• Coordinator completes or aborts aging operations.
  – Roll forward, or back.
• Independent of client, server, and storage nodes.

[Diagram: message sequence among the NFS client, the µProxy, and the coordinator: 1. Request; 2. Danger!; 3. (do it); 4. Safe again; 5. Response.]
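The five-step exchange can be sketched as an intention log kept by the coordinator. This is a minimal in-memory illustration under stated assumptions: the class and function names, the timeout value, and the use of a dictionary in place of a durable log are all hypothetical, and the roll-forward or roll-back logic itself is left out.

```python
import time


class Coordinator:
    """Tracks intentions for multi-site operations that have not finished."""

    def __init__(self, timeout=5.0):
        self.timeout = timeout  # assumed age (seconds) before recovery starts
        self.pending = {}       # op_id -> (description, time logged)

    def log_intention(self, op_id, description, now=None):
        """Step 2 ("Danger!"): record the intention before the operation runs."""
        stamp = time.monotonic() if now is None else now
        self.pending[op_id] = (description, stamp)

    def log_completion(self, op_id):
        """Step 4 ("Safe again"): the operation finished, drop the intention."""
        self.pending.pop(op_id, None)

    def aging_operations(self, now=None):
        """Return ids of intentions old enough to be completed or aborted."""
        current = time.monotonic() if now is None else now
        return [op_id for op_id, (_, stamp) in self.pending.items()
                if current - stamp >= self.timeout]


def run_dangerous(coordinator, op_id, description, do_it):
    """What the proxy does between receiving a request and replying."""
    coordinator.log_intention(op_id, description)  # 2. Danger!
    result = do_it()                               # 3. (do it)
    coordinator.log_completion(op_id)              # 4. Safe again
    return result                                  # 5. Response
```

If `do_it` raises or the proxy fails midway, the completion is never logged, so the intention ages out and the coordinator can finish or undo the operation. Recovery therefore does not depend on the client, the server, or the storage nodes.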