Interposed Request Routing for Scalable
Network Storage
Darrell Anderson, Jeff Chase, and Amin Vahdat
Department of Computer Science
Duke University
Duke University Department of Computer Science
Goals
Devise a highly scalable network storage architecture
• Interpose on a standard file system protocol.
  – Prototype supports NFS version 3.
• Distribute responsibilities and data.
  – Divide functions (e.g., data vs. metadata).
  – Scale functions by aggregating servers.
This talk:
• Request routing to scale functions.
In the Beginning...
[Diagram: NFS client and NFS server connected by the network.]
Client sends and receives standard NFS packets.
Server sends and receives standard NFS packets.
Interposed Routing
[Diagram: NFS client, Slice µProxy (µ), and several specialized servers (*Server).]
Client sends and receives standard NFS packets.
Slice µProxy intercepts and redirects NFS packets to specialized servers.
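A minimal sketch of the interposition idea, to make the redirection concrete. It is not the Slice implementation: the `Packet` fields, the `MicroProxy` class, the `choose_server` callback, and the per-request `pending` table are all hypothetical names introduced here. The sketch assumes requests and replies can be matched by an RPC transaction id, and that replies are rewritten so the client still appears to talk to a single standard NFS server.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Packet:
    src: str   # sender address
    dst: str   # destination address
    xid: int   # transaction id used to match a reply to its request
    body: str  # stands in for the NFS request or reply payload


class MicroProxy:
    """Toy interposer (hypothetical): redirects requests, disguises replies."""

    def __init__(self, virtual_server, choose_server):
        self.virtual_server = virtual_server  # address the client believes it uses
        self.choose_server = choose_server    # assumed routing function: Packet -> address
        self.pending = {}                     # xid -> client address (soft state)

    def outbound(self, pkt):
        # Client -> server: remember the client, then rewrite the destination.
        self.pending[pkt.xid] = pkt.src
        return replace(pkt, dst=self.choose_server(pkt))

    def inbound(self, pkt):
        # Server -> client: make the reply look like it came from the virtual server.
        client = self.pending.pop(pkt.xid)
        return replace(pkt, src=self.virtual_server, dst=client)
```

The client side is untouched in this sketch: it addresses `virtual_server` and receives replies that name `virtual_server` as the source, which is what lets it keep sending and receiving standard NFS packets.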
Outline
Interposed routing
Slice architecture
• Functional decomposition
• Data decomposition
Functions
• Block-I/O
• Small-file
• Metadata
Request routing
Performance
Slice Architecture
[Diagram: the client's µproxy routes requests over the network. Name space requests go to the directory servers (name routing), small-file reads and writes go to the small-file servers (file placement policy), and bulk I/O goes to the storage array (striping policy).]
Functional Decomposition
[Diagram: Slice architecture figure. The client's µproxy separates the request stream into name space requests (directory servers), small-file reads and writes (small-file servers), and bulk I/O (storage array).]
Data Decomposition
[Diagram: Slice architecture figure. Labels: client µproxy, network, directory servers (name routing), small-file servers (file placement policy), storage array (striping policy).]
Outline
Interposed routing
Slice architecture
• Functional decomposition
• Data decomposition
Functions
• Block-I/O Storage Nodes
• Small-file Servers
• Directory Servers
Request routing
Performance
Block-I/O Storage Nodes
Network storage nodes provide all storage in Slice.
• Prototype uses a simple object-based model.
  – Read, write, remove, truncate.
• Clients access storage nodes directly.
  – Static striping, or flexible block-maps.
  – Optional RAID “10” mirrored striping.
[Diagram: the client's µproxy sends bulk I/O over the network to the storage array, following the striping policy.]
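To illustrate static striping, here is a minimal sketch that maps a byte offset in a file to a storage node. The function names, the round-robin layout, and the pairing of adjacent nodes as mirrors are assumptions made for illustration; the slides do not say how the prototype lays out stripes or places replicas.

```python
def stripe_target(offset, stripe_size, num_nodes):
    """Return (node index, offset within that node's object) for a byte offset.

    Assumes round-robin placement of fixed-size stripes across the nodes.
    """
    stripe = offset // stripe_size
    node = stripe % num_nodes
    local = (stripe // num_nodes) * stripe_size + offset % stripe_size
    return node, local


def mirrored_targets(offset, stripe_size, num_nodes):
    """RAID 10 style sketch: stripe across mirror pairs (2k, 2k+1).

    The choice of adjacent nodes as a mirror pair is hypothetical.
    """
    pairs = num_nodes // 2
    if pairs == 0:
        raise ValueError("mirrored striping needs at least two nodes")
    pair, local = stripe_target(offset, stripe_size, pairs)
    return [(2 * pair, local), (2 * pair + 1, local)]
```

With a hypothetical 32 KB stripe and four nodes, `stripe_target(32768, 32768, 4)` selects node 1, while `mirrored_targets(32768, 32768, 4)` selects nodes 2 and 3, since mirroring halves the number of independent stripe targets.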
Small-File Servers
Handle read and write operations on small files.
• All I/O requests below threshold (e.g., 64 KB).
  – Also the initial “small” segments of large files.
• Absorb and aggregate I/O on small files.
  – Data backed by storage array.
• Storage nodes need not handle small files well.
[Diagram: the client's µproxy sends small-file reads and writes over the network to the small-file servers, following the file placement policy; the small-file servers are backed by the storage array.]
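The threshold rule can be sketched as a function that splits a byte range at the threshold. This assumes the threshold is a byte offset within the file, so that the first 64 KB of a large file also counts as "small". Whether a request that straddles the threshold is really split this way is an assumption, and `split_io` and the target labels are hypothetical names.

```python
THRESHOLD = 64 * 1024  # the slide's example value; the real value is a policy choice


def split_io(offset, length, threshold=THRESHOLD):
    """Split a byte range into (target, offset, length) pieces at the threshold."""
    if length <= 0:
        return []
    pieces = []
    end = offset + length
    if offset < threshold:
        # Bytes below the threshold go to a small-file server.
        small_end = min(end, threshold)
        pieces.append(("small-file", offset, small_end - offset))
    if end > threshold:
        # Bytes at or above the threshold go to the storage nodes.
        bulk_start = max(offset, threshold)
        pieces.append(("storage", bulk_start, end - bulk_start))
    return pieces
```

Under this rule a file that never grows past the threshold is served entirely by the small-file servers, which is why the storage nodes need not handle small files well.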
Directory Servers
Handle name space operations.
• Associate name with attributes (lookup, getattr).
• Manage directory contents (create, readdir).
  – Preserve dependencies between objects.
    • Create affects new object and its parent directory.
[Diagram: the client's µproxy sends name space requests over the network to the directory servers, following the name routing policy; the storage array is also shown.]
Request Routing Goals
Focus on name space.
• Spread name space across multiple servers.
  – Balance capacity and load.
• (Maybe) keep entries on same server as parent.
  – Some name space ops involve multiple sites.
    • Create entry, update parent modify time.
Request Routing
Three policies for name space request routing:
• Volume Partitioning:
  – Divide the name space into volumes.
  – Volumes have well defined mount points.
• Mkdir Switching:
  – Items on same server as parent directory.
  – Some mkdirs redirect to another server.
• Name Hashing:
  – Name space is a distributed hash table.
  – Requests hash by name, parent dir.
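The last two policies can be sketched in a few lines. This is an illustration only: the function names, the use of MD5 as the hash, and the choice to redirect a mkdir at random with a fixed probability are assumptions; the slides say only that requests hash by name and parent directory, and that some mkdirs redirect to another server.

```python
import hashlib
import random


def _hash(data: bytes) -> int:
    # Any stable hash would do; MD5 is an arbitrary choice for this sketch.
    return int.from_bytes(hashlib.md5(data).digest()[:8], "big")


def name_hash_route(parent_handle: bytes, name: str, num_servers: int) -> int:
    """Name hashing: pick a server from the (parent directory, name) pair."""
    return _hash(parent_handle + b"/" + name.encode()) % num_servers


def mkdir_switch_route(parent_server: int, num_servers: int,
                       redirect_prob: float, rng=random) -> int:
    """Mkdir switching: stay with the parent unless this mkdir is redirected.

    redirect_prob is a hypothetical knob for how often a mkdir moves to a
    different server than the one holding its parent directory.
    """
    if num_servers > 1 and rng.random() < redirect_prob:
        other = rng.randrange(num_servers - 1)
        # Skip over the parent's server so the result is always a different one.
        return other if other < parent_server else other + 1
    return parent_server
```

The two sketches show the trade-off named in the routing goals: name hashing spreads entries evenly but separates them from their parent, while mkdir switching keeps entries with their parent and spreads load only at directory granularity.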
Experiment Configuration
Hardware
• Client: 450 MHz P3 with 32 bit 33 MHz PCI.
• Server: 733 MHz P3 with 64 bit 66 MHz PCI.
• Server: 8x 18 GB Seagate Ultra-2 Cheetah disks.
• Gigabit Ethernet with 9 KB “jumbo” frames.
Software
• FreeBSD 4.0-release.
• Modified NFS stack and firmware for zero-copy.
• NFS uses UDP/IP with 32 KB MTU.
• Slice kernel modules; µProxy is IP filter on client.
Block-I/O Scaling
[Figure: single-client bandwidth (MB/s) vs. number of storage nodes. Series: read, write, mirror-read, mirror-write.]
[Figure: aggregate bandwidth (MB/s) vs. number of storage nodes. Series: read, write, mirror-read, mirror-write.]
Name Space Scaling
[Figure: average time (s) vs. number of clients. Series: N-UFS, Slice-1, N-MFS, Slice-2, Slice-4, Slice-8.]
Mkdir Switching Affinity
[Figure: average time (s) vs. directory affinity (%). Series: 16 clients, 8 clients, 4 clients, 1 client.]
SPECsfs97 Throughput
[Figure: delivered load (IOPS) vs. offered load (IOPS). Series: Slice-8, Slice-6, Slice-4, Slice-2, Slice-1, NFS, Ideal.]
SPECsfs97 Latency
[Figure: average latency (msec/op) vs. delivered load (IOPS). Series: NFS, Slice-1, Slice-2, Slice-4, Slice-6, Slice-8, Celerra 506.]
Summary
Slice interposes between NFS client and server.
• Simple redirection of NFS version 3 packets.
  – Slice µProxy inspects and rewrites packets.
• Separates functions normally handled by a central server.
  – Functional decomposition for request stream.
  – Data decomposition to scale each function.
• Prototype shows performance and scalability.
http://www.cs.duke.edu/ari/slice
EOF
Handling Failures
Approach: write-ahead logging.
• µProxy logs intentions for “dangerous” operations to coordinator.
  – Also logs when finished.
• Coordinator completes or aborts aging operations.
  – Roll forward, or back.
• Independent of client, server, and storage nodes.
[Diagram: NFS client, µProxy (µ), and coordinator. Steps: 1. Request, 2. Danger!, 3. (do it), 4. Safe again, 5. Response.]
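The intention-logging protocol can be sketched as follows. This is a toy model under stated assumptions, not the Slice coordinator: the `Coordinator` class, the `timeout` that decides when an operation has aged, and the `can_complete` callback that decides between rolling forward and rolling back are all hypothetical.

```python
import time


class Coordinator:
    """Toy coordinator (hypothetical) holding an intention log keyed by op id."""

    def __init__(self, timeout):
        self.timeout = timeout  # seconds before a logged intention counts as aged
        self.intents = {}       # op_id -> (description, time logged)

    def log_intent(self, op_id, description, now=None):
        logged = time.monotonic() if now is None else now
        self.intents[op_id] = (description, logged)

    def log_done(self, op_id):
        self.intents.pop(op_id, None)

    def recover(self, can_complete, now=None):
        """Resolve aged intentions: roll forward if can_complete(desc), else back."""
        now = time.monotonic() if now is None else now
        resolved = {}
        for op_id, (desc, logged) in list(self.intents.items()):
            if now - logged >= self.timeout:
                resolved[op_id] = "roll-forward" if can_complete(desc) else "roll-back"
                del self.intents[op_id]
        return resolved


def run_dangerous(coordinator, op_id, description, perform):
    """Wrap one multi-site operation in the log/do/log sequence."""
    coordinator.log_intent(op_id, description)  # 2. Danger!
    result = perform()                          # 3. (do it)
    coordinator.log_done(op_id)                 # 4. Safe again
    return result                               # 5. Response
```

If the µProxy fails between steps 2 and 4, the intention stays in the log, and a later `recover` call resolves it without help from the client, the servers, or the storage nodes.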