Andrew Hanushevsky 17-Mar-99 1
Pursuit of a Scalable High Performance Multi-Petabyte Database

16th IEEE Symposium on Mass Storage Systems
Andrew Hanushevsky, SLAC Computing Services
Marcin Nowak, CERN

Produced under contract DE-AC03-76SF00515 between Stanford University and the Department of Energy
High Energy Experiments
- BaBar at SLAC
  - High precision investigation of B-meson decays
  - Explores the asymmetry between matter and antimatter: where did all the antimatter go?
- ATLAS at CERN
  - Probes the Higgs boson energy range
  - Explores the more exotic reaches of physics
High Energy Physics Quantitative Challenge
                        BaBar/SLAC         ATLAS/CERN
  Starts                May 1999           May 2005
  Data volume           0.2 petabytes/yr   5.0 petabytes/yr
  Total amount          2.0 petabytes      100 petabytes
  Aggregate xfr rate    200 MB/sec disk    100 GB/sec disk
                        60 MB/sec tape     1 GB/sec tape
  Processing power      5,000 SPECint95    250,000 SPECint95
    (SPARC Ultra 10s)   526                27,000
  Physicists            800                3,000
  Locations             87                 250
  Countries             9                  50
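As a sanity check on the table (my own arithmetic, not from the slides): the average ingest rate implied by the yearly data volumes is far below the quoted aggregate disk rates, so those rates must be driven by analysis re-reads rather than data taking alone. A rough sketch, taking a petabyte as 10^15 bytes:

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600

def avg_ingest_mb_per_sec(petabytes_per_year: float) -> float:
    """Average sustained write rate implied by a yearly data volume."""
    return petabytes_per_year * 1e15 / SECONDS_PER_YEAR / 1e6  # MB/sec

babar = avg_ingest_mb_per_sec(0.2)  # ~6 MB/sec, vs. 200 MB/sec aggregate disk rate
atlas = avg_ingest_mb_per_sec(5.0)  # ~158 MB/sec, vs. 100 GB/sec aggregate disk rate
```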
Common Elements
- Data will be stored in an object-oriented database: Objectivity/DB
  - Has the theoretical ability to scale to the size of the experiments
- Most data will be kept offline in HPSS
  - A heavy-duty, industrial-strength mass storage system
- BaBar will be blazing the path
  - The first large-scale experiment to use this combination
  - The year of the hare will be a very interesting time
Objectivity/DB
- Client/server architecture
  - Primary access is through the Advanced Multithreaded Server (AMS)
  - Any number of AMS instances can be deployed
- The AMS serves "pages" (512-byte to 64 KB blocks)
  - Similar to other remote filesystem interfaces (e.g., NFS)
  - Objectivity clients read and write database pages via the AMS
  - Page sizes range from 512 bytes to 64 KB in powers of 2 (1 KB, 2 KB, 4 KB, etc.)

[Diagram: clients speak the ams protocol to the AMS; the AMS speaks the ufs protocol to local files]
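The page-size rule is easy to pin down; a tiny sketch of the valid sizes (assuming, per the slides, powers of 2 from 512 bytes to 64 KB):

```python
# Valid AMS page sizes: powers of 2 from 512 bytes to 64 KB (per the slides).
VALID_PAGE_SIZES = [512 << i for i in range(8)]  # 512, 1K, 2K, ..., 64K

def is_valid_page_size(n: int) -> bool:
    """True if n is one of the eight legal Objectivity/DB page sizes."""
    return n in VALID_PAGE_SIZES
```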
High Performance Storage System
HPSS components:
  Bitfile Server, Name Server, Storage Servers, Physical Volume Library,
  Physical Volume Repositories, Storage System Manager, Migration/Purge Server,
  Metadata Manager, Log Daemon, Log Client, Startup Daemon, Encina/SFS, DCE

[Diagram: components are connected by a control network, with a separate data network for transfers]
The Obvious Solution
[Diagram: database servers, a compute farm, and a mass storage system joined by a network switch, with a link to external collaborators]
But… the devil is in the details
Capacity and Transfer Rate
[Chart: tape cartridge capacity and disk system capacity (GB, log scale 1-1024) and tape and disk transfer rates (MB/sec, log scale 3-384), plotted against the years 1988-2006; the capacity curves climb much faster than the transfer-rate curves]
The Capacity Transfer Rate Gap
- Density is growing faster than our ability to transfer data
  - We can store the data just fine, but do we have the time to look at it?
- There are solutions short of poverty:
  - Striped tape? Only if you want a lot of headaches
  - Intelligent staging, with primary access on RAID devices
    - Cost/performance is still a problem
    - Need to address the UFS scaling problem
  - Replication - a fatter pipe?
    - Data synchronization problem
    - Load balancing issues
- Whatever the solution is, you'll need a lot of them
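The widening gap can be made concrete with a toy model (my own extrapolation, not from the slides): if cartridge capacity doubles roughly every 2 years while transfer rate doubles only roughly every 3, the time to read one full cartridge keeps growing:

```python
def hours_to_read_full_cartridge(year: int) -> float:
    """Toy model: capacity starts at 1 GB in 1988, doubling every 2 years;
    transfer rate starts at 3 MB/sec in 1988, doubling every 3 years.
    (Illustrative constants, loosely based on the chart's range.)"""
    capacity_mb = 1000.0 * 2 ** ((year - 1988) / 2)
    rate_mb_s = 3.0 * 2 ** ((year - 1988) / 3)
    return capacity_mb / rate_mb_s / 3600
```

Under these assumptions a full read takes several times longer in 2006 than in 1988, even though both capacity and rate improved.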
Part of the solution: Together Alone
- HPSS: highly scalable, with excellent I/O performance for large files, but...
  - High latency for small block transfers (i.e., Objectivity/DB)
- AMS: an efficient database protocol, highly flexible, but...
  - Limited security, tied to the local filesystem
- Need to synergistically mate these systems
Opening up new vistas: The Extensible AMS
[Diagram: the AMS calls a generic oofs interface, which dispatches to a system-specific interface underneath]
As big as it gets: Scaling The File System

- Veritas Volume Manager
  - Concatenates disk devices to form very large capacity logical devices
- Veritas File System
  - A high performance (60+ MB/sec) journaled file system with fast recovery
- The combination is used as the HPSS staging target
  - Allows for fast streaming I/O and efficient small block transfers
Not out of the woods yet: Other Issues
- Access patterns: random vs. sequential
- Staging latency
- Scalability
- Security
No prophets here: Supplying Performance Hints
- Additional information is needed for optimum performance
  - Different from Objectivity clustering hints: database clustering, processing mode (sequential/random), desired service levels
- The information is Objectivity-independent
  - Need a mechanism to tunnel opaque information
- The client supplies hints via the oofs_set_info() call
  - The information is relayed to the AMS transparently
  - The AMS relays it to the underlying file system via oofs()
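A sketch of what this opaque-hint tunneling could look like (every name here except oofs_set_info() is hypothetical, and the real interface is compiled code, not Python):

```python
# Sketch: opaque, Objectivity-independent hints tunneled from the client
# through the AMS to the underlying filesystem, which alone interprets them.

def pack_hints(**hints) -> bytes:
    """Client side: serialize hints (clustering, access mode, service level)."""
    return "&".join(f"{k}={v}" for k, v in sorted(hints.items())).encode()

def ams_relay(opaque: bytes, filesystem) -> None:
    """AMS side: relay the blob untouched (the AMS never parses it)."""
    filesystem.set_info(opaque)  # hypothetical oofs()-style entry point

class RecordingFS:
    """Stand-in filesystem that just records the hints it receives."""
    def set_info(self, opaque: bytes) -> None:
        self.last = opaque

fs = RecordingFS()
ams_relay(pack_hints(mode="sequential", service="bulk"), fs)
```

The point of the design is that the AMS stays hint-agnostic: new hint types need no AMS changes.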
Where’s the data? Dealing With Latency...
- Hierarchical filesystems may have high latency bursts (e.g., mounting a tape file)
- Need a mechanism to notify the client of the expected delay
  - Prevents request timeouts and retransmission storms
  - Also lets the server degrade gracefully: clients can be delayed when the server is overloaded
- Defer Request Protocol
  - Certain oofs() requests (for example, open()) can tell the client the expected delay
  - The client waits the indicated amount of time and tries again
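The client side of the Defer Request Protocol can be sketched as a retry loop (names are hypothetical; the real protocol rides on oofs() requests such as open()):

```python
import time

class DeferRequest(Exception):
    """Server response meaning 'try again in delay_s seconds' (sketch)."""
    def __init__(self, delay_s: float):
        self.delay_s = delay_s

def open_with_defer(open_fn, max_tries: int = 5):
    """Honor the server's advised delay instead of timing out or retransmitting."""
    for _ in range(max_tries):
        try:
            return open_fn()
        except DeferRequest as d:
            time.sleep(d.delay_s)  # wait the indicated time, then retry
    raise TimeoutError("server still deferring after max_tries attempts")
```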
Many out of one: Dynamically Replicated Databases
- Dynamically distributed databases
  - A single machine can't manage over a terabyte of disk cache
  - There is no good way to statically partition the database
- Dynamically varying database access paths
  - As load increases, add more copies; copies are accessed in parallel
  - As load decreases, remove copies to free up disk space
- Objectivity catalog independence
  - Copies are managed outside of Objectivity
  - Minimizes the impact on administration
If there are many, which one do I go to?

- Request Redirect Protocol
  - oofs() routines supply an alternate AMS location
- oofs routines are responsible for update synchronization
  - Typically, read-only access is provided on copies
  - Only one read/write copy is conveniently supported
  - A client must declare its intention to update prior to access
  - Lazy synchronization is possible
- A good mechanism for largely read-only databases
- Load balancing is provided by an AMS collective
  - One distinguished member is recorded in the catalogue
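The redirect flow above can be sketched as follows (class and host names are hypothetical; only the distinguished-member and single read/write-copy roles come from the slides):

```python
import random

class Redirect(Exception):
    """'Reissue the request at this replica' (sketch of the Request Redirect Protocol)."""
    def __init__(self, location: str):
        self.location = location

class DistinguishedMember:
    """The one collective member recorded in the Objectivity catalogue."""
    def __init__(self, replicas):
        self.replicas = replicas  # replicas[0] holds the sole read/write copy
    def open(self, db: str, for_update: bool = False) -> str:
        if for_update:
            return f"{self.replicas[0]}:{db}"       # updates go to one copy
        raise Redirect(random.choice(self.replicas))  # load-balance reads

def client_open(member, db: str, for_update: bool = False) -> str:
    try:
        return member.open(db, for_update)
    except Redirect as r:
        return f"{r.location}:{db}"  # follow the redirect to the replica
```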
The AMS Collective
[Diagram: distinguished members redirect clients to collective members, which are effectively interchangeable]
Keeping the hackers at bay: Object Oriented Security
- No amount of performance is sufficient if you always have to recompute
  - Need a security mechanism to thwart hackers
- Protocol Independent Authentication Model
  - Public or private key: PGP, RSA, Kerberos, etc.
  - Can be negotiated at run time
  - Automatically invoked by the client and server kernels
  - Supplied via replaceable shared libraries
- The client Objectivity kernel creates security objects as needed
  - Security objects supply context-sensitive authentication credentials
- Works only with the Extensible AMS via the oofs interface
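Run-time negotiation in such a protocol-independent model might look like this (mechanism names and the preference order are illustrative; in the real design the mechanisms arrive via replaceable shared libraries):

```python
SERVER_MECHS = ["kerberos", "rsa", "pgp"]  # server's preference order (illustrative)

def negotiate(client_mechs, server_mechs=SERVER_MECHS) -> str:
    """Pick the first mechanism, in the server's order, that both sides support."""
    for mech in server_mechs:
        if mech in client_mechs:
            return mech
    raise PermissionError("no common authentication mechanism")
```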
Overall Effects
- Extensible AMS: allows use of any type of filesystem via the oofs layer
- Generic Authentication Protocol: allows proper client identification
- Opaque Information Protocol: allows passing hints to improve filesystem performance
- Defer Request Protocol: accommodates hierarchical filesystems
- Redirection Protocol: accommodates terabyte+ filesystems and provides dynamic load balancing
Dynamic Load Balancing Hierarchical Secure AMS
[Diagram: clients dynamically select among the collective's AMS servers]
Summary
- AMS is capable of high performance
  - Ultimate performance is limited by disk speeds
  - Should be able to deliver an average of 20 MB/sec per disk
- The oofs interface plus the other protocols greatly enhance performance, scalability, usability, and security
- 5+ TB of SLAC data has been processed using AMS+HPSS
  - Some AMS problems; no HPSS problems
- SLAC will be using this combination to store physics data
  - The BaBar experiment will produce a database of over 2 PB in 10 years
  - 2,000,000,000,000,000 = 2x10^15 bytes, or about 200,000 3590 tapes
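The tape count follows directly from the cartridge capacity of the day (a 3590 cartridge held roughly 10 GB uncompressed):

```python
total_bytes = 2_000_000_000_000_000  # 2 PB = 2 x 10**15 bytes
bytes_per_3590 = 10 * 10**9          # ~10 GB per 3590 cartridge (uncompressed)
tapes_needed = total_bytes // bytes_per_3590
```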
Now for the reality
- Full AMS features are not yet implemented
- The SLAC/Objectivity design has been completed
  - oofs OO interface, OO security, protocols (i.e., DRP, RRP, and GAP)
- The oofs and ooss layers are completely functional
  - HPSS integration is full-featured and complete
- Protocol development (DRP, RRP, and GAP) has been fully funded at SLAC
  - Initial feature set (DRP, GAP, and limited RRP) to be deployed late summer
  - Full asynchronous replication within 2 years
- CERN & SLAC approaches are similar, but quite different in detail...
CERN staging approach: RFIO/RFCP + HPSS
[Diagram: the AMS performs UNIX FS I/O on DB pages in a disk pool on a Solaris disk server; an RFIO daemon handles file & catalog management and stage-in requests against the HPSS server; a migration daemon uses RFIO calls and RFCP (RFIO copy) to move data between the disk pool and the HPSS mover/tape robot]
SLAC staging approach: PFTP + HPSS
[Diagram: the AMS performs UNIX FS I/O on DB pages in a disk pool on a Solaris disk server; a Gateway daemon handles file & catalog management and stage-in requests, issuing gateway requests to the HPSS server; a migration daemon moves data between the disk pool and the HPSS mover/tape robot via PFTP, with separate data and control connections]
SLAC ultimate approach: Direct Tape Access
[Diagram: the AMS handles file & catalog management and stage-in requests against the HPSS server via the native API (RPC) and performs UNIX FS I/O on DB pages in a disk pool on a Solaris disk server; a migration daemon arranges direct transfers between the disk pool and the HPSS mover/tape robot]
CERN 1TB Test Bed
[Diagram: an HPSS server with RFIO daemon and an AMS/HPSS interface on IBM RS6000s, HPSS data movers on a DEC Alpha and SUN Sparc 5s, an IBM tape silo, and a staging pool, interconnected by FDDI/HIPPI and Fast Ethernet; this is the current approximation - the future plan is 1 Gb switched Ethernet in a star topology]
SLAC Configuration
[Diagram (approximate): an IBM RS6000 F50 HPSS server and four Sun 4500s, each acting as an AMS server and HPSS mover with roughly 900 GB of disk, connected by Gigabit Ethernet]
SLAC Detailed Configuration