Enabling Data-Intensive Science with Tactical Storage Systems
Prof. Douglas Thain
University of Notre Dame
http://www.cse.nd.edu/~dthain
The Cooperative Computing Lab
Our model of computer science research:
– Understand how users with complex, large-scale applications need to interact with computing systems.
– Design novel computing systems that can be applied by many different users == basic CS research.
– Deploy code in real systems with real users, suffer real bugs, and learn real lessons == applied CS.
Application Areas:
– Astronomy, Bioinformatics, Biometrics, Molecular Dynamics, Physics, Game Theory, ... ???
External Support: NSF, IBM, Sun
http://www.cse.nd.edu/~ccl
Abstract
Users of distributed systems encounter many practical barriers between their jobs and the data they wish to access.
Problem: Users have access to many resources (disks), but are stuck with the abstractions (cluster NFS) provided by administrators.
Solution: Tactical Storage Systems allow any user to create, reconfigure, and tear down abstractions without bugging the administrator.
The Standard Model
[Figure labels: Transparent Distributed Filesystem, shared disk, private disk; FTP, SCP, RSYNC, HTTP, ...]
Problems with the Standard Model
Users encounter partitions in the WAN.
– Easy to access data inside cluster, hard outside.
– Must use different mechanisms on different links.
– Difficult to combine resources together.
Different access modes for different purposes.
– File transfer: preparing system for intended use.
– File system: access to data for running jobs.
Resources go unused.
– Disks on each node of a cluster.
– Unorganized resources in a department/lab.
A global file system can't satisfy everyone!
What if...
Users could easily access any storage?
I could borrow an unused disk for NFS?
An entire cluster could be used as storage?
Multiple clusters could be combined?
I could reconfigure structures without root?
– (Or bugging the administrator daily.)
Solution: Tactical Storage System (TSS)
Outline
Problems with the Standard Model
Tactical Storage Systems
– File Servers, Catalogs, Abstractions, Adapters
Applications:
– Remote Database Access for BaBar Code
– Remote Dynamic Linking for CDF Code
– Logical Data Access for Bioinformatics Code
– Expandable Database for MD Simulation
Improving the OS for Grid Computing
Tactical Storage Systems (TSS)
A TSS allows any node to serve as a file server or as a file system client.
All components can be deployed without special privileges, but with security.
Users can build up complex structures.
– Filesystems, databases, caches, ...
Two Independent Concepts:
– Resources: The raw storage to be used.
– Abstractions: The organization of storage.
[Figure labels: App, Adapter, Central Filesystem, Distributed Filesystem Abstraction, Distributed Database Abstraction, file servers on UNIX hosts, filesystems, file transfer, 3PT. Cluster administrator controls policy on all storage in cluster. Workstation owners control policy on each machine.]
Components of a TSS:
1 - File Servers
2 - Catalogs
3 - Abstractions
4 - Adapters
1 - File Servers
Unix-Like Interface
– open/close/read/write
– getfile/putfile to stream whole files
– opendir/stat/rename/unlink
Complete Independence
– choose friends
– limit bandwidth/space
– evict users?
Trivial to Deploy
– run server + setacl
– no privilege required
– can be thrown into a grid system
Flexible Access Control
[Figure labels: file server A, file server B, Chirp protocol, filesystem, owner of server A, owner of server B]
Related Work
Lots of file services for the Grid:
– GridFTP, NeST, SRB, RFIO, SRM, IBP, ...
– (Adapter interfaces with many of these!)
Why have another file server?
– Reason 1: Must have precise Unix semantics!
  Apps distinguish ENOENT vs EACCES vs EISDIR.
  FTP always returns error 550, regardless of error.
– Reason 2: TSS focused on easy deployment.
  No privilege required, no config files, no rebuilding, flexible access control, ...
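To make Reason 1 concrete, here is a minimal sketch (assuming a POSIX system and Python's built-in open) of how an application tells these failures apart, next to a reply that collapses them all into 550 as the slide describes. The function names are hypothetical and are not part of any TSS or Chirp API.

```python
import errno
import os
import tempfile


def open_error_name(path):
    """Return the symbolic errno name raised by opening `path` for reading,
    or None if the open succeeds."""
    try:
        with open(path, "rb"):
            return None
    except OSError as exc:
        return errno.errorcode.get(exc.errno, str(exc.errno))


def collapsed_reply(error_name):
    """Hypothetical lossy reply: every failure becomes 550, so the caller
    can no longer tell a missing file from a directory or a denied access."""
    return "OK" if error_name is None else "550"


if __name__ == "__main__":
    scratch = tempfile.mkdtemp()
    for candidate in (os.path.join(scratch, "missing.dat"), scratch):
        name = open_error_name(candidate)
        print(candidate, name, collapsed_reply(name))
```

An application that retries on ENOENT but aborts on EACCES cannot make that decision once both have been reduced to the same code.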
Access Control in File Servers
Unix Security is not Sufficient
– No global user database possible/desirable.
– Mapping external credentials to Unix gets messy.
Instead, Make External Names First-Class
– Perform access control on remote, not local, names.
– Types: Globus, Kerberos, Unix, Hostname, Address
Each directory has an ACL:
globus:/O=NotreDame/CN=DThain RWLA
kerberos:dthain@nd.edu RWL
hostname:*.cs.nd.edu RL
address:192.168.1.* RWLA
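A per-directory ACL like the one above could be evaluated roughly as follows. This is a sketch under stated assumptions, not the Chirp server's actual logic: subjects are matched with shell-style wildcards, rights from all matching entries are combined, and the function names are hypothetical.

```python
from fnmatch import fnmatchcase

# ACL entries copied from the slide: (subject pattern, rights string).
ACL = [
    ("globus:/O=NotreDame/CN=DThain", "RWLA"),
    ("kerberos:dthain@nd.edu", "RWL"),
    ("hostname:*.cs.nd.edu", "RL"),
    ("address:192.168.1.*", "RWLA"),
]


def rights_for(subject, acl=ACL):
    """Collect every right granted to `subject` by any matching entry.
    Assumption: rights from multiple matching entries are unioned."""
    granted = set()
    for pattern, rights in acl:
        if fnmatchcase(subject, pattern):
            granted |= set(rights)
    return granted


def allowed(subject, right, acl=ACL):
    """True if `subject` (for example 'hostname:x.cs.nd.edu') holds `right`."""
    return right in rights_for(subject, acl)


if __name__ == "__main__":
    print(allowed("hostname:laptop.cs.nd.edu", "R"))
    print(allowed("hostname:laptop.cs.nd.edu", "W"))
```

The subject string carries the authentication type as a prefix, so the same directory can mix Globus, Kerberos, hostname, and address entries without mapping any of them to a local Unix account.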
Problem: Shared Namespace
[Figure: a file server whose ACL is globus:/O=NotreDame/* RWLAX; the files a.out, test.c, test.dat, and cms.exe all sit in one shared directory.]
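Solution: Reservation (V) Right
[Figure: the file server's top-level ACL is /O=NotreDame/CN=* V(RWLA), which permits mkdir only. A mkdir by /O=NotreDame/CN=Monk creates a directory with the ACL /O=NotreDame/CN=Monk RWLA, holding a.out and test.c. A mkdir by /O=NotreDame/CN=Ted creates a directory with the ACL /O=NotreDame/CN=Ted RWLA, holding a.out and test.c.]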
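The reservation right can be sketched as follows: a subject matching a V(...) entry may only create a directory, and the new directory starts with an ACL naming that subject alone with the rights listed in parentheses. The Directory class and mkdir function are hypothetical illustrations, and the wildcard matching is an assumption.

```python
import re
from fnmatch import fnmatchcase


class Directory:
    """In-memory stand-in for a directory with a per-directory ACL."""

    def __init__(self, acl):
        self.acl = dict(acl)   # subject pattern -> rights string
        self.children = {}     # name -> Directory


def reserved_rights(directory, subject):
    """Return the rights inside V(...) that apply to `subject`, or None."""
    for pattern, rights in directory.acl.items():
        match = re.fullmatch(r"V\((\w+)\)", rights)
        if match and fnmatchcase(subject, pattern):
            return match.group(1)
    return None


def mkdir(parent, name, subject):
    """Create `name` under `parent`; the new ACL names only the creator."""
    rights = reserved_rights(parent, subject)
    if rights is None:
        raise PermissionError(f"{subject} holds no reservation right here")
    child = Directory({subject: rights})
    parent.children[name] = child
    return child


if __name__ == "__main__":
    root = Directory({"/O=NotreDame/CN=*": "V(RWLA)"})
    monk_dir = mkdir(root, "monk", "/O=NotreDame/CN=Monk")
    print(monk_dir.acl)
```

Because each new directory's ACL names only its creator, Monk and Ted can both keep an a.out and a test.c without colliding in a shared namespace.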
2 - Catalogs
[Figure labels: catalog server, catalog server; periodic UDP updates; HTTP; XML, TXT, ClassAds]
3 - Abstractions
An abstraction is an organizational layer built on top of one or more file servers.
End Users choose what abstractions to employ.
Working Examples:
– CFS: Central File System
– DSFS: Distributed Shared File System
– DSDB: Distributed Shared Database
Others Possible?
– Distributed Backup System
– Striped File System (RAID/Zebra)
CFS: Central File System
[Figure labels: appl (x3), adapter with CFS (x3), file server, file (x3), ptr (x3)]
DSFS: Dist. Shared File System
[Figure labels: appl (x2), adapter with DSFS (x2), file server (x3), files; arrows: lookup file location, access data]
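The two arrows in the DSFS figure (lookup file location, then access data) suggest a two-step read. The sketch below assumes that one server holds only pointers naming the host that stores each file, and that the other servers hold the data. All class and method names are hypothetical, and in-memory dictionaries stand in for real file servers.

```python
class FileServer:
    """In-memory stand-in for one file server: path -> stored value."""

    def __init__(self):
        self.files = {}


class DSFS:
    """Sketch of a distributed shared filesystem built from plain servers."""

    def __init__(self, directory_server, data_servers):
        self.directory = directory_server  # holds pointers only
        self.data = data_servers           # host name -> FileServer

    def write(self, path, content, host):
        """Store the data on `host`, then record a pointer to it."""
        self.data[host].files[path] = content
        self.directory.files[path] = host

    def read(self, path):
        """Step 1: look up the file location. Step 2: access the data."""
        host = self.directory.files[path]
        return self.data[host].files[path]


if __name__ == "__main__":
    fs = DSFS(FileServer(), {"host1": FileServer(), "host2": FileServer()})
    fs.write("/data/run1.out", b"results", "host2")
    print(fs.read("/data/run1.out"))
```

Keeping pointers separate from data means capacity can grow by adding data servers, while clients still see a single namespace.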
DSDB: Dist. Shared Database
[Figure labels: appl (x2), adapter with DSDB (x2), database server with file index, file server (x2), files; arrows: insert, query, create file, direct access]
4 - Adapter
Like an OS Kernel
– Tracks procs, files, etc.
– Adds new capabilities.
– Enforces owner's policies.
Delegated Syscalls
– Trapped via ptrace interface.
– Action taken by Parrot.
– Resources charged to Parrot.
User Chooses Abstraction
– Appears as a filesystem.
– Option: Timeout tolerance.
– Option: Consistency semantics.
– Option: Servers to use.
– Option: Auth mechanisms.
[Figure labels: tcsh, cat, vi; system calls trapped via ptrace; Adapter - Parrot with process table and file table; Abstractions: CFS, DSFS, DSDB; HTTP, FTP, RFIO, NeST, SRB, gLite; ???]
Performance Summary
Nothing comes for free!
– System calls: order of magnitude slower.
– Memory bandwidth overhead: extra copies.
However:
– TSS can take full advantage of bandwidth (unlike NFS).
– TSS can drive network/switch to limits.
– Typical slowdown on real apps: 5-10 percent.
– Allows one to harness resources that would go unused.
– Observation: Most users constrained by functionality.
Remote Database Access
HEP Simulation Needs Direct DB Access
– App linked against Objectivity DB.
– Objectivity accesses filesystem directly.
– How to distribute application securely?
Solution: Remote Root Mount via TSS:
    parrot -M /=/chirp/fileserver/rootdir
DB code can read/write/lock files directly.
[Figure labels: script, sim.exe, libdb.so, Parrot, CFS, WAN, GSI Auth, TSS file server, filesystem, DB data]
Credit: Sander Klous @ NIKHEF
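The remote root mount amounts to a prefix rewrite on every path the application opens. The following is a minimal sketch of such a rewrite, assuming longest-prefix matching on whole path components. The remap function is hypothetical and does not describe Parrot's internals.

```python
import posixpath


def remap(path, mounts):
    """Rewrite `path` using the longest mount prefix that matches whole
    path components; return the path unchanged if none matches."""
    path = posixpath.normpath(path)
    best = None
    for prefix, target in mounts.items():
        prefix = posixpath.normpath(prefix)
        matches = (
            prefix == "/"
            or path == prefix
            or path.startswith(prefix + "/")
        )
        if matches and (best is None or len(prefix) > len(best[0])):
            best = (prefix, target)
    if best is None:
        return path
    prefix, target = best
    rest = path[len(prefix):].lstrip("/")
    return posixpath.join(target, rest) if rest else target


if __name__ == "__main__":
    # Mount entry taken from the slide: the whole root maps to a file server.
    mounts = {"/": "/chirp/fileserver/rootdir"}
    print(remap("/etc/passwd", mounts))
```

With the root itself remapped, the database library's ordinary open, read, write, and lock calls all land on the remote file server without any change to the application.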
Remote Application Loading
Modular Simulation Needs Many Libraries
– Devel. on workstations, then ported to grid.
– Selection of library depends on analysis tech.
– Constraint: Must use HTTP for file access.
Solution: Dynamic Link with TSS+HTTP:
– /home/cdfsoft -> /http/dcaf.fnal.gov/cdfsoft
[Figure labels: appl, ld.so, Parrot, HTTP, WAN, HTTP server, filesystem, liba.so, libb.so, libc.so; select several MB from 60 GB of libraries]
Credit: Igor Sfiligoi @ Fermi National Lab
Technical Problem
HTTP is not a filesystem! (No directories)
– Advantages: Firewalls, caches, admins.
[Figure: Appl calls opendir(/home); Parrot's HTTP module sends GET /home HTTP/1.0 to the HTTP server, whose tree is root with etc, home, and bin, and home with alice, babar, and cms. The reply is an HTML page (<HTML><HEAD> <H1> ...).]
Technical Problem
Solution: Turn the directories into files.
– Can be cached in ordinary proxies!
[Figure: makehttpfs places a .dir file in the directories of the HTTP server's tree. Appl calls opendir(/home); Parrot's HTTP module sends GET /home/.dir HTTP/1.0, and the reply lists alice, babar, cms.]
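The idea of turning directories into files can be sketched in a few lines. This is not the real makehttpfs: the one-name-per-line listing format is an assumption, and the function names are hypothetical. The sketch shows only the server-side indexing step and the request path a client would use for opendir.

```python
import os
import posixpath

DIR_FILE = ".dir"  # listing file name, as shown on the slide


def write_dir_listings(root):
    """Write a .dir file into every directory under `root`, listing its
    entries one per line (assumed format)."""
    for dirpath, dirnames, filenames in os.walk(root):
        entries = sorted(n for n in dirnames + filenames if n != DIR_FILE)
        listing = os.path.join(dirpath, DIR_FILE)
        with open(listing, "w", encoding="utf-8") as out:
            out.write("".join(name + "\n" for name in entries))


def listing_path(directory):
    """Request path that replaces opendir(directory), e.g. /home/.dir."""
    return posixpath.join("/", directory.strip("/"), DIR_FILE)


def parse_listing(text):
    """Turn the body of a fetched .dir file back into directory entries."""
    return [line for line in text.splitlines() if line]


if __name__ == "__main__":
    print(listing_path("/home"))
```

Since a .dir file is an ordinary static file, any HTTP proxy along the way can cache it exactly like the libraries themselves.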
Logical Access to Bio Data
Many databases of biological data in different formats around the world:
– Archives: Swiss-Prot, TrEMBL, NCBI, etc...
– Replicas: Public, Shared, Private, ???
Users and applications want to refer to data objects by logical name, not location!
– Access the nearest copy of the non-redundant protein database, don't care where it is.
Solution: EGEE data management system maps logical names (LFNs) to physical names (SFNs).
Credit: Christophe Blanchet, Bioinformatics Center of Lyon, CNRS IBCP, France
http://gbio.ibcp.fr/cblanchet, Christophe.Blanchet@ibcp.fr
Logical Access to Bio Data
[Figure: Run BLAST on LFN://ncbi.gov/nr.data. BLAST calls open(LFN://ncbi.gov/nr.data). Parrot asks the EGEE File Location Service "Where is LFN://ncbi.gov/nr.data?" and is told "Find it at: SFN://ibcp.fr/nr.data". Parrot then calls open(SFN://ibcp.fr/nr.data) and retrieves the file (RETR nr.data). Parrot modules: RFIO, gLite, HTTP, FTP. Copies of nr.data reside on a Chirp server, an FTP server, and a gLite server.]
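The lookup step in the figure can be sketched as a small resolver. A dictionary stands in for the EGEE file location service, and taking the first replica is an assumption made for brevity. The resolve function and REPLICA_CATALOG are hypothetical names, not part of any EGEE or Parrot API.

```python
# Logical name -> list of physical replicas. The single entry is the
# example pair from the slide.
REPLICA_CATALOG = {
    "LFN://ncbi.gov/nr.data": ["SFN://ibcp.fr/nr.data"],
}


def resolve(name, catalog=REPLICA_CATALOG):
    """Map a logical file name to a physical one.
    Names that are not LFNs are returned unchanged."""
    if not name.startswith("LFN://"):
        return name
    replicas = catalog.get(name)
    if not replicas:
        raise FileNotFoundError(name)
    # Assumption: the first replica is used. A real service could rank
    # replicas by proximity to the caller instead.
    return replicas[0]


if __name__ == "__main__":
    print(resolve("LFN://ncbi.gov/nr.data"))
```

The application only ever names the logical file, so replicas can be added or moved by updating the catalog alone.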
Appl: Distributed MD Database
State of Molecular Dynamics Research:
– Easy to run lots of simulations!
– Difficult to understand the "big picture".
– Hard to systematically share results and ask questions.
Desired Questions and Activities:
– "What parameters have I explored?"
– "How can I share results with friends?"
– "Replicate these items five times for safety."
– "Recompute everything that relied on this machine."
GEMS: Grid Enabled Molecular Sims
– Distributed database for MD simulations at Notre Dame.
– XML database for indexing, TSS for storage/policy.
GEMS Distributed Database
[Figure labels: database server; catalog server (x2); index entries XML -> host1:fileA, host7:fileB, host3:fileC and XML -> host6:fileX, host2:fileY, host5:fileZ; files A, B, C, X, Y, Z on file servers; insert of data with XML; query XML + Temp>300K, Mol==CH4, answered with host5:fileZ, host6:fileX; DSFS Adapter]
Credit: Jesus Izaguirre and Aaron Striegel, Notre Dame CSE Dept.
Active Recovery in GEMS
GEMS and Tactical Storage
Dynamic System Configuration
– Add/remove servers, discovered via catalog
Policy Control in File Servers
– Groups can Collaborate within Constraints
– Security Implemented within File Servers
Direct Access via Adapters
– Unmodified Simulations can use Database
– Alternate Web/Viz Interfaces for Users.
OS Support for Grid Computing
Grid computing in general suffers because of limitations in the operating system.
Security and permissions:
– No ACLs -> hard to share data
– Only root can setuid -> hard to secure services.
Resource allocation:
– Cannot reserve space -> jobs crash
– Hard to clean up procs -> unreliable systems
[Figure: identity tree with nodes root, httpd, kerberos, student, alice, bob, visitor, visitor, anon1, anon2]
These two users are completely different:
root:kerberos:alice:visitor
root:kerberos:bob:visitor
The web server can create distinct anonymous accounts. No need for global nobody.
kerberos given to the login server.
alice created by krb5 login.
student created at run-time.
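The identity names above read as paths through a tree in which each identity may create children beneath itself. Here is a minimal sketch of that naming scheme, assuming colon-joined names. The Identity class is hypothetical and does not model any kernel interface.

```python
class Identity:
    """A node in a hierarchical identity tree."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent

    def create_child(self, name):
        """Create a new identity beneath this one; the slide's point is
        that no root privilege is needed for this step."""
        return Identity(name, parent=self)

    def full_name(self):
        """Join the names from the root down to this node with colons."""
        parts = []
        node = self
        while node is not None:
            parts.append(node.name)
            node = node.parent
        return ":".join(reversed(parts))


if __name__ == "__main__":
    root = Identity("root")
    kerberos = root.create_child("kerberos")
    alice = kerberos.create_child("alice")
    bob = kerberos.create_child("bob")
    print(alice.create_child("visitor").full_name())
    print(bob.create_child("visitor").full_name())
```

Because the full name includes the creator's path, two identities that are both called visitor never collide.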
Tactical Storage Systems
Separate Abstractions from Resources
Components:
– Servers, catalogs, abstractions, adapters.
– Completely user level.
– Performance acceptable for real applications.
Independent but Cooperating Components
– Owners of file servers set policy.
– Users must work within policies.
– Within policies, users are free to build.
Parting Thought
Many users of the grid are constrained by functionality, not performance.
TSS allows end users to build the structures that they need for the moment without involving an admin.
Analogy: building blocks for distributed storage.
Acknowledgments
Science Collaborators:
– Christophe Blanchet
– Sander Klous
– Peter Kunzst
– Erwin Laure
– John Poirer
– Igor Sfiligoi
CS Collaborators:
– Jesus Izaguirre
– Aaron Striegel
CS Students:
– Paul Brenner
– James Fitzgerald
– Jeff Hemmes
– Paul Madrid
– Chris Moretti
– Phil Snowberger
– Justin Wozniak
For more information...
Cooperative Computing Lab
http://www.cse.nd.edu/~ccl
Cooperative Computing Tools
http://www.cctools.org
Douglas Thain
– dthain@cse.nd.edu
– http://www.cse.nd.edu/~dthain
[Charts not preserved. Slide titles: Performance - System Calls; Performance - Applications (label: parrot only); Performance - I/O Calls; Performance - Bandwidth; Performance - DSFS]
SP5 Performance on EDG Testbed
Setup      Time to Init     Time/Event
Unix        446 +/- 46       64s
LAN/NFS    4464 +/- 172     113s
LAN/TSS    4505 +/- 155     113s
WAN/TSS    6275 +/- 330      88s