Upload
clifton-crawford
View
217
Download
3
Embed Size (px)
Citation preview
Peter F. Couvares(based on material from Tevfik Kosar, Nick LeRoy, and Jeff Weber)
Associate Researcher, Condor TeamComputer Sciences DepartmentUniversity of Wisconsin-Madison<[email protected]>
STORK & NeST: Making Data Placement a First Class Citizen in the Grid
Reliable and Efficient Grid Data Placement 3www.cs.wisc.edu/condor
While doing this..› Locate the data
› Access heterogeneous resources
› Recover form all kinds of failures
› Allocate and de-allocate storage
› Move the data
› Clean-up everything
All of these need to be done reliably and efficiently!
Reliable and Efficient Grid Data Placement 4www.cs.wisc.edu/condor
Stork
A scheduler for data placement activities in the GridWhat Condor is for computational jobs, Stork is for data placement Stork’s fundamental concept:“Make data placement a first class citizen in
the Grid.”
Reliable and Efficient Grid Data Placement 5www.cs.wisc.edu/condor
Outline
IntroductionThe ConceptStork FeaturesBig PictureConclusions
Reliable and Efficient Grid Data Placement 6www.cs.wisc.edu/condor
The Concept
• Stage-in
• Execute the Job
• Stage-out
Individual Jobs
Reliable and Efficient Grid Data Placement 7www.cs.wisc.edu/condor
The Concept
• Stage-in
• Execute the Job
• Stage-out
Stage-in
Execute the job
Stage-outRelease input space
Release output space
Allocate space for input & output data
Individual Jobs
Reliable and Efficient Grid Data Placement 8www.cs.wisc.edu/condor
The Concept
• Stage-in
• Execute the Job
• Stage-out
Stage-in
Execute the job
Stage-outRelease input space
Release output space
Allocate space for input & output data
Data Placement Jobs
Computational Jobs
Reliable and Efficient Grid Data Placement 9www.cs.wisc.edu/condor
DAGMan
The Concept
CondorJob
QueueData A A.submitData B B.submitJob C C.submit…..Parent A child BParent B child CParent C child D, E…..
C
StorkJob
Queue
E
DAG specification
A CBD
E
F
Reliable and Efficient Grid Data Placement 10www.cs.wisc.edu/condor
Why Stork?
Stork understands the characteristics and semantics of data placement jobs.Can make smart scheduling decisions, for reliable and efficient data placement.
Reliable and Efficient Grid Data Placement 11www.cs.wisc.edu/condor
Understanding Job Characteristics &
SemanticsJob_type = transfer, reserve, release?Source and destination hosts, files, protocols to use?• Determine concurrency level • Can select alternate protocols• Can select alternate routes• Can tune network parameters (tcp buffer size, I/O
block size, # of parallel streams)• …
Reliable and Efficient Grid Data Placement 12www.cs.wisc.edu/condor
Support for Heterogeneity
Protocol translation using Stork memory buffer.
Reliable and Efficient Grid Data Placement 13www.cs.wisc.edu/condor
Support for Heterogeneity
Protocol translation using Stork Disk Cache.
Reliable and Efficient Grid Data Placement 14www.cs.wisc.edu/condor
Flexible Job Representation
[ Type = “Transfer”; Src_Url = “srb://ghidorac.sdsc.edu/kosart.condor/x.dat”; Dest_Url = “nest://turkey.cs.wisc.edu/kosart/x.dat”;
…………
]
Reliable and Efficient Grid Data Placement 15www.cs.wisc.edu/condor
Failure Recovery and Efficient Resource
UtilizationFault tolerance• Just submit a bunch of data placement jobs, and then
go away..
Control number of concurrent transfers from/to any storage system• Prevents overloading
Space allocation and De-allocations• Make sure space is available
Reliable and Efficient Grid Data Placement 16www.cs.wisc.edu/condor
Outline
IntroductionThe ConceptStork FeaturesBig PictureConclusions
JOB DESCRIPTION
S
PLANNER
DATA PLACEMENT SCHEDULER
COMPUTATION SCHEDULER
USER
GRID STORAGE SYSTEMS
WORKFLOW MANAGER
Abstract DAG
Concrete DAG
RLS / MCAT
GRID COMPUTE NODES
JOB DESCRIPTION
S
PLANNER
DATA PLACEMENT SCHEDULER
COMPUTATION SCHEDULER
USER
GRID STORAGE SYSTEMS
WORKFLOW MANAGER
Abstract DAG
Concrete DAG
RLS / MCAT
GRID COMPUTE NODES
D. JOB LOG FILES
C. JOB LOG FILES
POLICY ENFORCER
JOB DESCRIPTION
S
PLANNER
DATA PLACEMENT SCHEDULER
COMPUTATION SCHEDULER
USER
GRID STORAGE SYSTEMS
WORKFLOW MANAGER
Abstract DAG
Concrete DAG
RLS / MCAT
GRID COMPUTE NODES
D. JOB LOG FILES
C. JOB LOG FILES
POLICY ENFORCER
DATA MINER
NETWORK MONITORING TOOLS
FEEDBACK MECHANISM
JOB DESCRIPTION
S
PLANNER
STORKCONDOR/ CONDOR-G
USER
GRID STORAGE SYSTEMS
DAGMAN
Abstract DAG
Concrete DAG
RLS / MCAT
GRID COMPUTE NODES
D. JOB LOG FILES
C. JOB LOG FILES
MATCHMAKER
DATA MINER
NETWORK MONITORING TOOLS
FEEDBACK MECHANISM
Reliable and Efficient Grid Data Placement 23www.cs.wisc.edu/condor
ConclusionsRegard data placement as individual jobs.Treat computational and data placement jobs differently.Introduce a specialized scheduler for data placement.Provide end-to-end automation, fault tolerance, run-time adaptation, multilevel policy support, reliable and efficient transfers.
Reliable and Efficient Grid Data Placement 24www.cs.wisc.edu/condor
Future work
Enhanced interaction between Stork and higher level planners• better coordination of CPU and I/O
Interaction between multiple Stork servers and job delegation from one to anotherEnhanced authentication mechanismsMore run-time adaptation
Reliable and Efficient Grid Data Placement 25www.cs.wisc.edu/condor
Related PublicationsTevfik Kosar and Miron Livny. “Stork: Making Data Placement a First Class Citizen in the Grid”. In Proceedings of 24th IEEE Int. Conference on Distributed Computing Systems (ICDCS 2004), Tokyo, Japan, March 2004.George Kola, Tevfik Kosar and Miron Livny. “A Fully Automated Fault-tolerant System for Distributed Video Processing and Off-site Replication. To appear in Proceedings of 14th ACM Int. Workshop on etwork and Operating Systems Support for Digital Audio and Video (Nossdav 2004), Kinsale, Ireland, June 2004.Tevfik Kosar, George Kola and Miron Livny. “A Framework for Self-optimizing, Fault-tolerant, High Performance Bulk Data Transfers in a Heterogeneous Grid Environment”. In Proceedings of 2nd Int. Symposium on Parallel and Distributed Computing (ISPDC 2003), Ljubljana, Slovenia, October 2003.George Kola, Tevfik Kosar and Miron Livny. “Run-time Adaptation of Grid Data Placement Jobs”. In Proceedings of Int. Workshop on Adaptive Grid Middleware (AGridM 2003), New Orleans, LA, September 2003.
Reliable and Efficient Grid Data Placement 26www.cs.wisc.edu/condor
You don’t have to FedEx your data
anymore.. Stork delivers it for
you!For more information:
• Email: [email protected]• http://www.cs.wisc.edu/condor/stork
Reliable and Efficient Grid Data Placement 27www.cs.wisc.edu/condor
NeST
NeST (Network Storage Technology)A lightweight, portable storage manager for data placement activities on the GridAllocation: NeST negotiates “mini storage contracts” between users and server.Multi-protocol: Supports Chirp, GridFTP, NFS, HTTP
Chirp is NeST’s internal protocol.
Secure: GSI authenticationLightweight: Configuration and installation can be performed in minutes, and does not require root.
www.cs.wisc.edu/condor
Why storage allocations ?
› Users need both temporary storage, and long-term guaranteed storage.
› Administrators need a storage solution with configurable limits and policy.
› Administrators will benefit from NeST’s automatic reclamations of expired storage allocations.
www.cs.wisc.edu/condor
Storage allocations in NeST
› Lot – abstraction for storage allocation with an associated handle• Handle is used for all subsequent operations
on this lot
› Client requests lot of a specified size and duration. Server accepts or rejects client request.
www.cs.wisc.edu/condor
Lot types› User / Group
• User: single user (user controls ACL)• Group: shared use (all members control ACL)
› Best effort / Guaranteed• Best effort: server may purge data if necessary. Good
fit for derived data.• Guaranteed: server honors request duration.
› Hierarchical: Lots with lots (“sublots”)
www.cs.wisc.edu/condor
Lot operations
› Create, Delete, Update› MoveFile
• Moves files between lots
› AddUser, RemoveUser• Lot level authorization• List of users allowed to request sub-lots
› Attach / Detach• Performs NeST lot to path binding
www.cs.wisc.edu/condor
Functionality: GT4 GridFTP
GridFTP Server
Disk Module
Disk
Storage
globus-url-copy
Sample Application
(GSI-FTP)
www.cs.wisc.edu/condor
NeST Server
Functionality: GridFTP +NeST
GridFTP Server
NeST Module
Disk
Storage
NeST Client
Chirp
Handler
globus-url-copy
(Lot operations, etc.)(File transfers)
(chirp)(GSI-FTP)
(File transfer)(chirp)
www.cs.wisc.edu/condor
NeST Server
NeST with Stork
GridFTP Server
NeST Module
Stork
(Lot operations, etc.)(chirp)
(File transfers)(GSI-FTP)
(File transfer)(chirp)
GT4 NeST
Disk
Storage
Chirp
Handler
www.cs.wisc.edu/condor
Stork
NeST Sample Work DAG
NeST
Allocate
Job
Job
Job
Xfer In Xfer Out Release
Stork
Condor-G
www.cs.wisc.edu/condor
Connection Manager
› Used to control connections to NeST
› Allows connection reservations:• Reserve # of simultaneous connections• Reserve total bytes of transfer• Reservations have durations & expire• Reservations are persistent
www.cs.wisc.edu/condor
Release Status
› http://www.cs.wisc.edu/condor/nest› v0.9.7 expected soon
• v0.9.7 Pre 2 released• Just bug fixes; features frozen
› v1.0 expected later this year› Currently supports Linux, will support other
O/S’s in the future.
www.cs.wisc.edu/condor
Roadmap
› Performance tests with Stork
› Continue hardening code base
› Expand supported platforms• Solaris & other UNIX-en
› Bundle with Condor
› Connection Manager