Globus DataGrid Overview
Bill Allcock, ANL
GridPP Meeting
30 June 2003
Sources of Information / Support
Me
– Definitive source of information about GridFTP
– Responsible for requirements gathering, feature prioritization, getting developer resources, directing the development work, etc.
[email protected]
– Extensive archive that is worth searching
– GridFTP developers monitor it and are good about answering, but are not required to
Bugzilla: http://bugzilla.globus.org/bugzilla
– Used for submitting bugs
GridFTP Feature Set
– GSI, Kerberos security
– Third-party transfers
– Parameter set/negotiate
– Partial file access
– Reliability/restart
– Large file support
– Data channel reuse
– De facto Standard on the Grid
– Integrated instrumentation
– Logging/audit trail
– Parallel transfers
– Striping
– TCP Buffer size control
– Policy-based access control
– Server-side computation
– Based on Standards
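Parallel transfers and partial file access both rest on the same idea: data moves as blocks that know where they belong in the file. The following is a minimal sketch of that idea, not the GridFTP wire format. It assumes fixed-size blocks dealt round-robin across streams, each tagged with its byte offset so the receiver can write it into place in any arrival order; the block size, the round-robin policy, and the names `split_blocks`, `assign_streams`, and `reassemble` are all hypothetical.

```python
from typing import Dict, List, Tuple

# A block is (byte offset in the file, payload bytes).
Block = Tuple[int, bytes]


def split_blocks(data: bytes, block_size: int) -> List[Block]:
    """Cut data into contiguous blocks, each tagged with its starting offset."""
    return [(off, data[off:off + block_size]) for off in range(0, len(data), block_size)]


def assign_streams(blocks: List[Block], n_streams: int) -> Dict[int, List[Block]]:
    """Deal blocks round-robin across n_streams parallel streams (hypothetical policy)."""
    streams: Dict[int, List[Block]] = {i: [] for i in range(n_streams)}
    for i, block in enumerate(blocks):
        streams[i % n_streams].append(block)
    return streams


def reassemble(size: int, received: List[Block]) -> bytes:
    """Write each block at its offset, so arrival order does not matter."""
    buf = bytearray(size)
    for off, payload in received:
        buf[off:off + len(payload)] = payload
    return bytes(buf)


if __name__ == "__main__":
    data = bytes(range(256)) * 40  # 10240 bytes of sample data
    streams = assign_streams(split_blocks(data, 1000), n_streams=4)
    # Simulate the streams finishing in reverse order.
    arrived = [b for sid in reversed(range(4)) for b in streams[sid]]
    assert reassemble(len(data), arrived) == data
    # Partial file access: request only bytes 2000..4999.
    wanted = [b for b in split_blocks(data, 1000) if 2000 <= b[0] < 5000]
    print(len(wanted), sum(len(p) for _, p in wanted))  # 3 3000
```

Because every block carries its offset, the bookkeeping that lets the receiver accept blocks from any stream in any order also lets a client ask for an arbitrary byte range.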
GridFTP at SC’2000: Long-Running Dallas-Chicago Transfer
[Figure: the long-running transfer plotted over time, annotated with these events: SciNet power failure; other demos starting up (congestion); parallelism increases (demos); backbone problems on the SC floor; DNS problems; transitions between files (not zero due to averaging)]
Reliable File Transfer
Note that I said any *REMOTE* resource can fail. A local failure would mean loss of state, since the state is held in the client's memory. We could have modified the restart plug-in to write state to disk; instead, we opted for a service that accepts data transfer “jobs” and uses a database. This provides increased robustness AND allows a client to initiate a long-running job without having to tie up the local computer to keep it running.
We call this service the Reliable File Transfer (RFT) service. One test ran 54 hours, moved 0.3 TB, and survived multiple failures, both natural and intentional.
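The key design point of RFT, keeping transfer state in a database instead of in client memory, can be sketched in a few lines. This is a minimal illustration, assuming an in-memory SQLite database as a stand-in for RFT's real database and a hypothetical `transfer_chunk` callable that moves one chunk and may fail. The table layout and the names `init_db`, `submit`, and `run_job` are invented for the example. After each successful chunk the committed offset is written to the database, so a restarted worker resumes from the last committed byte rather than from zero.

```python
import sqlite3
from typing import Callable


def init_db(conn: sqlite3.Connection) -> None:
    """Create a hypothetical table holding one row per transfer job."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS jobs ("
        " id INTEGER PRIMARY KEY, src TEXT, dst TEXT, size INTEGER,"
        " committed INTEGER DEFAULT 0, state TEXT DEFAULT 'pending')"
    )
    conn.commit()


def submit(conn: sqlite3.Connection, src: str, dst: str, size: int) -> int:
    """Record a new job; the client can disconnect once this returns."""
    cur = conn.execute("INSERT INTO jobs (src, dst, size) VALUES (?, ?, ?)", (src, dst, size))
    conn.commit()
    return cur.lastrowid


def run_job(conn: sqlite3.Connection, job_id: int, chunk: int,
            transfer_chunk: Callable[[int, int], None]) -> None:
    """Move the file chunk by chunk, committing progress after every chunk.

    If transfer_chunk raises, the committed offset stored in the database is
    left as it was, so a later call resumes from there instead of from zero.
    """
    size, committed = conn.execute(
        "SELECT size, committed FROM jobs WHERE id = ?", (job_id,)
    ).fetchone()
    while committed < size:
        n = min(chunk, size - committed)
        transfer_chunk(committed, n)  # may raise on a (simulated) remote failure
        committed += n
        conn.execute("UPDATE jobs SET committed = ? WHERE id = ?", (committed, job_id))
        conn.commit()
    conn.execute("UPDATE jobs SET state = 'done' WHERE id = ?", (job_id,))
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    init_db(conn)
    job = submit(conn, "gsiftp://src.example.org/f", "gsiftp://dst.example.org/f", 10)
    calls = []

    def flaky(offset: int, n: int) -> None:
        calls.append(offset)
        if offset == 6 and calls.count(6) == 1:  # fail once, mid-transfer
            raise ConnectionError("simulated remote failure")

    try:
        run_job(conn, job, chunk=3, transfer_chunk=flaky)
    except ConnectionError:
        pass  # a real service would schedule a retry
    run_job(conn, job, chunk=3, transfer_chunk=flaky)
    print(calls)  # [0, 3, 6, 6, 9]
```

The second run restarts at offset 6, not 0. With `":memory:"` the state still dies with the process; passing a file path to `sqlite3.connect` is what would let it survive a restart of the service itself.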
GridFTP: Standards Based
Existing standards
– RFC 959: File Transfer Protocol
– RFC 2228: FTP Security Extensions
– RFC 2389: Feature Negotiation for the File Transfer Protocol
– Draft: FTP Extensions
New drafts
– GridFTP: Protocol Extensions to FTP for the Grid
> Grid Forum GridFTP Working Group
> Submitted for public comment
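RFC 2389 is what lets a client discover which extensions a server supports. The client sends `FEAT`. The server answers with a multi-line `211` reply in which each supported feature sits on its own line, starting with a single space and optionally followed by parameters. A minimal parser might look like this. The function name `parse_feat` and the sample reply are hypothetical, and upper-casing the feature names is a convenience for lookups rather than something this sketch takes from the RFC.

```python
from typing import Dict


def parse_feat(reply: str) -> Dict[str, str]:
    """Parse an RFC 2389 FEAT reply into {FEATURE-NAME: parameter string}."""
    lines = reply.splitlines()  # handles the CRLF line endings used by FTP
    if not lines or not lines[0].startswith("211"):
        raise ValueError("not a 211 FEAT reply")
    features: Dict[str, str] = {}
    for line in lines[1:]:
        if line.startswith(" "):  # feature lines begin with a single space
            name, _, params = line[1:].partition(" ")
            features[name.upper()] = params  # normalize case for lookups
        # The closing "211 END" line carries no feature and is skipped.
    return features


if __name__ == "__main__":
    reply = (
        "211-Extensions supported:\r\n"
        " SIZE\r\n"
        " MDTM\r\n"
        " MLST size*;modify*;type*;\r\n"
        "211 END\r\n"
    )
    feats = parse_feat(reply)
    print(sorted(feats))  # ['MDTM', 'MLST', 'SIZE']
    print(feats["MLST"])  # size*;modify*;type*;
```

A server with nothing to advertise replies with a single `211` line, which this parser turns into an empty dictionary.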
GridFTP: Future Work
– New server beta in August (wuftp replacement) w/ transport and security
– Striping functionality and HPSS released in Q1/Q2 2004 with HPSS 5.2b and logging
– Other features based on demand
– Improved testing and documentation
– Inclusion of protocol extensions from GGF
– Interface in server for a policy “engine”, e.g., “allocate one stripe per 100MB of file size”
– New web services control channel protocol
– Utilization of non-TCP network protocols
– Bandwidth limiting
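The example policy, “allocate one stripe per 100MB of file size”, amounts to simple arithmetic. This is a minimal sketch of the kind of rule such a policy engine could evaluate, under these assumptions: the function name `stripes_for`, rounding up so that a partial 100MB still gets a stripe, reading MB as 10^6 bytes, and capping the result at the number of available stripe servers.

```python
def stripes_for(file_size: int, available_servers: int,
                bytes_per_stripe: int = 100 * 10**6) -> int:
    """Hypothetical policy: one stripe per 100MB of file size.

    Rounds up (a partial 100MB still earns a stripe), always allocates at
    least one stripe, and never more than the stripe servers available.
    """
    if file_size < 0 or available_servers < 1:
        raise ValueError("need a non-negative size and at least one server")
    wanted = max(1, -(-file_size // bytes_per_stripe))  # ceiling division
    return min(wanted, available_servers)


if __name__ == "__main__":
    for size in (0, 100 * 10**6, 250 * 10**6, 2 * 10**9):
        print(size, stripes_for(size, available_servers=8))
```

Keeping the rule as a small, swappable function is the point of the policy interface: the server asks “how many stripes?” and the site decides the answer.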
Basic Layout of GridFTP for HPSS
eXtensible IO Library (xio)
– Abstracts away the transport layer
– Defines standard function signatures for Read/Write/Open/Close
– Two types of drivers: transport and transform
– The transport driver has to be the first one pushed on the stack
– There can be an arbitrary number of transform drivers
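The driver-stack model can be sketched with four methods per driver. This is a minimal illustration in Python, not the Globus XIO C API. The class names `Transport`, `Transform`, `ZlibTransform`, and `Stack` are hypothetical, the in-memory loopback stands in for a real TCP or file transport, and zlib compression is just an example transform. The stack is created with its transport, so the transport is necessarily first, and any number of transforms can then be pushed on top.

```python
import zlib
from typing import List


class Transport:
    """Bottom of the stack: actually moves bytes.

    This stand-in loops messages back through an in-memory list; a real
    transport driver would talk to TCP, UDP, a file, etc.
    """

    def __init__(self) -> None:
        self.wire: List[bytes] = []

    def open(self) -> None:
        self.wire.clear()

    def write(self, data: bytes) -> None:
        self.wire.append(data)

    def read(self) -> bytes:
        return self.wire.pop(0)

    def close(self) -> None:
        pass


class Transform:
    """Sits on top of another driver; by default every call passes straight through."""

    def __init__(self) -> None:
        self.below = None  # set when pushed onto a stack

    def open(self) -> None:
        self.below.open()

    def write(self, data: bytes) -> None:
        self.below.write(data)

    def read(self) -> bytes:
        return self.below.read()

    def close(self) -> None:
        self.below.close()


class ZlibTransform(Transform):
    """Example transform: compress on write, decompress on read."""

    def write(self, data: bytes) -> None:
        self.below.write(zlib.compress(data))

    def read(self) -> bytes:
        return zlib.decompress(self.below.read())


class Stack:
    """A driver stack: created with its transport, then any number of transforms."""

    def __init__(self, transport: Transport) -> None:
        self.drivers: list = [transport]  # the transport is always at the bottom

    def push(self, driver: Transform) -> None:
        driver.below = self.drivers[-1]
        self.drivers.append(driver)

    def pop(self) -> Transform:
        if len(self.drivers) == 1:
            raise IndexError("cannot pop the transport driver")
        return self.drivers.pop()

    # The application sees only the four standard calls, made on the top driver.
    def open(self) -> None:
        self.drivers[-1].open()

    def write(self, data: bytes) -> None:
        self.drivers[-1].write(data)

    def read(self) -> bytes:
        return self.drivers[-1].read()

    def close(self) -> None:
        self.drivers[-1].close()


if __name__ == "__main__":
    stack = Stack(Transport())
    stack.push(ZlibTransform())
    stack.open()
    stack.write(b"a" * 1000)
    print(len(stack.drivers[0].wire[0]))  # far fewer than 1000 bytes on the "wire"
    print(stack.read() == b"a" * 1000)  # True
    stack.close()
```

This sketch treats each write as one message, which keeps reads trivial. A real byte-stream transport would need framing, and that framing is exactly the kind of detail a driver stack hides from the application.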
Transform Driver Example (gsi)
Open does the authentication and, if specified via an attribute, delegation.
Read/Write could be a simple pass-through or, if requested, might do encryption or integrity protection.
Close in this case is a no-op.
Kerberos *should* be easier: simply pop gsi and push kerberos.
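The gsi transform's behavior maps onto the four calls fairly directly. This is a minimal sketch of a security transform shaped like that description, not the real gsi driver. It assumes a shared-key HMAC-SHA256 tag as a stand-in for GSI integrity protection. Its “authentication” only checks that a credential is present, and the names `LoopbackTransport` and `SecurityTransform` and the credential strings are hypothetical. The last lines show the pop/push swap: the code that uses the top of the stack does not change when the security mechanism does.

```python
import hashlib
import hmac
from typing import List


class LoopbackTransport:
    """Stand-in transport driver: messages written are queued for reading."""

    def __init__(self) -> None:
        self.wire: List[bytes] = []

    def open(self) -> None:
        pass

    def write(self, data: bytes) -> None:
        self.wire.append(data)

    def read(self) -> bytes:
        return self.wire.pop(0)

    def close(self) -> None:
        pass


class SecurityTransform:
    """Hypothetical transform shaped like the gsi driver description.

    open(): authenticate (here, only checks that a credential was supplied).
    read()/write(): pass-through, or, if integrity is requested via an
    attribute, append and verify an HMAC-SHA256 tag.
    close(): nothing security-specific; just passes the call down.
    """

    TAG_LEN = 32  # bytes in a SHA-256 digest

    def __init__(self, below, credential: bytes, integrity: bool = False) -> None:
        self.below, self.key, self.integrity = below, credential, integrity

    def open(self) -> None:
        if not self.key:
            raise PermissionError("authentication failed: no credential")
        self.below.open()

    def write(self, data: bytes) -> None:
        if self.integrity:
            data = data + hmac.new(self.key, data, hashlib.sha256).digest()
        self.below.write(data)

    def read(self) -> bytes:
        data = self.below.read()
        if self.integrity:
            data, tag = data[:-self.TAG_LEN], data[-self.TAG_LEN:]
            expected = hmac.new(self.key, data, hashlib.sha256).digest()
            if not hmac.compare_digest(tag, expected):
                raise ValueError("integrity check failed")
        return data

    def close(self) -> None:
        self.below.close()


if __name__ == "__main__":
    stack = [LoopbackTransport()]
    stack.append(SecurityTransform(stack[-1], b"gsi-credential", integrity=True))
    stack[-1].open()
    stack[-1].write(b"hello")
    print(stack[-1].read())  # b'hello'
    # Switching mechanisms: pop the gsi-like driver and push another security driver.
    stack.pop()
    stack.append(SecurityTransform(stack[-1], b"kerberos-session-key", integrity=True))
    stack[-1].open()
```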
Planned xio drivers
– Basics: TCP, UDP, file, gsi
– GridFTP: make it simple for an application to access files under the control of a GridFTP server
– Note that xio drivers can call xio drivers: the GridFTP driver will call sockets, which will call TCP
– MultiStream Data Channel Protocol
– HTTP
– SABUL
– Rate Limiting
Transport Stack in Globus
[Figure: transport stack, from top to bottom: Reliable File Transfer Service; New GridFTP Server; Extensible IO System (under all of Globus). The client / user app can poke down the stack as necessary.]
Replica Management
Replica Catalog Structure: A Climate Modeling Example
Replica Catalog
– Logical Collection: CO2 measurements 1998
  – Filenames: Jan 1998, Feb 1998, …
  – Logical File Parent
    – Logical File Jan 1998
    – Logical File Feb 1998 (Size: 1468762)
  – Location jupiter.isi.edu
    – Filenames: Mar 1998, Jun 1998, Oct 1998
    – Protocol: gsiftp
    – UrlConstructor: gsiftp://jupiter.isi.edu/nfs/v6/climate
  – Location sprite.llnl.gov
    – Filenames: Jan 1998 … Dec 1998
    – Protocol: ftp
    – UrlConstructor: ftp://sprite.llnl.gov/pub/pcmdi
– Logical Collection: CO2 measurements 1999
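Resolving a logical file to physical copies in this structure is a lookup over locations followed by URL construction. The sketch below models that, assuming a location's UrlConstructor is a prefix to which the filename is appended after a "/" with spaces percent-encoded. That joining rule, the class names `Location` and `LogicalCollection`, and reading sprite's “Jan 1998 … Dec 1998” as all twelve months are all assumptions made for illustration. The hosts, protocols, and URL prefixes come from the example.

```python
from dataclasses import dataclass, field
from typing import List
from urllib.parse import quote

MONTHS = "Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec".split()


@dataclass
class Location:
    host: str
    protocol: str
    url_constructor: str  # URL prefix for files stored at this location
    filenames: List[str] = field(default_factory=list)

    def url_for(self, filename: str) -> str:
        # Hypothetical rule: prefix + "/" + percent-encoded filename.
        return self.url_constructor.rstrip("/") + "/" + quote(filename)


@dataclass
class LogicalCollection:
    name: str
    locations: List[Location] = field(default_factory=list)

    def replicas(self, logical_file: str) -> List[str]:
        """Physical URLs of every location that holds a copy of logical_file."""
        return [loc.url_for(logical_file)
                for loc in self.locations if logical_file in loc.filenames]


co2_1998 = LogicalCollection("CO2 measurements 1998", [
    Location("jupiter.isi.edu", "gsiftp", "gsiftp://jupiter.isi.edu/nfs/v6/climate",
             ["Mar 1998", "Jun 1998", "Oct 1998"]),
    # Assumption: "Jan 1998 ... Dec 1998" is read as every month of 1998.
    Location("sprite.llnl.gov", "ftp", "ftp://sprite.llnl.gov/pub/pcmdi",
             [f"{m} 1998" for m in MONTHS]),
])

if __name__ == "__main__":
    for url in co2_1998.replicas("Mar 1998"):
        print(url)
```

Mar 1998 resolves to two replicas, one per protocol. Jan 1998 resolves only to the ftp copy at sprite.llnl.gov, and a client would pick among replicas based on protocol support or proximity.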
A Replica Location Service
A Replica Location Service (RLS) is a distributed registry service that records the locations of data copies and allows discovery of replicas.
– Maintains mappings between logical identifiers and target names
  – Physical targets: map to exact locations of replicated data
  – Logical targets: map to another layer of logical names, allowing storage systems to move data without informing the RLS
RLS was designed and implemented in a collaboration between the Globus project and the DataGrid project.
[Figure: Replica Location Indexes (RLIs) layered above Local Replica Catalogs (LRCs)]
• LRCs contain consistent information about logical-to-target mappings on a site
• RLI nodes aggregate information about LRCs
• Soft state updates from LRCs to RLIs: relaxed consistency of index information, used to rebuild index after failures
• Arbitrary levels of RLI hierarchy
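The split between consistent LRCs and relaxed-consistency RLIs fed by soft state can be sketched with two small classes. This is a minimal illustration, not the RLS implementation. It assumes each LRC periodically sends its complete list of logical names, and each RLI keeps an LRC's update only for a fixed timeout unless it is refreshed. Time is passed in explicitly to keep the example deterministic, and the class and method names, timeout, and logical and target names are hypothetical.

```python
from typing import Dict, Set, Tuple


class LRC:
    """Local Replica Catalog: consistent logical-name -> target mappings for one site."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.mappings: Dict[str, Set[str]] = {}

    def add(self, lfn: str, target: str) -> None:
        self.mappings.setdefault(lfn, set()).add(target)

    def query(self, lfn: str) -> Set[str]:
        return set(self.mappings.get(lfn, set()))

    def soft_state_update(self) -> Set[str]:
        """Complete list of registered logical names, sent periodically to RLIs."""
        return set(self.mappings)


class RLI:
    """Replica Location Index: which LRCs *may* hold a logical name.

    Each LRC's update is kept only for `timeout` seconds unless refreshed,
    so stale entries age out and the index rebuilds itself from new updates.
    """

    def __init__(self, timeout: float) -> None:
        self.timeout = timeout
        self.state: Dict[str, Tuple[Set[str], float]] = {}  # LRC -> (names, received at)

    def receive(self, lrc_name: str, lfns: Set[str], now: float) -> None:
        self.state[lrc_name] = (lfns, now)

    def lookup(self, lfn: str, now: float) -> Set[str]:
        return {name for name, (lfns, t) in self.state.items()
                if now - t <= self.timeout and lfn in lfns}


if __name__ == "__main__":
    a, b = LRC("lrc-a"), LRC("lrc-b")
    a.add("lfn://f1", "gsiftp://a.example.org/data/f1")
    b.add("lfn://f1", "gsiftp://b.example.org/data/f1")
    rli = RLI(timeout=30)
    for lrc in (a, b):
        rli.receive(lrc.name, lrc.soft_state_update(), now=0)
    # Two-step query: ask the RLI which LRCs to contact, then ask those LRCs.
    for name in sorted(rli.lookup("lfn://f1", now=10)):
        print(name, {"lrc-a": a, "lrc-b": b}[name].query("lfn://f1"))
```

If an LRC stops sending updates, its entries disappear from the RLI after the timeout without any explicit delete. The same mechanism lets a restarted RLI rebuild its index from the next round of updates.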
A Flexible RLS Framework
Five elements:
1. Consistent Local State: records mappings between logical names and target names and answers queries
2. Global State with relaxed consistency: a global index supports discovery of replicas at multiple sites
3. Soft state mechanisms for maintaining global state: LRCs send information about their mappings (state) to RLIs using soft state protocols
4. Compression of state updates (optional): reduces communication and storage overheads
5. Membership service: locates participating LRCs and RLIs and deals with changes in membership
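A Bloom filter is one way to compress a state update. The LRC sends a fixed-size bit array instead of its full list of names. The RLI can then answer “this LRC might have it” with no false negatives and a tunable rate of false positives. This is a minimal sketch assuming SHA-256 with double hashing to derive the k bit positions. The sizes m = 10,000 bits and k = 7 and the names `BloomFilter` and `summarize` are hypothetical choices, not the RLS's actual parameters. With n names, the standard estimate of the false-positive rate is roughly (1 − e^(−kn/m))^k, about 0.8% for n = 1000 with these settings.

```python
import hashlib
from typing import Iterable, List


class BloomFilter:
    """m-bit array, k hash positions per item: no false negatives, some false positives."""

    def __init__(self, m_bits: int, k: int) -> None:
        self.m, self.k = m_bits, k
        self.bits = bytearray((m_bits + 7) // 8)

    def _positions(self, item: str) -> List[int]:
        # Double hashing: derive k positions from two 64-bit slices of one SHA-256 digest.
        d = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(d[:8], "big")
        h2 = int.from_bytes(d[8:16], "big") | 1  # odd step
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, item: str) -> None:
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, item: str) -> bool:
        return all(self.bits[p // 8] >> (p % 8) & 1 for p in self._positions(item))


def summarize(lfns: Iterable[str], m_bits: int = 10_000, k: int = 7) -> BloomFilter:
    """Build the summary an LRC could send instead of its full name list."""
    bf = BloomFilter(m_bits, k)
    for lfn in lfns:
        bf.add(lfn)
    return bf


if __name__ == "__main__":
    names = [f"lfn://climate/run{i:04d}" for i in range(1000)]
    bf = summarize(names)
    print(all(bf.might_contain(n) for n in names))  # True: no false negatives
    print(len(bf.bits), "bytes vs", sum(len(n) for n in names), "bytes of names")
```

The trade-off is that a positive answer from the RLI now means “probably”, so the client's follow-up query to the LRC may occasionally come back empty.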
[Figure: two RLS configurations, each with Replica Location Indexes (RLIs) above Local Replica Catalogs (LRCs). Panel titles: “An RLS with No Redundancy, Partitioning of Index by Storage Sites” and “An RLS with Redundancy”]
Replica Location Service In Context
[Figure: layered data management architecture with components: Replica Location Service; Reliable Data Transfer Service; GridFTP; Reliable Replication Service; Replica Consistency Management Services; Metadata Service]
The Replica Location Service is one component in a layered data management architecture. It provides a simple, distributed registry of mappings; consistency management is provided by higher-level services.
Components of RLS Implementation
Front-End Server
– Multi-threaded
– Supports GSI Authentication
– Common implementation for LRC and RLI
Back-end Server
– mySQL Relational Database
– Holds logical name to target name mappings
Client APIs: C and Java
[Figure: clients connect to the LRC/RLI server, which reaches the mySQL server (DB) through ODBC (libiodbc) and myodbc]
Implementation Features
– Two types of soft state updates from LRCs to RLIs
  – Complete list of logical names registered in LRC
  – Bloom filter summaries of LRC
– User-defined attributes
  – May be associated with logical or target names
– Partitioning
  – Divide LRC soft state updates among RLI index nodes using pattern matching of logical names
– Membership service
  – Static configuration only
  – Eventually use OGSA registration techniques
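Partitioning splits one LRC's soft state update so that each RLI indexes only the logical names matching its patterns. This is a minimal sketch in which Python's `fnmatch` glob matching stands in for whatever pattern syntax the RLS actually uses. The function name `partition_update`, the RLI names, and the patterns are all hypothetical. In this sketch a name that matches no pattern is sent to no RLI, and a name that matches several patterns is sent to every matching RLI.

```python
import fnmatch
from typing import Dict, List, Set


def partition_update(lfns: Set[str], rules: Dict[str, List[str]]) -> Dict[str, Set[str]]:
    """Split one LRC's soft state update among RLI index nodes.

    rules maps an RLI node name to the glob patterns it indexes; each
    logical name goes to every RLI with a matching pattern.
    """
    out: Dict[str, Set[str]] = {rli: set() for rli in rules}
    for lfn in lfns:
        for rli, patterns in rules.items():
            if any(fnmatch.fnmatchcase(lfn, p) for p in patterns):
                out[rli].add(lfn)
    return out


if __name__ == "__main__":
    rules = {"rli-climate": ["climate/*"], "rli-hep": ["hep/*"]}
    update = {"climate/co2-1998-jan", "hep/run42", "misc/readme"}
    for rli, names in partition_update(update, rules).items():
        print(rli, sorted(names))
```

Each RLI then stores and answers queries for only its slice of the namespace. That is what makes the “no redundancy, partitioned index” configuration possible.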