ALCF Argonne Leadership Computing Facility
GridFTP Roadmap
Bill Allcock (on behalf of the GridFTP team)
Argonne National Laboratory

Usability & Performance
• Packaging GridFTP as RPM
• GWFTP
• GridFTP GUI
• Automatic Firewall Traversal
• Sync feature for globus-url-copy

Packaging GridFTP as RPM
• Modify packaging of GridFTP and its dependencies
  – Make it suitable for packaging as an RPM
  – Make it compatible with major Linux distribution standards
• Eventually some distribution might pick it up
• GridFTP available as part of a standard Linux distribution
  – Attract a whole new set of users
  – Put it on par with scp and standard ftp in terms of availability

GridFTP Where there's FTP (GWFTP)
• GridFTP has been in existence for some time and has proven to be quite robust and useful
• Only a few GridFTP clients are available
• FTP has innumerable clients
• GWFTP was created to leverage the FTP clients
• A proxy between FTP clients and GridFTP servers
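
GWFTP
[Diagram: an FTP client talks to GWFTP, which holds the GSI credential and talks to the GridFTP server wiggum.mcs.anl.gov (port 2811).]
• FTP client to GWFTP: USER <GWFTP username> ::gsiftp://wiggum.mcs.anl.gov:2811/ followed by PASS
• GWFTP to GridFTP server: GSI authentication
• Get requests pass from the FTP client through GWFTP to the GridFTP server
• Data passes back from the GridFTP server through GWFTP to the FTP client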
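
The login step on this slide can be sketched with Python's standard ftplib. This is only an illustration: the proxy host, proxy port and password argument are hypothetical parameters, because the slide does not say where GWFTP listens or what it expects in PASS. Only the USER string format comes from the slide.

```python
from ftplib import FTP
from typing import List


def gwftp_user(username: str, server: str, port: int = 2811) -> str:
    """Build the USER argument in the form shown on the slide."""
    return f"{username} ::gsiftp://{server}:{port}/"


def list_via_gwftp(proxy_host: str, proxy_port: int, username: str,
                   password: str, server: str) -> List[str]:
    """List a remote directory through a GWFTP proxy with a plain FTP client.

    proxy_host, proxy_port and password are assumptions for this sketch.
    The proxy, not this client, performs GSI authentication to the server.
    """
    ftp = FTP()
    ftp.connect(proxy_host, proxy_port)
    try:
        ftp.login(user=gwftp_user(username, server), passwd=password)
        return ftp.nlst()
    finally:
        ftp.close()
```

Only gwftp_user can be exercised without a running proxy; list_via_gwftp needs a reachable GWFTP instance.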

GUI Client
08/14/2008, Computation Institute

GridFTP GUI
• A Java Web Start application
  – Updates automatically
  – Users always use the latest release
• Transfer files and directories
• Third-party transfer
• Multiple concurrent transfers
• Support authentication through MyProxy
• Manage local and remote files and directories
  – Browse
  – Create and delete

Automatic Firewall Traversal
• Control channel port is statically assigned
• Data channel ports are dynamically assigned
GridFTP Protocol Changes
• New commands to communicate the 4-tuple (src ip, src port, dst ip, dst port) to both ends of the transfer
• Use simultaneous open/TCP splicing, or use a broker to open ports temporarily
• Hooks in GridFTP to contact a broker at the right time

Firewall
[Diagram: a client holds control channel connections (TCP 2811) to a GridFTP source server and a GridFTP destination server; the DATA channel runs between the two servers.]
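
Automatic traversal using a connection broker
[Diagram: a client holds control channel connections (TCP 2811) to a GridFTP source server and a GridFTP destination server. On each side the IP 4-tuple is passed to a connection broker (CB), which opens a temporary hole for the DATA channel between the servers.]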
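
The "temporary hole" idea can be sketched as a broker that admits a specific 4-tuple for a limited time. This is a toy model, not the GridFTP design: the class names, the lifetime parameter and the explicit `now` argument are all assumptions introduced for illustration.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class FourTuple:
    """The (src ip, src port, dst ip, dst port) of one data connection."""
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int


class ConnectionBroker:
    """Hypothetical broker that keeps a firewall hole open for a limited time."""

    def __init__(self, hole_lifetime: float = 30.0) -> None:
        self.hole_lifetime = hole_lifetime
        self._expiry: Dict[FourTuple, float] = {}

    def open_hole(self, conn: FourTuple, now: float) -> None:
        """Admit exactly this 4-tuple until now + hole_lifetime."""
        self._expiry[conn] = now + self.hole_lifetime

    def is_allowed(self, conn: FourTuple, now: float) -> bool:
        """Return True while the hole for this 4-tuple is still open."""
        expiry = self._expiry.get(conn)
        if expiry is None:
            return False
        if now >= expiry:
            del self._expiry[conn]  # the hole has closed; forget it
            return False
        return True
```

Passing the time in explicitly keeps the sketch deterministic; a real broker would read a clock and program the firewall itself.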

Sync feature for globus-url-copy
• Check for the existence of a file at the destination before transferring
• If it exists, determine whether the source version differs from the destination version
• Based on how much the source has changed, optimize the transfer
• Research into developing logic that does not involve any changes to the GridFTP protocol
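
One way to picture the decision logic is a per-block digest comparison. The sketch below is an assumption, not how globus-url-copy implements or plans to implement sync: the block size, the FileState type and the use of SHA-256 are all chosen for illustration, and how the destination's digests would be obtained is left out.

```python
import hashlib
from dataclasses import dataclass
from typing import List, Optional

BLOCK_SIZE = 4  # tiny so the example is easy to follow; real blocks would be far larger


@dataclass
class FileState:
    size: int
    block_digests: List[str]


def describe(data: bytes, block_size: int = BLOCK_SIZE) -> FileState:
    """Summarize a file as its size plus one digest per fixed-size block."""
    digests = [hashlib.sha256(data[i:i + block_size]).hexdigest()
               for i in range(0, len(data), block_size)]
    return FileState(size=len(data), block_digests=digests)


def blocks_to_send(src: FileState, dst: Optional[FileState]) -> List[int]:
    """Return indices of source blocks that are missing or differ at the destination.

    dst is None when the file does not exist at the destination.
    A destination longer than the source is not handled here.
    """
    if dst is None:
        return list(range(len(src.block_digests)))
    return [i for i, digest in enumerate(src.block_digests)
            if i >= len(dst.block_digests) or dst.block_digests[i] != digest]
```

An empty result means the transfer can be skipped; a short list means only those blocks need to move.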

Reliability & Security
• Improved restart mechanism
• Improved memory management algorithm
• Load balancing
• Data channel security for SSH based GridFTP
• GUMS authorization callout

Improved Restart Mechanism
• globus-url-copy can recover from server and network failures
• It cannot recover from its own failure
• A number of users, including ESG, APS and SNS, use this client to transfer large data sets with complex directory structures
• Develop methods to enable globus-url-copy to recover from its own failure
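
A common way to let a client survive its own crash is to journal completed work to disk. The sketch below illustrates that general technique only; the slide does not say which method is planned, and the TransferJournal class and its file format are hypothetical.

```python
import os
from typing import Iterable, List, Set


class TransferJournal:
    """Append-only record of completed transfers, one URL per line."""

    def __init__(self, path: str) -> None:
        self.path = path

    def completed(self) -> Set[str]:
        """Read back finished URLs; a line without a newline is a torn write and is ignored."""
        if not os.path.exists(self.path):
            return set()
        with open(self.path, "r", encoding="utf-8") as fh:
            return {line.rstrip("\n") for line in fh if line.endswith("\n")}

    def mark_done(self, url: str) -> None:
        """Record a finished transfer and force it to disk before returning."""
        with open(self.path, "a", encoding="utf-8") as fh:
            fh.write(url + "\n")
            fh.flush()
            os.fsync(fh.fileno())

    def remaining(self, urls: Iterable[str]) -> List[str]:
        """Return the URLs still to transfer, in their original order."""
        done = self.completed()
        return [u for u in urls if u not in done]
```

After a crash, a restarted client would build the same URL list, call remaining(), and continue from there.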

Gfork architecture
[Diagram: on the server host, the GFork server with its GridFTP plugin forks a GridFTP server instance for each client control channel connection. The instances keep inherited links to the plugin, which serve as a state sharing link.]

Memory Management
• Optimistic memory provisioning by the operating system
  – Under heavy load, the GridFTP server can consume all of the system's memory resources
• Gfork: an xinetd-like super server daemon
  – Allows state to be maintained across connections
• GridFTP plugin for Gfork has a simple memory limiting option
  – 90% of the memory to the first 10% of the allowed connections
  – Remaining connections receive half of what is available
• Develop an improved memory management algorithm
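
The existing limiting rule can be read in more than one way. The sketch below takes one reading as an assumption: the first 10% of allowed connections split 90% of the limit evenly, and each later connection is granted half of whatever is still unallocated. The function name and integer-byte arithmetic are illustrative, not taken from the Gfork plugin.

```python
from typing import List


def memory_grants(total_bytes: int, max_connections: int, n_connections: int) -> List[int]:
    """Return the memory granted to each connection in arrival order.

    Assumed reading of the slide: early connections share 90% of the limit,
    later ones each receive half of the remaining unallocated memory.
    """
    early_slots = max(1, max_connections // 10)
    early_grant = (total_bytes * 9 // 10) // early_slots
    available = total_bytes
    grants: List[int] = []
    for i in range(min(n_connections, max_connections)):
        grant = early_grant if i < early_slots else available // 2
        grants.append(grant)
        available -= grant
    return grants
```

With a limit of 1000 bytes and 20 allowed connections, the first two connections get 450 each, and the next two get 50 and 25. Later arrivals are squeezed quickly under this reading, which may be part of why an improved algorithm is on the roadmap.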

Load balancing capabilities
• The separation of processes buys the ability to proxy
  – Allows for load balancing
  – Frontend can choose from a pool of DPIs to service a client request
[Diagram: a client connects to the frontend, which communicates over IPC with a pool of DPIs.]
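
The slide does not say how the frontend chooses a DPI. As a minimal sketch, assuming a fewest-active-sessions policy (an assumption, as are the class and DPI names), the choice could look like this:

```python
from typing import Dict, List


class Frontend:
    """Hypothetical frontend that hands each client request to the least busy DPI."""

    def __init__(self, dpis: List[str]) -> None:
        if not dpis:
            raise ValueError("at least one DPI is required")
        self._active: Dict[str, int] = {name: 0 for name in dpis}

    def assign(self) -> str:
        """Pick the DPI with the fewest active sessions; ties go to the earliest listed."""
        name = min(self._active, key=lambda n: self._active[n])
        self._active[name] += 1
        return name

    def release(self, name: str) -> None:
        """Mark one session on the given DPI as finished."""
        if self._active[name] > 0:
            self._active[name] -= 1
```

Other policies, such as round robin or choosing by reported load, would slot into assign() without changing the rest.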

SSH based GridFTP (GridFTP-Lite)
[Diagram: the client connects to sshd on port 22 rather than to a GridFTP server on port 2811. sshd runs as ROOT and the GridFTP server runs as USER; the ssh stdin/stdout is the control channel.]

Data Channel Security for SSH based GridFTP
• SSH based GridFTP does not have data channel security
• Investigate and prototype a way to let a client send a shared secret to both source and destination GridFTP servers
  – Used to secure the data channel(s) between the two servers
  – Shared secret can be used to authenticate, integrity-protect and encrypt the data channel
• This feature will increase the adoption of SSH based GridFTP
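
To show how a shared secret can authenticate and integrity-protect data, here is a small sketch using HMAC-SHA256 from the Python standard library. It is an illustration of the general technique only: the framing (tag followed by block) and the function names are assumptions, encryption is omitted, and none of this reflects the prototype the slide proposes.

```python
import hashlib
import hmac
import secrets

TAG_LEN = 32  # length of a SHA-256 digest in bytes


def new_shared_secret() -> bytes:
    """Secret the client would generate and send to both servers."""
    return secrets.token_bytes(32)


def protect(secret: bytes, block: bytes) -> bytes:
    """Sender side: prefix the data block with an HMAC tag."""
    tag = hmac.new(secret, block, hashlib.sha256).digest()
    return tag + block


def verify(secret: bytes, message: bytes) -> bytes:
    """Receiver side: return the block if the tag matches, else raise ValueError."""
    tag, block = message[:TAG_LEN], message[TAG_LEN:]
    expected = hmac.new(secret, block, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("data block failed integrity check")
    return block
```

A server that does not hold the secret cannot produce a valid tag, so a matching tag both authenticates the peer and shows the block was not altered.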

GUMS Authorization Callout
• GUMS: Grid User Management System
  – Grid identity mapping service
  – Maps grid identity to local site identity
  – Used in OSG
[Diagram: client, GridFTP server with GUMS callout, GUMS server and disk, with these steps:]
1. Authentication
2. Data transfer operations
3. Obtain local identity from GUMS server (/DC=org/DC=doegrids/OU=People/CN=John Bresnahanz maps to bresnaha)
4. Access data as local identity

GUMS Authorization Callout
• Role based authorization using VOMS extended proxy
[Diagram: client, GridFTP server with GUMS callout, GUMS server and disk, with these steps:]
1. Authentication (the proxy carries /VO=ATLAS/Group=USATLAS/Role=developer)
2. Data transfer operations
3. Obtain local identity from GUMS server (/DC=org/DC=doegrids/OU=People/CN=John Bresnahanz with that role maps to usatlasdev)
4. Access data as local identity
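
The two mappings shown on the GUMS slides can be pictured as a lookup keyed by the grid identity and an optional role attribute. The in-memory table below is purely illustrative: a real GUMS deployment is a separate service queried by the callout, and the function and variable names here are hypothetical. The DN, role string and local accounts are copied from the slides.

```python
from typing import Dict, Optional, Tuple

# (distinguished name, role attribute or None) -> local account
MappingKey = Tuple[str, Optional[str]]

DN = "/DC=org/DC=doegrids/OU=People/CN=John Bresnahanz"
ROLE = "/VO=ATLAS/Group=USATLAS/Role=developer"

MAPPINGS: Dict[MappingKey, str] = {
    (DN, None): "bresnaha",      # identity-only mapping
    (DN, ROLE): "usatlasdev",    # role based mapping
}


def local_identity(dn: str, role: Optional[str] = None) -> Optional[str]:
    """Return the local account for a grid identity, or None if there is no mapping."""
    return MAPPINGS.get((dn, role))
```

The same user therefore lands in different local accounts depending on the role presented at authentication.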

Quality of Service
• Information provider
• Provision end-point GridFTP resources
• Integrate network provisioning
• Integrate storage provisioning
• Co-schedule data transfer resources

Information Provider
• GridFTP information provider service
  – Max connections
  – Open connections
  – Load
• Higher level services can utilize this information for scheduling data transfers
  – Help with selecting the appropriate replica of data
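
As a sketch of how a higher level service might use these three values to choose a replica, the code below skips servers with no free connection slots and prefers the lowest load. The ServerInfo type, the example URLs and the selection rule are assumptions; the slide names the metrics but not a policy.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ServerInfo:
    """Values a GridFTP information provider is described as publishing."""
    url: str
    max_connections: int
    open_connections: int
    load: float


def pick_replica(servers: List[ServerInfo]) -> Optional[ServerInfo]:
    """Choose a server holding the replica, or None if every server is full."""
    candidates = [s for s in servers if s.open_connections < s.max_connections]
    if not candidates:
        return None
    # Prefer low load; break ties by the fraction of connection slots in use.
    return min(candidates,
               key=lambda s: (s.load, s.open_connections / s.max_connections))
```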

Provision end-point resources
[Diagram. Components: Data Movement Service (RFT replacement), GFTP Resource Broker, and a Data Point containing the GridFTP Server, GridFTP Info Provider and Resource Limiter (CPU, Memory, BW). Labeled links: Ad, Control Channel, Provision GridFTP.]

Integrate Network Provisioning
[Diagram. Components: Data Movement Service, GFTP Resource Broker, Network Reservation Service, and a Data Point containing the GridFTP Server, GridFTP Info Provider and Resource Limiter (CPU, Memory, BW). Labeled links: Ad, Control Channel, Provision GridFTP, Reserve Bandwidth, Bandwidth Token.]

Integrate Storage Provisioning
[Diagram. Components: Data Movement Service, GFTP Resource Broker, Network Reservation Service, Lotman, File System, and a Data Point containing the GridFTP Server, GridFTP Info Provider and Resource Limiter (CPU, Memory, BW). Labeled links: Ad, Control Channel, Provision GridFTP, Provision Bandwidth, Bandwidth Token, Provision Storage.]

Co-schedule Data Transfer Resources
[Diagram: the Data Movement Service asks the Network Reservation Service to provision bandwidth, and asks both the Source Data Point and the Destination Data Point to provision GridFTP and storage resources.]