
Globus Striped GridFTP Framework and Server

Raj Kettimuthu, ANL and U. Chicago

Outline

- Introduction
- Features
- Motivation
- Architecture
- Globus XIO
- Experimental Results

What is GridFTP?

- In Grid environments, access to distributed data is very important
- Distributed scientific and engineering applications require:
  - Transfers of large amounts of data between storage systems, and
  - Access to large amounts of data by many geographically distributed applications and users for analysis, visualization, etc.
- GridFTP: a general-purpose mechanism for secure, reliable, and high-performance data movement

Features

- Standard FTP features: get/put, etc., and third-party control of data transfer
  - Allows a user at one site to initiate, monitor, and control a data transfer operation between two other sites
- Authentication, data integrity, data confidentiality
  - Supports Generic Security Services (GSS) API authentication of the connections
  - Supports user-controlled levels of data integrity and/or confidentiality

Features

- Parallel data transfer
  - Multiple transport streams between a single source and destination
- Striped data transfer
  - One or more transport streams between m network endpoints on the sending side and n network endpoints on the receiving side

Features

- Partial file transfer
  - Some applications can benefit from transferring portions of a file
  - For example, analyses that require access to subsets of massive object-oriented database files
- Manual/automatic control of TCP buffer sizes
- Support for reliable and restartable data transfers

Motivation

- Continued commoditization of end-system devices means the data sources and sinks are often clusters
  - A common configuration might have individual nodes connected by a 1 Gbit/s Ethernet connection to a switch that connects to the external network at 10 Gbit/s or faster
- Striped data movement: data distributed across a set of computers at one end is transferred to a remote set of computers

Motivation

- Data sources and sinks come in many shapes and sizes
  - Clusters with local disks, clusters with a parallel file system, geographically distributed data sources, archival storage systems
- Enable clients to access such sources and sinks via a uniform interface
- Make it easy to adapt our system to support different sources and sinks

Motivation

- The standard protocol for network data transfer remains TCP
- TCP's congestion control algorithm can lead to poor performance
  - Particularly in default configurations and on paths with high bandwidth and high round-trip time
- Solutions include careful tuning of TCP parameters, multiple TCP connections, and substitution of alternate UDP-based reliable protocols (see the sketch below)
- We want to support such alternatives
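As an illustration of what "careful tuning of TCP parameters" involves, here is a minimal generic C sketch (plain POSIX sockets, not GridFTP source) that sizes a socket's buffers toward the bandwidth-delay product of a path:

    #include <sys/socket.h>

    /* Size buffers to the bandwidth-delay product: e.g. a 1 Gbit/s path
     * with 60 ms RTT needs roughly (1e9 / 8) * 0.060 ~= 7.5 Mbytes in
     * flight to keep the pipe full. */
    int set_tcp_buffers(int sock, int bytes)
    {
        if (setsockopt(sock, SOL_SOCKET, SO_SNDBUF,
                       &bytes, sizeof(bytes)) < 0)
            return -1;
        return setsockopt(sock, SOL_SOCKET, SO_RCVBUF,
                          &bytes, sizeof(bytes));
    }

With default (small) buffers, a single stream on a high bandwidth-delay path stalls waiting for acknowledgments, which is one reason multiple streams help.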

Architecture

- GridFTP (and normal FTP) uses (at least) two separate socket connections:
  - A control channel for carrying the commands and responses
  - A data channel for actually moving the data
- GridFTP (and normal FTP) has three distinct components:
  - Client and server protocol interpreters (PIs), which handle the control channel protocol
  - The Data Transfer Process (DTP), which handles accessing the actual data and moving it via the data channel

Architecture

- These components can be combined in various ways to create servers with different capabilities
  - For example, combining the server PI and DTP components in one process creates a conventional FTP server
  - A striped server might use one server PI on the head node of a cluster and a DTP on all other nodes

Globus GridFTP Architecture

[Figure: the client PI holds control channels to two server PIs; each server PI drives its DTP over an internal IPC API, and the DTPs exchange the file over the data channel. The description of the transfer is completely server-internal communication; its protocol is unspecified and left up to the implementation. Info on the transfer (restart markers, performance markers, etc.) flows back to the server PI, which optionally processes it, then sends it to the client PI.]

Architecture

- The server PI handles the control channel exchange
- In order for a client to contact a GridFTP server:
  - Either the server PI must be running as a daemon and listening on a well-known port (2811 for GridFTP), or
  - Some other service (such as inetd) must be listening on the port and be configured to invoke the server PI
- The client PI then carries out its protocol exchange with the server PI (the daemon case is sketched below)
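A minimal sketch of the daemon case in plain POSIX sockets; this is illustrative only, since the real server PI also performs GSS authentication and speaks the full FTP control protocol:

    #include <netinet/in.h>
    #include <sys/socket.h>

    /* Sketch of the "daemon" deployment: the server PI listens on the
     * well-known GridFTP control port and accepts control-channel
     * connections from client PIs. */
    int pi_listen(void)
    {
        int fd = socket(AF_INET, SOCK_STREAM, 0);
        struct sockaddr_in addr = { 0 };

        addr.sin_family      = AF_INET;
        addr.sin_addr.s_addr = INADDR_ANY;
        addr.sin_port        = htons(2811); /* registered GridFTP port */

        if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
            listen(fd, 16) < 0)
            return -1;
        return accept(fd, NULL, NULL); /* control connection from a client PI */
    }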

Architecture

- When the server PI receives a command that requires DTP activity, it passes the description of the transfer to the DTP
- After that, the DTP can carry out the transfer on its own
- The server PI then simply acts as a relay for transfer status information: performance markers, restart markers, etc.

Architecture

- PI-to-DTP communications are internal to the server
  - The protocol used can evolve with no impact on the client
- The data channel communication structure is governed by the data layout
  - If the number of nodes at both ends is equal, each node communicates with just one other node
  - Otherwise, each sender makes a connection to each receiver and sends data based on data offsets (see the sketch below)
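A minimal sketch of one plausible offset-based routing policy, matching the round-robin layout in the striped-transfer figure that follows; the function name and block-size parameter are illustrative, not the server's actual internals:

    /* Route a block to a receiver by its file offset: with fixed-size
     * blocks laid out round-robin, block i (offset i * block_size)
     * goes to receiver (i mod n).  Every sender can apply this rule
     * independently, so no coordination is needed. */
    int receiver_for_offset(long long offset, long long block_size,
                            int n_receivers)
    {
        return (int)((offset / block_size) % n_receivers);
    }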

GridFTP Striped Transfer

MODE E
SPAS (Listen) - returns a list of host:port pairs
STOR <FileName>

MODE E
SPOR (Connect) - connect to the host:port pairs
RETR <FileName>

[Figure: a file striped across sending hosts X, Y, and Z is transferred to receiving hosts A, B, C, and D; blocks are routed round-robin by offset, so Host A receives blocks 1, 5, 9, 13; Host B blocks 2, 6, 10, 14; Host C blocks 3, 7, 11, 15; and Host D blocks 4, 8, 12, 16.]

Data transfer pipeline

[Figure: data source or sink -> Data Access Module -> Data Processing Module -> Data Channel Protocol Module -> data channel.]

The Data Transfer Process is, architecturally, three distinct pieces: the data access module, the data processing module, and the data channel protocol module.

Data access module

- A number of storage systems are in use by the scientific and engineering community:
  - Distributed Parallel Storage System (DPSS)
  - High Performance Storage System (HPSS)
  - Distributed File System (DFS)
  - Storage Resource Broker (SRB)
  - HDF5
- These use incompatible protocols for accessing data and require the use of their own clients

Data access module

- Provides a modular, pluggable interface to data storage systems
- Conceptually, the data access module is very simple:
  - It consists of several function signatures and a set of semantics
  - When a new module is created, the programmer implements the functions to provide the semantics associated with them (see the sketch below)
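Such an interface can be pictured as a table of function pointers with agreed-upon semantics. The C sketch below is hypothetical; the names are illustrative and are not the actual Globus interface:

    /* Hypothetical data access module: a table of function signatures
     * with fixed semantics.  A new storage system (e.g. HPSS or SRB)
     * is supported by implementing these functions. */
    typedef struct data_access_module_s {
        int (*open)(void **handle, const char *pathname, int for_writing);
        int (*read)(void *handle, char *buf,
                    long long offset, long long length);
        int (*write)(void *handle, const char *buf,
                     long long offset, long long length);
        int (*close)(void *handle);
    } data_access_module_t;

The offset-based read/write signatures matter: they let the DTP serve out-of-order block requests regardless of what storage system sits underneath.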

Data processing module

- The data processing module provides the ability to manipulate the data prior to transmission
  - Compression, on-the-fly concatenation of multiple files, etc.
- Currently handled via the data access module
- In the future we plan to make this a separate module

Data channel protocol module

- This module handles the operations required to fetch data from, or send data to, the data channel
- A single server may support multiple data channel protocols
  - The MODE command is used to select the protocol to be used for a particular transfer
- We use the Globus eXtensible Input/Output (XIO) system as the data channel protocol module interface
- Two bindings are currently supported: stream-mode TCP and extended block mode TCP

Extended block mode

- Striped and parallel data transfer require support for out-of-order data delivery
- Extended block mode supports out-of-sequence data delivery
- Extended block mode header:

    Field        Size
    Descriptor   8 bits
    Byte Count   64 bits
    Offset       64 bits

- The descriptor is used to indicate whether this block is an EOD (end-of-data) marker, a restart marker, etc.
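In C, the 17-byte header and its serialization in network (big-endian) byte order might look like the following sketch; the type and helper names are illustrative:

    #include <stdint.h>

    /* Extended block mode header: 8-bit descriptor, 64-bit byte count,
     * 64-bit offset.  The offset is what makes out-of-order delivery
     * possible: each block says where it belongs in the file. */
    typedef struct {
        uint8_t  descriptor;  /* flags: EOD marker, restart marker, ... */
        uint64_t byte_count;  /* length of the data block that follows  */
        uint64_t offset;      /* position of the block within the file  */
    } eb_header_t;

    static void put_u64_be(unsigned char *p, uint64_t v)
    {
        for (int i = 7; i >= 0; i--) { p[i] = v & 0xff; v >>= 8; }
    }

    void eb_header_pack(const eb_header_t *h, unsigned char buf[17])
    {
        buf[0] = h->descriptor;
        put_u64_be(buf + 1, h->byte_count);
        put_u64_be(buf + 9, h->offset);
    }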

Globus XIO

- Simple open/close/read/write API
- Makes a single API do many types of I/O
- Many Grid I/O needs can be treated as a stream of bytes
  - Open/close/read/write functionality satisfies most requirements
- Specific drivers for specific protocols/devices
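A condensed sketch of XIO's usage pattern, based on its documented driver-stack interface (error checking omitted; the contact string and payload are arbitrary examples):

    #include "globus_xio.h"

    int main(void)
    {
        globus_xio_driver_t driver;
        globus_xio_stack_t  stack;
        globus_xio_handle_t handle;
        globus_size_t       nbytes;
        globus_byte_t       buf[] = "hello";

        globus_module_activate(GLOBUS_XIO_MODULE);
        globus_xio_driver_load("tcp", &driver);      /* pick a transport */
        globus_xio_stack_init(&stack, NULL);
        globus_xio_stack_push_driver(stack, driver); /* build the stack  */
        globus_xio_handle_create(&handle, stack);

        globus_xio_open(handle, "localhost:5000", NULL); /* example contact */
        globus_xio_write(handle, buf, sizeof(buf) - 1,
                         sizeof(buf) - 1, &nbytes, NULL);
        globus_xio_close(handle, NULL);

        globus_module_deactivate(GLOBUS_XIO_MODULE);
        return 0;
    }

Swapping "tcp" for an alternate driver (e.g. a UDP-based reliable protocol) changes the transport without changing the calling code, which is what lets the data channel protocol module support alternatives to TCP.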

Typical approach

[Figure]

Globus XIO approach

[Figure]

Experimental results

- Three settings:
  - LAN: 0.2 ms RTT and 622 Mbit/s
  - MAN: 2.2 ms RTT and 1 Gbit/s
  - WAN: 60 ms RTT and 30 Gbit/s
- MAN: the Distributed Optical Testbed in the Chicago area
- WAN: the TeraGrid link between NCSA in Illinois and SDSC in California; each individual node has a 1 Gbit/s bottleneck link

Experimental results - LAN

[Chart: bandwidth (Mbit/s, 0-700) vs. number of streams (0-40) for iperf, globus mem, globus disk, and bonnie.]

Experimental results - MAN

[Chart: bandwidth (Mbit/s, 0-1000) vs. number of streams (0-40) for iperf, globus mem, globus disk, and bonnie.]

Experimental results - WAN

[Chart: bandwidth (Mbit/s, 0-1400) vs. number of streams (0-40) for iperf, globus mem, globus disk, and bonnie.]

Memory-to-Memory Striping Performance

[Chart: bandwidth (Mbit/s, 0-30000) vs. degree of striping (0-70) for 1, 2, 4, 8, 16, and 32 streams per stripe.]

Disk-to-Disk Striping Performance

[Chart: bandwidth (Mbit/s, 0-20000) vs. degree of striping (0-70) for 1, 2, 4, 8, 16, and 32 streams per stripe.]

Scalability tests

- Evaluate performance as a function of the number of clients
- Used the DiPerf test framework to deploy the clients
- Ran the server on a 2-processor 1125 MHz x86 machine running Linux 2.6.8.1
  - 1.5 GB memory and 2 GB swap space
  - 1 Gbit/s Ethernet network connection and 1500 B MTU
- Clients were created on hosts distributed over PlanetLab and at the University of Chicago (UofC)

Scalability Results

[Chart: time (sec, 0-3500) on the x-axis; left axis (0-2000): load (concurrent clients), response time (s), and memory allocated (Mbytes); right axis (0-100): throughput (Mbyte/s) and CPU %.]

Scalability results

- 1800 clients were mapped in round-robin fashion onto 100 PlanetLab hosts and 30 UofC hosts
- A new client was created once a second; each ran for 2400 seconds
  - During this time, it repeatedly requested the transfer of a 10 Mbyte file from the server's disk to the client's /dev/null
- A total of 150.7 Gbytes was transferred in 15,428 transfers

Scalability results

- The server sustained 1800 concurrent requests with 70% CPU and 0.94 Mbyte of memory per request
- CPU usage, throughput, and response time remained reasonable even when allocated memory exceeded physical memory, meaning paging was occurring