Page 1: Ali Kaplan alikapla@cs.indiana.edu Advisor: Prof. Geoffrey C. Fox 2/02/2009

Page 2

Outline
• Introduction
• Background
• Motivation and Research Issues
• GridTorrent Framework Architecture
• Measurements and Analysis
• Contributions and Future Works

Page 3

Data, Data, More Data
• Computational science is becoming data intensive.
• Scientists face mountains of data that stem from three sources [1]:
1. New scientific instruments, whose data generation grows monotonically
2. Simulations, which generate a flood of data
3. The Internet and computational Grids, which allow the replication, creation, and recreation of more data [2]

Page 4

Data, Data, More Data (cont.)

Scientific discovery is increasingly driven by data collection [3]:
• Computationally intensive analyses
• Massive data collections
• Data distributed across networks of varying capability
• Internationally distributed collaborations

Data-Intensive Science: 2000-2020 [4]
Dominant factor: data growth (1 petabyte = 1000 TB)
• 2000: ~0.5 petabyte
• 2007: ~10 petabytes
• 2013: ~100 petabytes
• 2020: ~1000 petabytes?

Page 5

Scientific Application Examples

Scientific applications that generate petabytes of data are very diverse:
– Fusion power
– Climate modeling
– Astronomy
– High-energy physics
– Bioinformatics
– Earthquake engineering

Page 6

Scientific Application Examples (cont.)

Some examples:
• Climate modeling
– The Community Climate System Model and other simulation applications generate 1.5 petabytes/year.
• Bioinformatics
– The Pacific Northwest National Laboratory is building new confocal microscopes that will generate 5 petabytes/year.
• High-energy physics
– The Large Hadron Collider (LHC) project at CERN will create 100 petabytes/year.

Page 7

Page 8

Background
Systems for transferring bulk data:
• Network-level solutions
• System-level solutions
• Application-level solutions

Page 9

Background (cont.)
[Figure; surviving axis labels: Cost, Prevalence]

Page 10

Network-Level Solutions
• Network Attached Storage (NAS)
– File-level storage system attached to a traditional network
– Uses higher-level protocols
– Does not allow direct access to individual storage devices
– Simpler and more economical than SAN
• Storage Area Network (SAN)
– Storage devices attached directly to the LAN
– Utilizes low-level network protocols (Fibre Channel)
– Handles large data transfers
– Provides better performance

Page 11

System-Level Solutions
- Require modifications to
– the operating system of the machine,
– the network apparatus,
– or both
+ Yield very good performance
- Expensive
- Not applicable to every system
• Group Transport Protocol for Lambda-Grids (GTP)

Page 12

Application-Level Solutions
+ Use parallel streaming to improve performance
+ Tweak the TCP buffer size to improve performance
+ Require no modifications to underlying systems
+ Inexpensive
+ Prevalent use
+- May require an auxiliary component for data management
- May not be as fast as network- or system-level solutions
• Types of application-level solutions
– TCP-based solutions
– UDP-based solutions
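The first two advantages listed for application-level solutions, parallel streaming and TCP buffer tuning, can be illustrated with a minimal Python sketch. The helper names `split_ranges` and `open_tuned_socket` are hypothetical and are not taken from any tool named in these slides. `SO_SNDBUF` and `SO_RCVBUF` are standard socket options, but the operating system treats the requested size as a hint and may round or cap it.

```python
import socket


def split_ranges(file_size, num_streams):
    """Divide a file of file_size bytes into contiguous (offset, length)
    ranges, one per parallel stream. Earlier ranges absorb the remainder."""
    if num_streams <= 0:
        raise ValueError("num_streams must be positive")
    base, extra = divmod(file_size, num_streams)
    ranges = []
    offset = 0
    for i in range(num_streams):
        length = base + (1 if i < extra else 0)
        ranges.append((offset, length))
        offset += length
    return ranges


def open_tuned_socket(buffer_bytes):
    """Create an unconnected TCP socket and request larger send and receive
    buffers. The kernel may adjust the value it actually applies."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, buffer_bytes)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, buffer_bytes)
    return sock
```

In a parallel transfer, each range would be sent over its own tuned connection and the receiver would write each range at its offset.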

Page 13

TCP-Based Solutions
+ Harness the good features of TCP
+ Reliability
+- Built-in congestion control mechanism (TCP window)
+ Require no changes to existing systems
+ Easy to implement
+ Prevalent use
- Not suitable for real-time applications
• GridFTP, GridHTTP, bbFTP, and bbcp
– Mainly use FTP or HTTP as the base protocol

Page 14

UDP-Based Solutions
+ Small segment header overhead (8 vs. 20 bytes)
- Unreliable
+- Require additional mechanisms for reliability and congestion control (at the application level)
+ May overcome existing problems of TCP
+ May make UDP faster
- Integration with existing systems requires some changes and effort
• SABUL, UDT, FOBS, RBUDP, Tsunami, and UFTP
– Mainly utilize rate-based control mechanisms

Page 15

Auxiliary Components
• Used for file indexing and discovery
• GridFTP utilizes the Replica Location Service (RLS)
– Local Replica Catalogs (LRCs)
– Replica Location Indices (RLIs)
– LRCs send information about their state to RLIs using soft-state protocols
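The soft-state exchange between LRCs and RLIs can be pictured with a toy index that forgets any report it has not seen refreshed recently. This is only a sketch of the general soft-state idea: the class name, the `ttl_seconds` parameter, and the explicit `now` argument are assumptions made for illustration, not the RLS interface.

```python
class ReplicaLocationIndex:
    """Toy RLI: remembers which LRC reported which logical file names and
    drops any report not refreshed within ttl_seconds."""

    def __init__(self, ttl_seconds):
        self.ttl_seconds = ttl_seconds
        self._reports = {}  # lrc_name -> (timestamp, set of logical names)

    def receive_update(self, lrc_name, logical_names, now):
        # A soft-state update simply overwrites the previous report.
        self._reports[lrc_name] = (now, set(logical_names))

    def lookup(self, logical_name, now):
        """Return sorted LRC names whose unexpired report lists logical_name."""
        self._expire(now)
        return sorted(
            lrc for lrc, (_, names) in self._reports.items()
            if logical_name in names
        )

    def _expire(self, now):
        stale = [lrc for lrc, (ts, _) in self._reports.items()
                 if now - ts > self.ttl_seconds]
        for lrc in stale:
            del self._reports[lrc]
```

Because stale entries vanish on their own, an LRC that crashes or leaves needs no explicit deregistration step.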

Page 16

Motivation and Research Issues
Problems of Existing Solutions
• Built on the client/server model
– Why not P2P?
• Mainly utilize FTP/HTTP-type protocols
– Suffer from the drawbacks of FTP/HTTP
– Modification is very difficult
• Require some vital services to be built as separate modules
• Use existing system resources inefficiently

Page 17

Page 18

Page 19

Comparison of BitTorrent and GridTorrent Architectures

BitTorrent | GridTorrent | Reason
P2P data-sharing protocol | P2P data-sharing protocol | No change
Simple HTTP client | SOA-based Tracker Client | To enable the exchange of advanced operations with the WS-Tracker Service
- | Task Manager | To enable execution of advanced operations in the client, such as remote sharing and ACL
Web-server-based tracker | Advanced SOA-based Tracker | To allow the system to build and handle complex actions required by the scientific community
- | Security Manager | To provide authentication and authorization mechanisms
- | Collaboration and Content Manager | To empower users to control access rights to their content, to start remote sharing and downloading processes, and to permit interactions between them
- | Support for multiple streams | To further improve data transmission performance

Page 20

Page 21

Page 22

Collaboration and Content Manager
• An interface between users and the system
• Capabilities:
– Share content
– Browse content
– Download content
– Add/remove groups
– Add/remove users for particular content (access right controls)
– Add/remove users for a particular group (access right controls)
• Everything is metadata

Page 23

WS-Tracker Service component of the GridTorrent Framework Architecture

Page 24

WS-Tracker Service
• The communication hub of the system
• Loosely coupled, flexible, and extensible
• Delivers tasks to GridTorrent clients
• Updates task status in the database
• Stores and serves .torrent files

[Figure: interaction among the GridTorrent Client, the WS-Tracker Service, and the Database. Arrow labels: Ask for tasks, Get Available Tasks, Deliver Task, Deliver .torrent file, Update Records]

Page 25

Task
• A task is simply metadata (wrapped actions)
– Request
– Response
– Periodic
– Non-periodic
• Instructs a GridTorrent client what to do with whom
• Created by users
• Exchanged between the WS-Tracker Service and the GridTorrent client

Page 26

Task Format

Page 27

Tasks Overview

No | Task Name | Creator | Source | Destination | Category
1 | Task List Request | GTFC | GTFC | WS-Tracker | request, periodic
2 | Share Content Request | User | WS-Tracker | GTFC | request, non-periodic
3 | Share Content Response | GTFC | GTFC | WS-Tracker | response, non-periodic
4 | Download Content Request | User | WS-Tracker | GTFC | request, non-periodic
5 | Download Content Response | GTFC | GTFC | WS-Tracker | response, periodic
6 | ACL Request | GTFC | GTFC | WS-Tracker | request, periodic
7 | ACL Response | User | WS-Tracker | GTFC | response
8 | Update Status | GTFC | GTFC | WS-Tracker | periodic
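To make the task exchange concrete, the following Python sketch models a task as a small record and the tracker as an in-memory queue that a client drains with its periodic Task List Request. The field names mirror the table columns, but the classes `Task` and `ToyTracker`, their methods, and the client identifiers are hypothetical. The slides do not show the actual task format or the WS-Tracker Service interface.

```python
from collections import defaultdict, deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    name: str         # e.g. "Share Content Request"
    creator: str      # "User" or "GTFC", as in the table
    source: str
    destination: str
    kind: str         # "request" or "response"
    periodic: bool


class ToyTracker:
    """In-memory stand-in for the WS-Tracker Service task queue."""

    def __init__(self):
        self._pending = defaultdict(deque)  # client id -> queued tasks

    def submit(self, client_id, task):
        # A user-created task is queued for one GridTorrent client.
        self._pending[client_id].append(task)

    def task_list_request(self, client_id):
        # The client's periodic poll: hand over and clear its queued tasks.
        queue = self._pending[client_id]
        delivered = list(queue)
        queue.clear()
        return delivered
```

Polling keeps the client as the initiator of every exchange, which matches the request/response categories in the table.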

Page 28

GridTorrent Client component of the GridTorrent Framework Architecture

Page 29

GridTorrent Client
• Modular architecture
– Provides extensibility and flexibility
• Built on a P2P file-sharing protocol
– Enables efficient use of idle resources
• Provides adequate security
– Authentication
– Authorization
• Utilizes regular and parallel stream connections (other transfer mechanisms could be used)

[Figure: GridTorrent client architecture. Layers: Data Sharing Algorithm Layer, Core Modules Layer, Security Layer, Grid Interface. Data Transfer Modules: Java TCP Socket, Java PTCP Socket, ... Management Modules: Task Manager, WS-Tracker Client. Other boxes: Torrent Data Sharing Logic, Security Manager, Java CoG Kit, Java WS Security]

Page 30

[Flowchart: connection setup between PeerA and PeerB, involving PeerA's Security Module, PeerB's Security Module, and PeerA's Data Sharing Module]
1. PeerA starts the authentication process.
2. PeerB handles PeerA's request.
3. Authorization successful? If no, reject the connection.
4. PeerA in ACL? If no, reject the connection.
5. PeerB gives PeerA the data port number and a passkey, and saves the passkey for further use.
6. PeerA connects to the received data port and sends the passkey to start the download process.
7. Passkey verification: if it fails, reject the connection; if it succeeds, PeerB starts the data transfer process.

Page 31

Security in GridTorrent Client
• Only the security port number on which the Security Manager listens is publicly known to other peers
• Each peer has to be authenticated and authorized (A&A) before starting the download process
• After a successful A&A, the peer receives the data port number and a passkey
• Peers use the passkey for a second verification just before the download process
• If everything is valid and successful, the actual data download starts
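The two-step check described here, A&A on the security port followed by passkey verification on the data port, can be sketched as follows. Everything in the block is an assumption made for illustration: the class `ToySecurityManager`, the use of a set of identities in place of real credential verification, and the passkey format. The slides do not specify how GridTorrent generates or compares passkeys.

```python
import hmac
import secrets


class ToySecurityManager:
    """Stand-in for PeerB's Security Manager. Real authentication would
    verify credentials; here a set of already-verified identities plays
    that role."""

    def __init__(self, trusted_identities, acl, data_port):
        self.trusted_identities = set(trusted_identities)
        self.acl = set(acl)
        self.data_port = data_port
        self._passkeys = {}  # peer identity -> issued passkey

    def authenticate_and_authorize(self, identity):
        """Return (data_port, passkey) on success, or None to reject."""
        if identity not in self.trusted_identities:
            return None  # authentication failed
        if identity not in self.acl:
            return None  # authenticated, but not in the ACL
        passkey = secrets.token_hex(16)
        self._passkeys[identity] = passkey  # saved for the second check
        return self.data_port, passkey

    def verify_passkey(self, identity, presented):
        """Second verification, done on the data port before transfer."""
        expected = self._passkeys.get(identity)
        if expected is None:
            return False
        return hmac.compare_digest(expected, presented)
```

Keeping the data port private until A&A succeeds means an unauthenticated peer learns only the security port.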

Page 32

Measurements and Analysis
• The set of benchmarks
– Performance
– Overhead
• The PTCP transfer method is used for comparison
– Parallel streaming is one of the major performance improvement methods
– It has a structure similar to GridTorrent's
• Test beds used in these benchmarks
– LAN (Bloomington, IN to Indianapolis, IN)
– WAN (Bloomington, IN to Tallahassee, FL)

Page 33

Modeling of PTCP and GridTorrent
[Figure panels: PTCP with 3 streams; GridTorrent with 3 sources]

Page 34

LAN Test Setup
[Figure panels: PTCP; GridTorrent]

Page 35

Theoretical and Practical Limits
• RTT = 0.30 ms
• Theoretical bandwidth = 1000 Mbps
• Maximum TCP bandwidth = 0.9493 * 1000 = 949 Mbps
– Ethernet's Maximum Transmission Unit (MTU) = 1500 bytes
– TCP header = 20 bytes
– IP header = 20 bytes
– Ethernet's additional per-frame overhead (preamble, frame header, CRC, and inter-frame gap) = 38 bytes
– U = (1500 - 20 - 20) / (1500 + 38) = 0.94928
• Measured bandwidth with Iperf = 857 Mbps
– Server side: iperf -s -w 256k
– Client side: iperf -c <hostname> -w 512k -P 50
– http://www.noc.ucf.edu/Tools/Iperf/
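The utilization figure on this slide can be reproduced with a few lines of Python. The constants are the slide's own numbers; the function name `tcp_efficiency` is just a label chosen for this sketch.

```python
MTU = 1500            # Ethernet payload bytes
TCP_HEADER = 20
IP_HEADER = 20
FRAME_OVERHEAD = 38   # per-frame Ethernet overhead, as given on the slide
LINK_MBPS = 1000
MEASURED_MBPS = 857   # Iperf result from the slide


def tcp_efficiency(mtu=MTU, tcp=TCP_HEADER, ip=IP_HEADER, frame=FRAME_OVERHEAD):
    """Fraction of on-the-wire bytes that carry TCP payload."""
    return (mtu - tcp - ip) / (mtu + frame)


u = tcp_efficiency()
max_tcp_mbps = u * LINK_MBPS
print(f"U = {u:.5f}")
print(f"maximum TCP bandwidth = {max_tcp_mbps:.0f} Mbps")
print(f"measured / maximum = {MEASURED_MBPS / max_tcp_mbps:.1%}")
```

The last line puts the Iperf measurement in context: the LAN test bed reaches roughly nine tenths of the protocol-imposed ceiling.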

Page 36

LAN Test Result (RTT = 0.30 ms)
[Plot: throughput (Mbps, axis 0 to 100) versus number of streams/sources (axis 0 to 16), for PTCP and GTorrent]

Page 37

WAN Test-I Setup
[Figure panels: PTCP; GridTorrent with regular socket]

Page 38

Theoretical and Practical Limits
• RTT = 50 ms
• Theoretical bandwidth = 1000 Mbps
• Maximum TCP bandwidth = 0.9493 * 1000 = 949 Mbps
• Measured bandwidth with Iperf = 30.2 Mbps
– Server side: iperf -s -w 256k
– Client side: iperf -c <hostname> -w 256k -P 50
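One reason long round-trip times motivate parallel streams is the standard window/RTT bound on a single TCP connection. The sketch below applies that textbook bound to the window size and RTT given on this slide. It is not an analysis from the slides, it reads "256k" as 256 KiB, and it ignores loss, congestion, and how the operating system actually sizes the buffer, any of which can lower the measured rate.

```python
def window_limited_mbps(window_bytes, rtt_seconds):
    """Upper bound for one TCP stream: at most one window per round trip."""
    return window_bytes * 8 / rtt_seconds / 1e6


def bandwidth_delay_product_bytes(link_mbps, rtt_seconds):
    """Bytes that must be in flight to keep the link full."""
    return link_mbps * 1e6 * rtt_seconds / 8


WINDOW = 256 * 1024   # the "-w 256k" setting, read as 256 KiB
RTT = 0.050           # 50 ms

per_stream = window_limited_mbps(WINDOW, RTT)
bdp = bandwidth_delay_product_bytes(1000, RTT)
print(f"per-stream bound: {per_stream:.1f} Mbps")
print(f"bandwidth-delay product: {bdp / 1e6:.2f} MB")
```

Under these assumptions a single stream cannot fill a 1000 Mbps path at 50 ms RTT, which is the gap that multiple streams or multiple sources try to close.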

Page 39

WAN Test-I Result (RTT = 50 ms)
[Plot: throughput (Mbps, axis 0 to 120) versus number of streams/sources (axis 0 to 16), for PTCP and GTorrent]

Page 40

WAN Test-II Setup
[Figure panels: PTCP; GridTorrent with 4 parallel sockets]

Page 41

WAN Test-II Result (RTT = 50 ms)
[Plot: throughput (Mbps, axis 0 to 150) versus number of streams/sources (axis 0 to 16), for PTCP and GTorrent]

Page 42

Evaluation of Test Results
• GridTorrent provides better or equal performance on the WAN
• PTCP reaches its maximum data transfer speed at 15 streams
• Utilizing PTCP in GridTorrent yields a higher data transfer rate
• The total size of the overhead messages is 148-169 KB for transferring a 300 MB file
• Scalability is not an issue due to the bulk data transfer characteristic

Page 43

Characteristics of Participation in the Scientific Community
• The number of participants is on the scale of 10s, 100s, or 1000s
• Fully distributed
• Teamwork
• CERN: the European Organization for Nuclear Research
– The world's largest particle physics laboratory
– Supported by twenty European member states
– Currently the workplace of approximately 2,600 full-time employees
– Some 7,931 scientists and engineers representing 580 universities and research facilities
– 80 nationalities

Page 44

Advantages of GridTorrent
• More peers, more available services
• Unlike the client/server model, more peers mitigate the load on the server
• Optimal resource usage
– Computing power
– Storage space
– Bandwidth
• Very efficient for replica systems
• P2P networks are more scalable than the client/server model
• Reliable file transfer
– Resume capability when a data transfer is interrupted
• Third-party transfer
• Disk allocation before the actual data transfer

Page 45

Page 46

Transmission sequence matrix of PTCP

Time (sec) | S-C1 | S-C2 | S-C3 | C1 | C2 | C3
1 | N1 |  |  | N1 |  |
2 | N2 |  |  | N1,N2 |  |
3 | N3 |  |  | N1,N2,N3 |  |
4 |  | N1 |  |  | N1 |
5 |  | N2 |  |  | N1,N2 |
6 |  | N3 |  |  | N1,N2,N3 |
7 |  |  | N1 |  |  | N1
8 |  |  | N2 |  |  | N1,N2
9 |  |  | N3 |  |  | N1,N2,N3

Page 47

Transmission sequence matrix of GridTorrent

Time (sec) S-C1 S-C2 S-C3 C1-C2 C2-C3 C1-C3 C1 C2 C3
1 N1 N1
2 N2 N1 N1 N2 N1
3 N3 N1 N1 N1,N2 N1,N3
4 N2 N2 N1,N2 N1,N2 N1,N2,N3
5 N3 N3 N1,N2,N3 N1,N2,N3
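The difference between the two matrices (9 seconds for the sequential schedule, 5 for GridTorrent) can be explored with a small tick-based simulation. The model is an assumption made for this sketch, not the model behind the slides: each node uploads at most one piece per tick, each client downloads at most one piece per tick, and peers forward only pieces they held when the tick began. The greedy choice of receiver and piece is also arbitrary, so the individual transfers need not match the slide.

```python
def client_server_ticks(num_pieces, num_clients):
    """Single server uplink, one piece per tick: every (client, piece)
    pair costs one tick."""
    return num_pieces * num_clients


def swarm_ticks(num_pieces, num_clients):
    """Greedy swarm: per tick, each node uploads at most one piece and each
    client downloads at most one piece. Peers can forward only pieces they
    held at the start of the tick."""
    everything = set(range(num_pieces))
    clients = [f"C{i + 1}" for i in range(num_clients)]
    have = {"S": set(everything)}
    have.update({c: set() for c in clients})
    ticks = 0
    while any(have[c] != everything for c in clients):
        ticks += 1
        at_start = {node: set(pieces) for node, pieces in have.items()}
        receiving = set()
        for sender in ["S"] + clients:
            for receiver in clients:
                if receiver == sender or receiver in receiving:
                    continue
                wanted = sorted(at_start[sender] - have[receiver])
                if wanted:
                    have[receiver].add(wanted[0])
                    receiving.add(receiver)
                    break
    return ticks
```

For three pieces and three clients, `client_server_ticks(3, 3)` gives 9 and `swarm_ticks(3, 3)` gives 5, the same totals as the two matrices.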

Page 48

Contributions
• System research
– A collaborative framework with a P2P-based data moving technique
– Efficient, scalable, and modular
– Integration with SOA to increase modularity, flexibility, and extensibility
– Strategies for increasing performance and scalability
– Unification of many useful techniques, such as reliable file transfer, third-party transfer, and disk allocation, in a simple but efficient way
– Benchmarks to evaluate GridTorrent performance
• System software
– Design and implementation of an infrastructure consisting of the GridTorrent client, the WS-Tracker Service, and the Collaborative framework

Page 49

Future Works
• Utilizing other high-performance, low-level TCP- or UDP-based data transfer protocols in the data layer
• Improving the existing P2P technique
• Certificate-handling service for different certificates
• Adapting the existing system to support dynamic (real-time) content
• Developing and deploying an intelligent source selection algorithm in the WS-Tracker Service
• Security
– Security framework for the WS-Tracker Service, if necessary
• Transforming the Collaborative framework into portlets for reusability

Page 50

References
1. Bell, G.; Gray, J.; Szalay, A. Petascale computational systems. Computer, Volume 39, Issue 1, Jan. 2006, pages 110-112.
2. Graham, S.L.; Snir, M.; Patterson, C.A. (eds). Getting Up To Speed: The Future of Supercomputing. NAE Press, 2004, ISBN 0-309-09502-6.
3. Foster, Ian. Overview of Grid Computing. http://www-fp.mcs.anl.gov/~foster/Talks/ResearchLibraryGroupGridsApril2002.ppt, last seen 2007.
4. Science-Driven Network Requirements for ESnet. http://www.es.net/ESnet4/Case-Study-Requirements-Update-With-Exec-Sum-v5.doc, last seen 2007.

Page 51

Create MyFile.torrent

Page 52

Upload MyFile.torrent

Page 53

Join the Tracker

Page 54

Find and obtain MyFile.torrent

Page 55

Join Tracker Node

Page 56

Tracker Node replies with list of peers = {Seed Node}

Page 57

Download pieces of content
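The walkthrough begins by creating a .torrent file and ends with peers downloading pieces. In standard BitTorrent, the metainfo file records a SHA-1 digest for every fixed-size piece so that a downloader can check each piece independently. The sketch below shows only that hashing and checking step. The function names are hypothetical, the bencoded file layout is omitted, and whether GridTorrent changes any of this is not stated in the slides.

```python
import hashlib


def piece_hashes(data, piece_length):
    """Split data into fixed-size pieces and return the SHA-1 digest of
    each. The last piece may be shorter than piece_length."""
    if piece_length <= 0:
        raise ValueError("piece_length must be positive")
    return [
        hashlib.sha1(data[start:start + piece_length]).digest()
        for start in range(0, len(data), piece_length)
    ]


def verify_piece(piece, index, hashes):
    """Check a downloaded piece against the digest published for its index."""
    return hashlib.sha1(piece).digest() == hashes[index]
```

Per-piece digests are what let a peer accept pieces from several sources at once and discard a corrupted piece without restarting the whole transfer.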