
Ron Price Grid World Presentation


Page 1: Ron Price Grid World Presentation

Digital Sherpa: Custom Grid Applications on the TeraGrid and Beyond

GGF18 / GridWorld 2006

Ronald C. Price, Victor E. Bazterra, Wayne Bradford, Julio C. Facelli
Center for High Performance Computing at the University of Utah

Partially funded by NSF ITR award #0326027

Page 2: Ron Price Grid World Presentation

First Things First

HAPPY BIRTHDAY GLOBUS!!!

Page 3: Ron Price Grid World Presentation

Roles & Acknowledgments

Ron: Grid Architect and Software Engineer
Victor: Research Scientist & Grid Researcher, user of many HPC Resources
Wayne: Grid Sys Admin
Julio: Director
Globus mailing list and especially the Globus Alliance
Entire Center for High Performance Computing, University of Utah staff

Page 4: Ron Price Grid World Presentation

Overview

Problem & Solution
– general problem
– solution
– traditional approaches

Past
– sys admin caveats (briefly)
– concepts and implementation

Present
– examples
– applications

Future
– applications
– features

Page 5: Ron Price Grid World Presentation

General Problem & Solution

General Problem:
– Many High Performance Computing (HPC) scientific projects require a large number of loosely coupled executions on numerous HPC resources, which cannot be managed manually.

Solution (Digital Sherpa):
– Distribute the jobs of HPC scientific applications across a grid, allowing access to more resources with automatic staging, job submission, monitoring, fault recovery and efficiency improvement.

Page 6: Ron Price Grid World Presentation

Traditional Approach: “babysitter” scripts

“Babysitter” scripts are common, but in general they have some problems:
– not scalable (written to work with a specific scheduler)
– hard to maintain (typically a hack)
– not portable (system specific)

Page 7: Ron Price Grid World Presentation

Digital Sherpa & Perspective

A different perspective:
– Schedulers: system-oriented perspective
    – Many jobs on one HPC resource; the user doesn’t have control
– Sherpa: user-oriented perspective
    – Many jobs on many resources; the user has control

Page 8: Ron Price Grid World Presentation

Digital Sherpa In General

Digital Sherpa is a grid application for executing HPC applications across many grid-enabled HPC resources.
It automates non-scalable tasks such as staging, job submission and monitoring, and includes recovery features such as resubmission of failed jobs.
The goal is to allow any HPC application to easily interoperate with Digital Sherpa and so become a custom grid application.
Distributing the jobs across HPC resources increases the amount of computing resources that can be accessed at a given time.
Digital Sherpa has been used successfully on the TeraGrid, and many more applications of Digital Sherpa are in progress.

Page 9: Ron Price Grid World Presentation

So, what is Digital Sherpa?

Naming convention for the rest of the slides: Digital Sherpa = Sherpa
Sherpa is a multi-threaded custom extension of the GT4 WS-GRAM client.
Sherpa has been designed and planned to be scalable, maintainable, and usable directly by people or by other applications.
It is based on the Web Services Resource Framework (WSRF) and is implemented in Java 1.5 using the Globus Toolkit 4.0 (GT4).
Sherpa can do a complete HPC submission: stage data in, run and monitor a PBS job, stage data out, automatically restart failed jobs, and improve efficiency.
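The automatic restart of failed jobs mentioned above can be pictured as a bounded retry loop. The following is only a minimal sketch, not Sherpa's actual code: the class `ResubmitSketch`, the method `runWithResubmit`, and the attempt limit are hypothetical names introduced for illustration, and the "job" is a local simulation rather than a WS-GRAM submission.

```java
import java.util.concurrent.Callable;

/** Hypothetical helper (not part of Sherpa or GT4): resubmits a job until it succeeds. */
final class ResubmitSketch {

    /**
     * Runs the job up to maxAttempts times.
     * Returns the number of attempts used, or -1 if every attempt failed.
     */
    static int runWithResubmit(Callable<Boolean> job, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                if (job.call()) {
                    return attempt; // the job reported success
                }
            } catch (Exception e) {
                // Assumption: an exception counts as a failed attempt and triggers a resubmission.
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        final int[] calls = {0};
        // Simulated job that fails twice and succeeds on the third submission.
        int used = runWithResubmit(() -> ++calls[0] >= 3, 5);
        System.out.println("Succeeded after " + used + " attempt(s)");
    }
}
```

A real client would also need to decide which failures are worth retrying, for example a node crash as opposed to a malformed job description; that policy is not described in the slides.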

Page 10: Ron Price Grid World Presentation

Why the name Sherpa?

Digital Sherpa takes its name from the “sherpa”, who are known for their great mountaineering skills in the Himalayas as expert route finders and porters. A sherpa will:
– find the route for you (find an HPC resource for your needs; future feature)
– carry gear in for you (stage data in)
– climb to the top (execute the job and restart it if necessary)
– and carry gear out for you (stage data out).

Page 11: Ron Price Grid World Presentation

Benefits and Significance

Benefits:
– Automation of login, data stage-in and stage-out, job submission, monitoring, auto restart if the job fails, and efficiency improvement
– Distribute your jobs across various HPC resources to increase the amount of resources that can be used at a time
– Reduction of queue wait time by submitting jobs to several queues, resulting in an increase in efficiency
– Load balancing from increased granularity
– Can be called from a separate application

Significance:
– Automates the flow of a large number of jobs within grid environments
– Increases throughput of HPC scientific applications

Page 12: Ron Price Grid World Presentation

Globus Toolkit 4

The Globus Toolkit is an open source software toolkit used for building Grid systems and applications.
Globus Toolkit 4.0.x (GT4) is the most recent release.
GT4 is best thought of as a Grid Development Kit (GDK).
GT4 has four main components:
– Grid Security Infrastructure (GSI)
– Reliable File Transfer (RFT)
– Web Services - Monitoring and Discovery Service (WS-MDS)
– Web Services - Grid Resource Allocation and Management (WS-GRAM)

Page 13: Ron Price Grid World Presentation

Sherpa Requirements

Globus Toolkit 4:
– Dependent GT4 components:
    – WS-GRAM (Execution Management)
    – RFT (Data Management)

Java 1.5

Page 14: Ron Price Grid World Presentation

Past: Sys Admin Caveats

Did a lot of initial testing and configuration

Build notes:
– http://wiki.chpc.utah.edu/index.php/System_Administration_and_GT4:_An_Addendum_to_the_Globus_Alliance_Quick_Start_Guide
– GT 4.0.2 doesn’t require postgres config

Page 15: Ron Price Grid World Presentation

Motivations for Creating Sherpa

Reasons for creating Digital Sherpa:
– Allow scientists to be scientists in their own fields; don’t force them to become computer scientists
– Eliminate the error-prone, time-consuming, non-scalable tasks of job submission, monitoring and data staging
– Allow easy access to more resources
– Reduce total queue time
– Increase efficiency

Page 16: Ron Price Grid World Presentation

Before Sherpa: BabySitter

BabySitter
– before GT4

Conceptual details of BabySitter
– Resource manager and handler
– Proprietary states similar to the external states of the managed job services in WS-GRAM
– Not a general solution, scheduler specific

Took GT4 into the lab as it became available

Page 17: Ron Price Grid World Presentation

Sherpa Conceptually
Past and Present: States

Past:
– Null, idle, running, done

Realized the Globus Alliance had already defined the states as GT4 was finalized

Present: external states of the managed job services in WS-GRAM
– Unsubmitted, StageIn, Pending, Active, Suspended, StageOut, CleanUp, Done, Failed
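One way to picture the state list above in code is as an enumeration. This is a sketch under stated assumptions rather than the GT4 or Sherpa type: the enum `JobState` and its methods are hypothetical names, the labels are copied from the slide, and treating Done and Failed as the only terminal states is an assumption made for illustration.

```java
/** Hypothetical enum mirroring the external job states listed on the slide. */
enum JobState {
    UNSUBMITTED("Unsubmitted"),
    STAGE_IN("StageIn"),
    PENDING("Pending"),
    ACTIVE("Active"),
    SUSPENDED("Suspended"),
    STAGE_OUT("StageOut"),
    CLEAN_UP("CleanUp"),
    DONE("Done"),
    FAILED("Failed");

    private final String label;

    JobState(String label) {
        this.label = label;
    }

    /** The state name as written on the slide. */
    String label() {
        return label;
    }

    /** Assumption: no further state change is expected after Done or Failed. */
    boolean isTerminal() {
        return this == DONE || this == FAILED;
    }

    /** Looks up a state by its slide label, e.g. "StageOut". */
    static JobState fromLabel(String label) {
        for (JobState s : values()) {
            if (s.label.equals(label)) {
                return s;
            }
        }
        throw new IllegalArgumentException("Unknown job state: " + label);
    }
}

final class JobStateDemo {
    public static void main(String[] args) {
        for (JobState s : JobState.values()) {
            System.out.println(s.label() + (s.isTerminal() ? " (terminal)" : ""));
        }
    }
}
```

A monitoring loop can then stop watching a job as soon as `isTerminal()` returns true, and a failed job can be handed to whatever resubmission policy is in use.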

Page 18: Ron Price Grid World Presentation

Digital Sherpa Implementation: Choice of API, Past and Present

Past: babysitter
– Java app using J2SSH to log in to the HPC resource and then query the output from the scheduler

Present: GT4 GDK
– WS-GRAM API
– when I wrote the Sherpa code, JavaCOG and GAT did not work with GT4, and I needed GT4

WS-GRAM hides scheduler-specific complexities

Page 19: Ron Price Grid World Presentation

The “BLAH” Example: Test Jobs

A test case for Sherpa: ***_blah.xml corresponds to ***_blah.out and ***_blahblah.xml corresponds to blahblah.out …

Stage In:
– Local blahsrc.txt -> remote RFT server blah.txt

Run:
– /bin/more blah.txt (std out to: blahtemp.out)

Stage Out:
– Remote RFT server blahtemp.out -> local blah.out

Clean Up:
– deletes blahtemp.out at remote HPC resource

Page 20: Ron Price Grid World Presentation

Sherpa Input File

Made use of the WS-GRAM XML Schema

Example: argonne_blah.xml
– File walk through

Page 21: Ron Price Grid World Presentation

“BLAH” on TeraGrid: Sherpa in Action

-bash-3.00$ java -DGLOBUS_LOCATION=$GLOBUS_LOCATION Sherpa argonne_blah.xml purdue_blahblahblah.xml ncsamercury_blahblah.xml
Starting job in: argonne_blah.xml
Handler 1 Starting...argonne_blah.xml
Starting job in: purdue_blahblahblah.xml
Handler 2 Starting...purdue_blahblahblah.xml
Starting job in: ncsamercury_blahblah.xml
Handler 3 Starting...ncsamercury_blahblah.xml
Handler 3: StageIn
Handler 2: StageIn
Handler 1: StageIn
Handler 3: Pending
Handler 1: Pending
Handler 2: Pending
Handler 2: Active
Handler 2: StageOut
Handler 1: Active
Handler 2: CleanUp
Handler 2: Done
Handler 2 Complete.
Handler 3: Active
Handler 1: StageOut
Handler 3: StageOut
Handler 1: CleanUp
Handler 3: CleanUp
Handler 1: Done
Handler 1 Complete.
Handler 3: Done
Handler 3 Complete.
-bash-3.00$ hostname -f
watchman.chpc.utah.edu
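The interleaved handler lines in the transcript are what one would expect from running one handler thread per job description file. The sketch below only simulates that behaviour and makes no Globus calls: `HandlerSketch`, `runAll` and `HAPPY_PATH` are hypothetical names, the message formats are copied from the transcript, and the use of a thread pool is an assumption about structure, not a description of Sherpa's internals.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical simulation of one handler thread per job description file. */
final class HandlerSketch {

    // State names in the order the transcript shows for a job that succeeds.
    static final List<String> HAPPY_PATH =
            Arrays.asList("StageIn", "Pending", "Active", "StageOut", "CleanUp", "Done");

    /** Starts one simulated handler per file and returns each handler's final state. */
    static List<String> runAll(List<String> jobFiles) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, jobFiles.size()));
        try {
            List<Future<String>> pending = new ArrayList<>();
            for (int i = 0; i < jobFiles.size(); i++) {
                final int id = i + 1;
                final String file = jobFiles.get(i);
                System.out.println("Starting job in: " + file);
                pending.add(pool.submit(() -> {
                    System.out.println("Handler " + id + " Starting..." + file);
                    String last = null;
                    for (String state : HAPPY_PATH) {
                        // A real client would wait here for the service to report a state change.
                        System.out.println("Handler " + id + ": " + state);
                        last = state;
                    }
                    System.out.println("Handler " + id + " Complete.");
                    return last;
                }));
            }
            List<String> finalStates = new ArrayList<>();
            for (Future<String> f : pending) {
                try {
                    finalStates.add(f.get()); // blocks until that handler has finished
                } catch (Exception e) {
                    throw new IllegalStateException("handler failed", e);
                }
            }
            return finalStates;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        runAll(Arrays.asList(
                "argonne_blah.xml", "purdue_blahblahblah.xml", "ncsamercury_blahblah.xml"));
    }
}
```

Because the handlers run concurrently, the order of the printed lines can differ from run to run, just as Handler 2 finishes before Handlers 1 and 3 in the transcript.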

Page 22: Ron Price Grid World Presentation

Sherpa Purdue Test Results

-bash-3.00$ more *.out
::::::::::::::
blahblahblah.out
::::::::::::::
BLAH BLAH BLAH

No PBS epilogue or prologue

Page 23: Ron Price Grid World Presentation

Sherpa NCSA Mercury Results

::::::::::::::
blahblah.out
::::::::::::::
----------------------------------------
Begin PBS Prologue Thu Apr 27 13:17:09 CDT 2006
Job ID:         612149.tg-master.ncsa.teragrid.org
Username:       price
Group:          oor
Nodes:          tg-c421
End PBS Prologue Thu Apr 27 13:17:13 CDT 2006
----------------------------------------
BLAH BLAH
----------------------------------------
Begin PBS Epilogue Thu Apr 27 13:17:20 CDT 2006
Job ID:         612149.tg-master.ncsa.teragrid.org
Username:       price
Group:          oor
Job Name:       STDIN
Session:        4042
Limits:         ncpus=1,nodes=1,walltime=00:10:00
Resources:      cput=00:00:01,mem=0kb,vmem=0kb,walltime=00:00:06
Queue:          dque
Account:        mud
Nodes:          tg-c421

Killing leftovers...

End PBS Epilogue Thu Apr 27 13:17:24 CDT 2006
----------------------------------------

Page 24: Ron Price Grid World Presentation

Sherpa UC/ANL Test Results

::::::::::::::
blah.out
::::::::::::::
----------------------------------------
Begin PBS Prologue Thu Apr 27 13:16:53 CDT 2006
Job ID:         251168.tg-master.uc.teragrid.org
Username:       rprice
Group:          allocate
Nodes:          tg-c061
End PBS Prologue Thu Apr 27 13:16:54 CDT 2006
----------------------------------------
BLAH
----------------------------------------
Begin PBS Epilogue Thu Apr 27 13:17:00 CDT 2006
Job ID:         251168.tg-master.uc.teragrid.org
Username:       rprice
Group:          allocate
Job Name:       STDIN
Session:        11367
Limits:         nodes=1,walltime=00:15:00
Resources:      cput=00:00:01,mem=0kb,vmem=0kb,walltime=00:00:02
Queue:          dque
Account:        TG-MCA01S027
Nodes:          tg-c061

Killing leftovers...

End PBS Epilogue Thu Apr 27 13:17:16 CDT 2006
----------------------------------------

Page 25: Ron Price Grid World Presentation

MGAC Background

Modified Genetic Algorithms for Crystals and Atomic Clusters (MGAC) is an HPC chemistry application written in C++.
In short, MGAC tries to predict the chemical structure based on an energy criterion.
Computing needs: local serial computations and distributed parallel computations

Page 26: Ron Price Grid World Presentation

MGAC & Circular Flow

Page 27: Ron Price Grid World Presentation

MGAC-CGA: Real Science

Page 28: Ron Price Grid World Presentation

Efficiency and HPC Resources

Scheduler side effect:
– 1 job submitted requiring 5 calculations
    – 4 calculations require 1 hour of compute time each
    – 1 calculation requires 10 hours of compute time
    – The other 4 nodes are still reserved although not being used, and they can’t be used by anyone else until the 10-hour calculation has finished: 4*9 = 36 hrs of wasted compute time
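The arithmetic on this slide can be checked with a few lines of code. This is a minimal sketch that assumes one node per calculation and that the whole reservation is held until the longest calculation finishes; the class and method names are hypothetical and introduced only for illustration.

```java
/** Hypothetical calculator for the scheduler side effect described on the slide. */
final class SchedulerWasteSketch {

    /** Node-hours that stay reserved but idle while the longest calculation is still running. */
    static int wastedNodeHours(int[] hoursPerCalculation) {
        int longest = 0;
        for (int h : hoursPerCalculation) {
            longest = Math.max(longest, h);
        }
        int wasted = 0;
        for (int h : hoursPerCalculation) {
            wasted += longest - h; // each node idles from the end of its own work until the job ends
        }
        return wasted;
    }

    /** Fraction of the reserved node-hours that did useful work. */
    static double efficiency(int[] hoursPerCalculation) {
        int longest = 0;
        int used = 0;
        for (int h : hoursPerCalculation) {
            longest = Math.max(longest, h);
            used += h;
        }
        return (double) used / (longest * hoursPerCalculation.length);
    }

    public static void main(String[] args) {
        int[] slideExample = {1, 1, 1, 1, 10}; // four 1-hour calculations and one 10-hour calculation
        System.out.println("Wasted node-hours: " + wastedNodeHours(slideExample)); // 4 * (10 - 1) = 36
        System.out.println("Efficiency: " + efficiency(slideExample));             // 14 used / 50 reserved
    }
}
```

For the slide's numbers this gives 36 wasted node-hours out of 50 reserved. That figure applies to this toy case only and is separate from the measured guesstimates reported for real MGAC runs.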

Page 29: Ron Price Grid World Presentation

Minimization & Waste Chart: MGAC

Page 30: Ron Price Grid World Presentation

Minimization & Use Chart: MGAC-CGA

Page 31: Ron Price Grid World Presentation

Efficiency and HPC Resources

Guesstimate: in one common MGAC run, our average efficiency due to the scheduler side effect is 46%
– 54% of resources are wasted

Sherpa continuously submits one job at a time, which reduces the scheduler side effect because multiple schedulers are involved and jobs are submitted in a more granular fashion
– Improved Efficiency #1: increased granularity

Necessary sharing policies prohibit a large number of jobs from being submitted all at one HPC resource; queue times become too long
– Improved Efficiency #2: access to more resources

Guesstimate: total computational time (including queue time) reduced by 60%-89% in our initial testing.
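Submitting each calculation as its own job and spreading those jobs over several resources can be sketched as follows. The slides do not say how jobs are matched to resources (automatic resource selection is listed as a future feature), so the round-robin rule here is purely an assumption for illustration, and `GranularDispatchSketch`, `assign` and the job names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical planner: one job per calculation, spread over resources in round-robin order. */
final class GranularDispatchSketch {

    /** Returns, for each resource, the list of jobs that would be submitted to it. */
    static Map<String, List<String>> assign(List<String> jobs, List<String> resources) {
        if (resources.isEmpty()) {
            throw new IllegalArgumentException("need at least one resource");
        }
        Map<String, List<String>> plan = new LinkedHashMap<>();
        for (String r : resources) {
            plan.put(r, new ArrayList<>());
        }
        for (int i = 0; i < jobs.size(); i++) {
            // Assumption: plain round-robin; a real policy might consider queue length or allocation.
            plan.get(resources.get(i % resources.size())).add(jobs.get(i));
        }
        return plan;
    }

    public static void main(String[] args) {
        List<String> jobs = Arrays.asList("calc-1", "calc-2", "calc-3", "calc-4", "calc-5");
        List<String> sites = Arrays.asList("UC/ANL", "Purdue", "NCSA Mercury");
        for (Map.Entry<String, List<String>> e : assign(jobs, sites).entrySet()) {
            System.out.println(e.getKey() + " <- " + e.getValue());
        }
    }
}
```

Because every calculation holds its own reservation, a short calculation releases its node as soon as it finishes instead of waiting for the longest one, which is the granularity benefit described above.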

Page 32: Ron Price Grid World Presentation

Sherpa Performance & Load Capability

Performance:
– Sherpa is lightweight; computationally intensive operations are done at the HPC resource
– Memory intensive

Load Capability:
– Hard to create a huge test case; need unique file names
– Ran out of file handles around 100,000 jobs without any HPC submission (turned out system image software was misconfigured)
– Successfully initiated 500 jobs
    – Emphasis on initiated: 500 jobs appeared in the test queue and, although many ran to completion, we did not have time to let them all finish

Page 33: Ron Price Grid World Presentation

Host Cert and Sherpa

Globus GSI:
– Uses PKI to verify that users and hosts are who they claim to be; creates trust
– User certs and host certs are different and they provide different functionality

Sherpa requires a Globus host certificate
ORNL granted us one
Policy changed: got CRLd (our certificate was put on the certificate revocation list)
Confusion: either WS-GRAM or RFT was requiring a valid host cert
– Had to know if there was a way around the situation
– Did some testing to investigate and troubleshoot

Page 34: Ron Price Grid World Presentation

Testing/Trouble Shooting

host    | RFT Service     | Source          | Destination           | Result
CHPC    | CHPC            | CHPC            | UC/ANL, NCSA, Indiana | ok
CHPC    | CHPC            | UC/ANL, Indiana | CHPC                  | ok
CHPC    | UC/ANL          | CHPC, UC/ANL    | UC/ANL, CHPC          | failed (unknown ca)
CHPC    | UC/ANL, Indiana | UC/ANL, Indiana | Indiana, UC/ANL       | failed (unknown ca)
UC/ANL  | UC/ANL          | CHPC            | UC/ANL                | failed (unknown ca)

Page 35: Ron Price Grid World Presentation

TeraGrid CA Caveats

How do you allow your machines to fully interoperate with the TeraGrid without a host cert from a trusted CA?
– Not possible.

How do you get a host cert for the TeraGrid?
– From least scalable to most scalable:
    – Work with site-specific orgs to accept your CA's certs (tedious for multiple sites)
    – Get the TeraGrid security working group's approval for a local university CA (time consuming, not EDU scalable)
    – Get a TeraGrid trusted CA to issue you one (unlikely, as site policy seems to contradict this)
    – Become a TG member
    – Side note: a satisfactory scalable solution does not seem to be in place currently, and it's our understanding that Shibboleth and/or the International Grid Trust Federation (IGTF) will eventually offer this service for EDUs.

Page 36: Ron Price Grid World Presentation

Not the End: Sherpa is Flexible

Sherpa can work between any two machines that have GT4 installed and configured:
– Flexible
– Can work in many locations
– Implicitly follows open standards

Page 37: Ron Price Grid World Presentation

Future Projects

MGAC-CGA is the first example; we have other projects with Sherpa:
– Nanotechnology simulation (web application)
– Biomolecular docking (circular flow)
    – AKA: protein docking, drug discovery
– Combustion simulation (web application)

Page 38: Ron Price Grid World Presentation

Future Features and Implementation

Future efforts will be directed towards:
– implementing monitoring and discovery client logic
– a polling feature that will help identify when system-related issues have occurred (e.g. network down, scheduler unavailable)
– Grid proxy auto renewal

Implementation (move to a more general API)
– Simple API for Grid Applications Research Group (SAGA-RG)
– Grid Application Toolkit (GAT)
– JavaCOG

Page 39: Ron Price Grid World Presentation

How do I get a Hold of Sherpa?

We are interested in collaborative efforts.

Sorry, Sherpa can’t be downloaded because we don’t have the manpower for support right now.

Page 40: Ron Price Grid World Presentation

Q&A With Audience

Mail questions to: [email protected]

Slides available at: http://www.chpc.utah.edu/~rprice/grid_world_2006/ron_price_grid_world_presentation.ppt