Open Science Grid For Virtual Cell


DESCRIPTION

Presentation given by Prasanna Gautam on completion of a UCHC summer internship at the Center for Cell Analysis and Modeling. The work involved figuring out how to run Virtual Cell programs on Open Science Grid sites.


Bringing Open Science Grid to Virtual Cell

Prasanna Gautam
Trinity College ‘11

Goals

• Understand how Virtual Cell deploys jobs
• Understand how OSG works
• Figure out how to deploy and monitor jobs on OSG
• Figure out how to do it without breaking Virtual Cell/reinventing

VCell Software Architecture (web-based distributed client/server framework)

[Architecture diagram. Labels: Client; ConnectionManager; ServerManager; JMS Broker (SonicMQ); Database Service; Data Export Service; Simulation Data Service; Simulation Dispatch Service; Simulation Worker Service; Database (Oracle); Batch Scheduler (PBSPro); Compute Cluster; Compiled Simulation Jobs; Storage Cluster; Servers at CCAM]

Scalability

• 200 nodes will not be enough in the foreseeable future for Virtual Cell
• Solution?
◦ Adding more machines? Doesn’t always scale, but it always adds cost
◦ Maybe we can get someone else to run our programs?

Grid

• A common framework for running jobs on remote computing nodes.

• Terms

– Fabric – Underlying hardware infrastructure, networking

– Middleware – Software linking end-user applications and fabric

– Virtual Organization (VO) – Group of certified users employing grid technology

– Site – A computation or storage service accessible on the grid

– Gatekeeper – A point of entry to a site for submitting jobs and querying information

We want a grid, not a tower of Babel!

Open Science Grid

• Started in 2004 (fairly new)
• Mostly Linux – 32 bit machines
• Common middleware (VDT)
• Common Authentication (GSI) – based on Public Key Infrastructure (PKI)
• Common API for running jobs (Globus)
• File Transfer protocols (GridFTP)
• Common high level communication protocols (WSRF)

VCell meets OSG

[Architecture diagram. Labels under "VCell Architecture" and "Servers at CCAM": Client; ConnectionManager; ServerManager; JMS Broker (SonicMQ); Database Service; Data Export Service; Simulation Data Service; Simulation Dispatch Service; Simulation Worker Service; Database (Oracle); Batch Scheduler (PBSPro); Compute Cluster; Compiled Simulation Jobs; Storage Cluster. Additional labels: OSG Services; OSG; OSG Web service; Outside Firewall]

My Project


Overall structure

• A light central server that “listens” for everything.
◦ Runs on vdtclient2 (outside the firewalls, so jobs can provide feedback)
◦ Listens for changes in the supporting sites
◦ Platform for remote and internal jobs to communicate.
◦ Gives a point of administration/monitoring for the OSG part of VCell


Overall structure

• Services that can be spawned by PBS (Portable Batch System), which VCell uses
◦ Used to
– Search for sites
– Notify Listener
– Submit Jobs
– Monitor Jobs
◦ Should be able to run on the existing cluster
– A lot of extra dependencies that I’m trying to minimize


Scavenging for sites

• Few existing tools
◦ MyOSG
– A website giving a summary of resources
◦ VORS
– A website for getting information
– Extremely useful, but it is being retired by the end of summer
◦ LDAP query to BDII server at is.grid.iu.edu
– Glue schema
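The LDAP route can be sketched roughly as follows. This is an illustration only, not the project's code: port 2170 and the base DN `mds-vo-name=local,o=grid` are the conventional BDII defaults and are assumed here, the attribute names are taken from the Glue 1.x schema, and the sample LDIF text and host names are made up so the parser can be exercised without a network connection. The parser ignores LDIF line folding and base64 values.

```python
# Sketch: build an ldapsearch command for a BDII server and parse LDIF-style output.
# Assumptions (not from the slides): port 2170, the base DN, and the sample records.

BDII_HOST = "is.grid.iu.edu"          # from the slide
BDII_PORT = 2170                       # assumed: conventional BDII port
BASE_DN = "mds-vo-name=local,o=grid"   # assumed: conventional BDII base DN


def build_query(attributes):
    """Return the ldapsearch argument list asking for Glue compute elements."""
    return [
        "ldapsearch", "-x", "-LLL",
        "-H", f"ldap://{BDII_HOST}:{BDII_PORT}",
        "-b", BASE_DN,
        "(objectClass=GlueCE)",
        *attributes,
    ]


def parse_ldif(text):
    """Parse simple 'key: value' records separated by blank lines."""
    records = []
    for block in text.strip().split("\n\n"):
        record = {}
        for line in block.splitlines():
            key, sep, value = line.partition(": ")
            if sep:
                record[key] = value
        if record:
            records.append(record)
    return records


# Hypothetical output, shaped like what ldapsearch -LLL prints.
SAMPLE_LDIF = """\
dn: GlueCEUniqueID=ce.example.edu:2119/jobmanager-pbs-vcell,mds-vo-name=local,o=grid
GlueCEUniqueID: ce.example.edu:2119/jobmanager-pbs-vcell
GlueCEInfoLRMSType: pbs
GlueCEStateFreeCPUs: 12

dn: GlueCEUniqueID=ce2.example.org:2119/jobmanager-condor-vcell,mds-vo-name=local,o=grid
GlueCEUniqueID: ce2.example.org:2119/jobmanager-condor-vcell
GlueCEInfoLRMSType: condor
GlueCEStateFreeCPUs: 0
"""

sites = parse_ldif(SAMPLE_LDIF)
# Keep only compute elements that currently advertise free CPUs.
free_sites = [s["GlueCEUniqueID"] for s in sites if int(s["GlueCEStateFreeCPUs"]) > 0]
print(free_sites)
```

In practice the list returned by `build_query` would be handed to something like `subprocess.run`, and its standard output fed to `parse_ldif`.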


Matching with sites

• Two main ways
◦ Using Condor ClassAds
◦ Running standard jobs and ranking sites based on them


Condor ClassAds

• Think of classified adverts in newspapers
◦ A service provider (Compute Element, Service Element in this case) tells what it has
◦ A client (us in this case) asks for what it wants, and we try to match a suitable site.
◦ Easier, but not very reliable
– Our requirements are fairly static
– A significant rework to get it to work on the current system

Examples

• condor_status -const 'KeyboardIdle > 20*60 && Memory > 100'
◦ Returns computers that have been idle for more than 20 minutes and have more than 100 MB of memory
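The matching idea behind that constraint can be imitated in a few lines of Python. This is only a sketch of the concept, not how Condor evaluates ClassAds: the machine ads are hypothetical, and the rank function (prefer more memory) is an assumption added for illustration.

```python
# Sketch: match a requirement against machine "ads", ClassAd-style.
# The machine ads are hypothetical; attribute names follow the slide's example
# (KeyboardIdle in seconds, Memory in MB).

machine_ads = [
    {"Name": "node01", "KeyboardIdle": 45 * 60, "Memory": 512},
    {"Name": "node02", "KeyboardIdle": 5 * 60, "Memory": 2048},
    {"Name": "node03", "KeyboardIdle": 2 * 3600, "Memory": 64},
    {"Name": "node04", "KeyboardIdle": 30 * 60, "Memory": 1024},
]


def requirement(ad):
    """Same condition as: KeyboardIdle > 20*60 && Memory > 100."""
    return ad["KeyboardIdle"] > 20 * 60 and ad["Memory"] > 100


def rank(ad):
    """Illustrative rank expression: prefer machines with more memory."""
    return ad["Memory"]


# Filter by the requirement, then order the survivors by rank, best first.
matches = sorted((ad for ad in machine_ads if requirement(ad)), key=rank, reverse=True)
matched_names = [ad["Name"] for ad in matches]
print(matched_names)
```

Here node02 is rejected for being recently active and node03 for having too little memory; the remaining two are ordered by the rank function.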


Using Heuristics

• Running jobs and ranking sites
• Send small jobs
• Profile them
• Over time we’ll have a good understanding of our portion of the grid
• Definitely harder
• But we can be smarter about deploying jobs
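One way the heuristic approach could work is sketched below. The scoring rule (success rate first, then mean runtime) is an assumption chosen for illustration, and the site names and timings are hypothetical.

```python
# Sketch: rank sites from the results of small probe jobs.
# Site names and timings are hypothetical; the scoring rule is an assumption.
from collections import defaultdict
from statistics import mean

# (site, succeeded, wall-clock seconds) for each probe job
probe_results = [
    ("SiteA", True, 40.0),
    ("SiteA", True, 50.0),
    ("SiteB", True, 20.0),
    ("SiteB", False, 0.0),
    ("SiteC", False, 0.0),
]


def rank_sites(results):
    """Return site names ordered best first by success rate, then mean runtime."""
    by_site = defaultdict(list)
    for site, ok, seconds in results:
        by_site[site].append((ok, seconds))

    scores = {}
    for site, runs in by_site.items():
        times = [seconds for ok, seconds in runs if ok]
        success_rate = len(times) / len(runs)
        mean_time = mean(times) if times else float("inf")
        scores[site] = (success_rate, mean_time)

    # Higher success rate first; among equals, lower mean runtime first.
    return sorted(scores, key=lambda s: (-scores[s][0], scores[s][1]))


ranking = rank_sites(probe_results)
print(ranking)
```

As more probe results accumulate, the same function gives an updated ordering, which is the "good understanding of our portion of the grid" the slide refers to.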

Example

[Chart: Job Count chart for BNL_ATLAS_1. Source: Gratia]

Running Jobs

• Two major ways
◦ Condor-G
◦ Globus Toolkit


Condor-G

• Submit directly to the Condor pool on a remote site
◦ Doesn’t always work
◦ In our case, we use PBS
– Condor ClassAds take care of this, but with a little work upfront
◦ I tried to use OSG Matchmaker and let it sort things out.
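For a site whose gatekeeper fronts PBS, a Condor-G submission is driven by a grid-universe submit description. The sketch below only generates the text of such a file; the gatekeeper host, executable, and job name are hypothetical, and `gt2` with `jobmanager-pbs` is assumed as the grid resource type for a pre-web-services Globus gatekeeper.

```python
# Sketch: generate a Condor-G submit description for a PBS-backed gatekeeper.
# The gatekeeper host, executable, and job name are hypothetical.


def condor_g_submit(gatekeeper, executable, job_name, jobmanager="jobmanager-pbs"):
    """Return the text of a grid-universe submit description file."""
    lines = [
        "universe = grid",
        f"grid_resource = gt2 {gatekeeper}/{jobmanager}",
        f"executable = {executable}",
        "transfer_executable = true",
        f"output = {job_name}.out",
        f"error = {job_name}.err",
        f"log = {job_name}.log",
        "queue",
    ]
    return "\n".join(lines) + "\n"


submit_text = condor_g_submit("gatekeeper.example.edu", "run_simulation.sh", "vcell_test")
print(submit_text)
```

The resulting text would be written to a file and passed to `condor_submit`.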


Globus Toolkit

• Provides a middleware for deploying and monitoring jobs
• Provides Java, C and Python APIs
◦ jGlobus makes sense to deploy
◦ Better integration with existing codebase
• We can design complete workflows using these tools


What I’d like to do

• Select a site
• Start a job
• Attach a listener that polls as a PBS job for changes
• Pull incremental progress as the job is running
• Keep a transactional status of the job on the Oracle Database
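The last item, keeping job status transactionally, might look like the sketch below. It uses Python's built-in `sqlite3` purely as a stand-in for the Oracle database; the table layout, column names, and job identifiers are hypothetical.

```python
# Sketch: keep job status in a database, one transaction per state change.
# sqlite3 stands in for the Oracle database; table and column names are hypothetical.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE job_status ("
    "job_id TEXT PRIMARY KEY, site TEXT, status TEXT, progress REAL)"
)


def record_status(conn, job_id, site, status, progress):
    """Insert or update a job's row inside a single transaction."""
    with conn:  # commits on success, rolls back if an exception is raised
        conn.execute(
            "INSERT OR REPLACE INTO job_status (job_id, site, status, progress) "
            "VALUES (?, ?, ?, ?)",
            (job_id, site, status, progress),
        )


# A job moving through its states as incremental progress arrives.
record_status(conn, "job-1", "SiteA", "PENDING", 0.0)
record_status(conn, "job-1", "SiteA", "ACTIVE", 0.4)
record_status(conn, "job-1", "SiteA", "DONE", 1.0)

latest = conn.execute(
    "SELECT status, progress FROM job_status WHERE job_id = ?", ("job-1",)
).fetchone()
print(latest)
```

Because each update is its own transaction, a listener that crashes mid-update leaves the previous status intact rather than a half-written row.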


What I was almost able to do

• Select a site using Condor Matchmaker
◦ It seems to select Harvard SBGrid almost all the time
◦ So, I’m taking a random site for testing
• Record the URL Globus Gatekeeper provides in MySQL
• Poll for status and wait for DONE signal
• If it is done, pull the output
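The poll-and-pull steps can be sketched as a small loop. This is not the code from the project: `get_status` and `fetch_output` are hypothetical callables standing in for whatever actually talks to the gatekeeper, and the job URL, status names other than DONE, and timing parameters are assumptions.

```python
# Sketch: poll a submitted job until it reports DONE, then fetch its output.
# get_status and fetch_output are hypothetical callables supplied by the caller.
import time


def wait_for_job(job_url, get_status, fetch_output, poll_seconds=30.0, max_polls=100):
    """Poll get_status(job_url) until DONE or FAILED; return the output on DONE."""
    for _ in range(max_polls):
        status = get_status(job_url)
        if status == "DONE":
            return fetch_output(job_url)
        if status == "FAILED":
            raise RuntimeError(f"job failed: {job_url}")
        time.sleep(poll_seconds)
    raise TimeoutError(f"job did not finish after {max_polls} polls: {job_url}")


# Usage with a fake job that finishes on the third poll.
fake_states = iter(["PENDING", "ACTIVE", "DONE"])
result = wait_for_job(
    "https://gatekeeper.example.edu:2119/job/123",  # hypothetical job contact URL
    get_status=lambda url: next(fake_states),
    fetch_output=lambda url: "simulation output",
    poll_seconds=0.0,
)
print(result)
```

Passing the status and output functions in as arguments keeps the loop independent of the submission mechanism, so the same code could sit in front of Condor-G or the Globus APIs.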

Conclusion

• It really is feasible to run jobs like Virtual Cell on OSG
• Just not in 10 weeks from start to finish
• OSG is an evolving system
◦ Our decisions have to be flexible
• There are a lot of architecture decisions we need to resolve.


Future

• Continuity of the project
• Being able to run from the VCell cluster
• Keeping a dynamic view of the grid as a whole
• Feedback to the user


Acknowledgments

• Dr. Ion Moraru
• Jeff Dutton
• James Schaff
• Dr. Greg Huber
• Mats Rynge (RENCI)
• Arvind Gopu (Indiana University, OSG-GOC)
• Peter Doherty (Harvard SBGrid)
