Presentation given by Prasanna Gautam on completing a UCHC summer internship at the Center for Cell Analysis and Modeling. The work involved figuring out how to run Virtual Cell programs on Open Science Grid sites.
Bringing Open Science Grid to Virtual Cell
Prasanna Gautam, Trinity College '11
Goals
• Understand how Virtual Cell deploys jobs
• Understand how OSG works
• Figure out how to deploy and monitor jobs on OSG
• Figure out how to do it without breaking Virtual Cell or reinventing the wheel
VCell Software Architecture (web-based distributed client/server framework)
[Figure: Components: Client; Servers at CCAM (ServerManager, ConnectionManager, Database Service, Data Export Service, Simulation Data Service, Simulation Dispatch Service, Simulation Worker Service, JMS Broker (SonicMQ), Database (Oracle)); Compute Cluster (Batch Scheduler (PBSPro), compiled simulation jobs); Storage Cluster.]
Scalability
• 200 nodes will not be enough for Virtual Cell in the foreseeable future
• Solution?
◦ Adding more machines? That doesn't always scale, but it always adds cost
◦ Maybe we can get someone else to run our programs?
Grid
• A common framework for running jobs on remote computing nodes
• Terms
– Fabric – Underlying hardware infrastructure, networking
– Middleware – Software linking end-user applications and fabric
– Virtual Organization (VO) – Group of certified users employing grid technology
– Site – A computation or storage service accessible on the grid
– Gatekeeper – A point of entry to a site for submitting jobs and querying information
We want a grid, not a Tower of Babel!
Open Science Grid
• Started in 2004 (fairly new)
• Mostly Linux, 32-bit machines
• Common middleware (VDT)
• Common authentication (GSI), based on Public Key Infrastructure (PKI)
• Common API for running jobs (Globus)
• File transfer protocols (GridFTP)
• Common high-level communication protocols (WSRF)
VCell meets OSG
[Figure: VCell Architecture (Client; Servers at CCAM with Database, Data Export, Simulation Data, Simulation Dispatch and Simulation Worker Services, JMS Broker (SonicMQ) and Database (Oracle); Compute Cluster with Batch Scheduler (PBSPro) and compiled simulation jobs; Storage Cluster) shown alongside OSG Services, OSG, and an OSG web service outside the firewall.]
My Project
Overall structure
• A light central server that “listens” for everything
◦ Runs on vdtclient2 (outside the firewalls, so jobs can provide feedback)
◦ Listens for changes in the supporting sites
◦ Serves as a platform for remote and internal jobs to communicate
◦ Provides a point of administration/monitoring for the OSG part of VCell
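At its simplest, the central listener could be modeled as a thread-safe registry that jobs report to and that an administrator can query. The sketch below is a minimal illustration under our own assumptions: the class `JobListener`, its methods, and the status strings are hypothetical, and the network layer (how jobs on remote sites would actually reach vdtclient2) is left out.

```python
import threading
from collections import defaultdict


class JobListener:
    """Hypothetical central listener: records status reports from jobs and sites."""

    def __init__(self):
        self._lock = threading.Lock()
        # job_id -> list of (site, status) reports, in arrival order
        self._history = defaultdict(list)

    def report(self, job_id, site, status):
        """Called (e.g. by a remote job or a site monitor) to publish a status change."""
        with self._lock:
            self._history[job_id].append((site, status))

    def latest(self, job_id):
        """Most recent (site, status) for a job, or None if it never reported."""
        with self._lock:
            reports = self._history.get(job_id)
            return reports[-1] if reports else None

    def summary(self):
        """Administration view: number of jobs grouped by their latest status."""
        with self._lock:
            counts = defaultdict(int)
            for reports in self._history.values():
                counts[reports[-1][1]] += 1
            return dict(counts)


if __name__ == "__main__":
    listener = JobListener()
    listener.report("job-1", "SITE_A", "PENDING")
    listener.report("job-1", "SITE_A", "ACTIVE")
    listener.report("job-2", "SITE_B", "DONE")
    print(listener.latest("job-1"))  # ('SITE_A', 'ACTIVE')
    print(listener.summary())        # {'ACTIVE': 1, 'DONE': 1}
```

Keeping only an in-memory history makes the sketch easy to follow; a real listener would also need to persist reports and accept them over the network.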
Overall structure
• Services that can be spawned through PBS (Portable Batch System), which VCell already uses
◦ Used to:
– Search for sites
– Notify the listener
– Submit jobs
– Monitor jobs
◦ Should be able to run on the existing cluster
– Requires a lot of extra dependencies, which I'm trying to minimize
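Since these helper services are meant to be spawned through PBS, each one would ultimately be a batch script handed to qsub. The sketch below only generates such a script as text. The directives it writes (#PBS -N, -l walltime, -o, -e) are standard PBS options, but the function name, service name, command and log paths are hypothetical placeholders.

```python
def make_pbs_script(job_name, command, walltime="01:00:00", log_dir="/tmp"):
    """Return the text of a PBS batch script that runs one helper service."""
    lines = [
        "#!/bin/sh",
        f"#PBS -N {job_name}",                # job name shown by qstat
        f"#PBS -l walltime={walltime}",       # maximum run time (HH:MM:SS)
        f"#PBS -o {log_dir}/{job_name}.out",  # standard output file
        f"#PBS -e {log_dir}/{job_name}.err",  # standard error file
        command,                              # the service itself
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical: a site-search service that reports to the central listener.
    print(make_pbs_script("osg_sitesearch", "python search_sites.py"))
```

The resulting text could be written to a file and submitted with qsub, so the OSG helpers would be scheduled the same way as VCell's existing jobs.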
Scavenging for sites
• Few existing tools
◦ MyOSG: a website that gives a summary of resources
◦ VORS: a website for getting site information; extremely useful, but being retired by the end of the summer
◦ LDAP query to the BDII server at is.grid.iu.edu (GLUE schema)
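For the LDAP route, the BDII answers with GLUE entries in LDIF form. A query might look something like `ldapsearch -x -H ldap://is.grid.iu.edu:2170 -b mds-vo-name=local,o=grid '(objectClass=GlueCE)'`, where the port and base DN are our assumptions about a typical BDII setup. The sketch below parses simple LDIF text and keeps computing elements that report a Production status and enough free CPUs. It uses GLUE 1.x attribute names (GlueCEUniqueID, GlueCEStateStatus, GlueCEStateFreeCPUs); the sample entries and host names are invented.

```python
def parse_ldif(text):
    """Parse simple LDIF text into a list of {attribute: [values]} dicts."""
    entries, current, last_key = [], {}, None
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the current entry.
            if current:
                entries.append(current)
            current, last_key = {}, None
        elif line.startswith(" ") and last_key:
            # LDIF folds long values onto continuation lines that start with a space.
            current[last_key][-1] += line[1:]
        elif ":" in line and not line.startswith("#"):
            key, value = line.split(":", 1)
            last_key = key.strip()
            current.setdefault(last_key, []).append(value.strip())
    if current:
        entries.append(current)
    return entries


def usable_ces(entries, min_free_cpus=1):
    """Return IDs of CEs in Production state with at least min_free_cpus free."""
    result = []
    for e in entries:
        if "GlueCEUniqueID" not in e:
            continue  # not a computing-element entry
        status = e.get("GlueCEStateStatus", [""])[0]
        free = int(e.get("GlueCEStateFreeCPUs", ["0"])[0])
        if status == "Production" and free >= min_free_cpus:
            result.append(e["GlueCEUniqueID"][0])
    return result


# Invented sample in the shape a BDII might return.
SAMPLE = """\
dn: GlueCEUniqueID=ce1.example.edu:2119/jobmanager-pbs-default,mds-vo-name=local,o=grid
GlueCEUniqueID: ce1.example.edu:2119/jobmanager-pbs-default
GlueCEStateStatus: Production
GlueCEStateFreeCPUs: 12

dn: GlueCEUniqueID=ce2.example.edu:2119/jobmanager-condor-default,mds-vo-name=local,o=grid
GlueCEUniqueID: ce2.example.edu:2119/jobmanager-condor-default
GlueCEStateStatus: Draining
GlueCEStateFreeCPUs: 40
"""

if __name__ == "__main__":
    print(usable_ces(parse_ldif(SAMPLE)))  # ['ce1.example.edu:2119/jobmanager-pbs-default']
```

The parser ignores base64-encoded values and other LDIF features; it is only meant to show how GLUE attributes could feed a site filter.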
Matching with sites
• Two main ways
◦ Using Condor ClassAds
◦ Running standard jobs and ranking sites based on them
Condor ClassAds
• Think of classified adverts in newspapers
◦ A service provider (a Compute Element or Storage Element in this case) advertises what it has
◦ A client (us, in this case) asks for what it wants, and we try to match it to a suitable site
• Easier, but not very reliable
◦ Our requirements are fairly static
◦ Getting it to work with the current system would take significant rework
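The two-sided nature of ClassAd matchmaking can be illustrated without Condor at all. In the simplified model below, each ad is a dictionary of attributes plus a `requirements` predicate, and the job also carries a `rank` function; a job and a site match only when each side's requirements accept the other, and the highest-ranked match wins. This is a toy model of the idea, not the ClassAd language or Condor's matchmaker, and every attribute name and value in it is invented for illustration.

```python
def matches(a, b):
    """Symmetric match: each ad's requirements must accept the other ad."""
    return a["requirements"](a, b) and b["requirements"](b, a)


def best_site(job, sites):
    """Return the matching site ad with the highest job-side rank, or None."""
    candidates = [s for s in sites if matches(job, s)]
    if not candidates:
        return None
    return max(candidates, key=lambda s: job["rank"](job, s))


# Hypothetical site ads: "the provider tells what it has".
SITES = [
    {"Name": "SITE_A", "OpSys": "LINUX", "Memory": 2048, "FreeCPUs": 4,
     "requirements": lambda my, other: other["Memory"] <= my["Memory"]},
    {"Name": "SITE_B", "OpSys": "LINUX", "Memory": 4096, "FreeCPUs": 64,
     "requirements": lambda my, other: other["Memory"] <= my["Memory"]},
    {"Name": "SITE_C", "OpSys": "WINDOWS", "Memory": 8192, "FreeCPUs": 128,
     "requirements": lambda my, other: True},
]

# Hypothetical job ad: "the client asks for what it wants".
JOB = {
    "Memory": 1024,
    "requirements": lambda my, other: (other["OpSys"] == "LINUX"
                                       and other["Memory"] >= my["Memory"]),
    "rank": lambda my, other: other["FreeCPUs"],  # prefer sites with spare CPUs
}

if __name__ == "__main__":
    print(best_site(JOB, SITES)["Name"])  # SITE_B
```

Because our requirements are fairly static, most of the work in such a scheme would fall on keeping the site ads accurate, which is where the reliability concern comes in.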
Examples
• condor_status -const 'KeyboardIdle > 20*60 && Memory > 100'
◦ Returns computers that have been idle for more than 20 minutes and have more than 100 MB of memory
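The constraint in the condor_status example can be reproduced as a plain filter, which makes the units explicit: KeyboardIdle is in seconds (hence 20*60) and Memory is in megabytes. The machine records below are invented; in practice the values would come from the machine ClassAds that condor_status reports.

```python
IDLE_SECONDS = 20 * 60   # 20 minutes, since KeyboardIdle is measured in seconds
MIN_MEMORY_MB = 100      # Memory is reported in megabytes


def idle_machines(machines):
    """Mirror of: condor_status -const 'KeyboardIdle > 20*60 && Memory > 100'."""
    return [m["Name"] for m in machines
            if m["KeyboardIdle"] > IDLE_SECONDS and m["Memory"] > MIN_MEMORY_MB]


# Hypothetical machine ads.
MACHINES = [
    {"Name": "node01", "KeyboardIdle": 3600, "Memory": 512},
    {"Name": "node02", "KeyboardIdle": 300, "Memory": 2048},   # used recently
    {"Name": "node03", "KeyboardIdle": 7200, "Memory": 64},    # too little memory
]

if __name__ == "__main__":
    print(idle_machines(MACHINES))  # ['node01']
```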
Using Heuristics
• Running jobs and ranking sites
◦ Send small jobs
◦ Profile them
◦ Over time, we'll build a good understanding of our portion of the grid
• Definitely harder
◦ But we can be smarter about deploying jobs
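One way to turn small probe jobs into a ranking is to keep, per site, a running estimate of success rate and turnaround time and update it after every probe. The sketch below uses an exponentially weighted moving average for that. The smoothing factor, the priors, the scoring formula and the site names are our own assumptions, chosen only to illustrate how knowledge of "our portion of the grid" could accumulate over time.

```python
from dataclasses import dataclass


@dataclass
class SiteProfile:
    """Running estimates for one site, updated after each probe job."""
    success_rate: float = 0.5      # neutral prior before any probes
    turnaround_s: float = 3600.0   # pessimistic prior: one hour
    probes: int = 0


class SiteRanker:
    def __init__(self, alpha=0.3):
        self.alpha = alpha          # weight given to the newest observation
        self.profiles = {}

    def record(self, site, succeeded, turnaround_s):
        """Fold one probe result into the site's exponentially weighted averages."""
        p = self.profiles.setdefault(site, SiteProfile())
        a = self.alpha
        p.success_rate = (1 - a) * p.success_rate + a * (1.0 if succeeded else 0.0)
        if succeeded:
            # Only successful probes say how long a job takes at that site.
            p.turnaround_s = (1 - a) * p.turnaround_s + a * turnaround_s
        p.probes += 1

    def score(self, site):
        """Higher is better: rough successful jobs per hour if run back to back."""
        p = self.profiles.get(site, SiteProfile())
        return p.success_rate * 3600.0 / p.turnaround_s

    def ranking(self):
        return sorted(self.profiles, key=self.score, reverse=True)


if __name__ == "__main__":
    ranker = SiteRanker()
    for ok, secs in [(True, 600), (True, 700), (True, 650)]:
        ranker.record("SITE_FAST", ok, secs)
    for ok, secs in [(False, 0), (True, 300), (False, 0)]:
        ranker.record("SITE_FLAKY", ok, secs)
    print(ranker.ranking())  # ['SITE_FAST', 'SITE_FLAKY']
```

The moving average lets recent behavior dominate, which suits a grid where sites come and go; a plain mean would react too slowly to a site that starts failing.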
Example
[Figure: Job count chart for BNL_ATLAS_1. Source: Gratia]
Running Jobs
• Two major ways
◦ Condor-G
◦ Globus Toolkit
Condor-G
• Submit directly to the Condor pool on a remote site
◦ Doesn't always work
◦ In our case, we use PBS
• Condor ClassAds take care of this, but need a little work upfront
◦ I tried to use the OSG Matchmaker and let it sort things out
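For reference, a Condor-G job is described by an ordinary Condor submit file that uses the grid universe and names the remote gatekeeper and its job manager. The helper below writes one. The commands `universe`, `grid_resource`, `executable`, `arguments`, `output`, `error`, `log` and `queue` are standard submit-file commands; the gatekeeper host, executable and file names are placeholders, and defaulting to a PBS job manager on the remote site is an assumption about that site.

```python
def condor_g_submit(gatekeeper, executable, arguments="",
                    jobmanager="jobmanager-pbs", basename="vcell_job"):
    """Return the text of a Condor-G submit description for a GT2 gatekeeper."""
    return "\n".join([
        "universe      = grid",
        f"grid_resource = gt2 {gatekeeper}/{jobmanager}",
        f"executable    = {executable}",
        f"arguments     = {arguments}",
        f"output        = {basename}.out",
        f"error         = {basename}.err",
        f"log           = {basename}.log",   # Condor's job event log
        "queue",
    ]) + "\n"


if __name__ == "__main__":
    # Hypothetical gatekeeper, solver and input names.
    print(condor_g_submit("gatekeeper.example.edu", "vcell_solver", "input.fvinput"))
```

Saved as, for example, job.sub, the description would be submitted with condor_submit job.sub; Condor-G then forwards the job to the named gatekeeper.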
Globus Toolkit
• Provides middleware for deploying and monitoring jobs
• Provides Java, C and Python APIs
◦ jGlobus makes sense to deploy
◦ Better integration with the existing codebase
• We can design complete workflows using these tools
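With the pre-web-services GRAM interface, a job submission is described by an RSL string, and as far as we know the Globus client APIs (jGlobus included) accept job descriptions in this text form. The function below builds one from a dictionary. `executable`, `arguments`, `count`, `maxWallTime`, `stdout` and `stderr` are standard GRAM RSL attributes, but the function names, the solver path and its arguments are hypothetical, and only simple values are handled.

```python
def quote(value):
    """Quote one RSL value; embedded double quotes are escaped by doubling them."""
    return '"' + str(value).replace('"', '""') + '"'


def build_rsl(attrs):
    """Build a GT2-style RSL conjunction such as &(executable="x")(count=1)."""
    parts = []
    for name, value in attrs.items():
        if isinstance(value, (list, tuple)):
            # Multi-valued attributes (e.g. arguments) become a space-separated list.
            rendered = " ".join(quote(v) for v in value)
        elif isinstance(value, int):
            rendered = str(value)
        else:
            rendered = quote(value)
        parts.append(f"({name}={rendered})")
    return "&" + "".join(parts)


if __name__ == "__main__":
    print(build_rsl({
        "executable": "/usr/local/vcell/bin/solver",   # hypothetical path
        "arguments": ["input.fvinput", "output"],      # hypothetical arguments
        "count": 1,
        "maxWallTime": 120,                            # minutes
        "stdout": "solver.out",
        "stderr": "solver.err",
    }))
```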
What I'd like to do
• Select a site
• Start a job
• Attach a listener, running as a PBS job, that polls for changes
• Pull incremental progress while the job is running
• Keep a transactional status of the job in the Oracle database
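Keeping a transactional status means the job's current state and its history should change together or not at all. The sketch below uses Python's built-in sqlite3 as a stand-in for the Oracle database; the tables, columns and function names are hypothetical, not VCell's schema. Using the sqlite3 connection as a context manager commits on success and rolls back if anything inside the block raises.

```python
import sqlite3


def init_db(conn):
    """Create hypothetical tables for current job state and its history."""
    with conn:
        conn.execute("CREATE TABLE IF NOT EXISTS osg_job ("
                     "job_id TEXT PRIMARY KEY, site TEXT, contact TEXT, status TEXT)")
        conn.execute("CREATE TABLE IF NOT EXISTS osg_job_history ("
                     "job_id TEXT, status TEXT, detail TEXT)")


def set_status(conn, job_id, status, detail=""):
    """Update the current status and append a history row in one transaction."""
    with conn:  # commits on success, rolls back if either statement fails
        cur = conn.execute("UPDATE osg_job SET status = ? WHERE job_id = ?",
                           (status, job_id))
        if cur.rowcount != 1:
            raise KeyError(f"unknown job {job_id}")
        conn.execute("INSERT INTO osg_job_history (job_id, status, detail) "
                     "VALUES (?, ?, ?)", (job_id, status, detail))


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    init_db(conn)
    with conn:
        # Hypothetical job; "contact" would hold the gatekeeper-provided job URL.
        conn.execute("INSERT INTO osg_job VALUES (?, ?, ?, ?)",
                     ("job-1", "SITE_A", "gram-contact-placeholder", "PENDING"))
    set_status(conn, "job-1", "ACTIVE", "10% complete")
    print(conn.execute("SELECT status FROM osg_job").fetchone())  # ('ACTIVE',)
```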
What I was almost able to do
• Select a site using the Condor Matchmaker
◦ It seems to select Harvard SBGrid almost all the time
◦ So, I'm picking a random site for testing
• Record the URL that the Globus gatekeeper provides in MySQL
• Poll for status and wait for the DONE signal
• If the job is done, pull the output
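The polling step can be written as a loop that checks the job's state until it reaches a terminal one. In the sketch below the status lookup is passed in as a function, so the same loop could wrap a GRAM status call made with the recorded gatekeeper URL. DONE and FAILED are GRAM's terminal job states; the function names, site names, interval and poll limit are placeholders, and the random site choice mirrors the testing workaround in the slide.

```python
import random
import time

TERMINAL_STATES = {"DONE", "FAILED"}


def pick_site(sites, rng=random):
    """Testing workaround: choose a site uniformly at random."""
    return rng.choice(sites)


def poll_until_done(get_status, interval_s=60, max_polls=1000, sleep=time.sleep):
    """Call get_status() until it returns a terminal state, and return that state.

    Raises TimeoutError if no terminal state is seen within max_polls checks.
    """
    for _ in range(max_polls):
        status = get_status()
        if status in TERMINAL_STATES:
            return status
        sleep(interval_s)
    raise TimeoutError(f"job not finished after {max_polls} polls")


if __name__ == "__main__":
    # Hypothetical site list and a fake status source standing in for GRAM.
    site = pick_site(["SITE_A", "SITE_B", "SITE_C"])
    states = iter(["PENDING", "ACTIVE", "ACTIVE", "DONE"])
    final = poll_until_done(lambda: next(states), interval_s=0, sleep=lambda s: None)
    print(site, final)
    # If final == "DONE", the next step would be to pull the output (e.g. over GridFTP).
```

Injecting `sleep` and `get_status` keeps the loop testable without a grid, and the poll limit stops a lost job from blocking the listener forever.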
Conclusion
• It really is feasible to run jobs like Virtual Cell's on OSG
◦ Just not in 10 weeks from start to finish
• OSG is an evolving system
◦ Our decisions have to be flexible
• There are a lot of architecture decisions we still need to resolve
Future
• Continuity of the project
• Being able to run from the VCell cluster
• Keeping a dynamic view of the grid as a whole
• Feedback to the user
Acknowledgments
• Dr. Ion Moraru
• Jeff Dutton
• James Schaff
• Dr. Greg Huber
• Mats Rynge (RENCI)
• Arvind Gopu (Indiana University, OSG-GOC)
• Peter Doherty (Harvard SBGrid)