Thursday, August 21, 2008
Cyberinfrastructure for Research Teams
UAB High Performance Computing Services
John-Paul Robinson <jpr@uab.edu>
UAB Cyberinfrastructure (CI) Investments
Common Network User Identity (BlazerID) for consistent identity across systems
Early Internet2 Member providing high bandwidth network access to other research campuses
High Performance Computing (HPC) Investments to build investigative capacity for computational research
On-going Model of Engagement to support Research Technology Investments
Alabama State Optical Network and National LambdaRail
The Alabama SON is a very high bandwidth lambda network, operated by SLR
Connects major research institutions across the state
Connects Alabama to National LambdaRail and Internet2
10GigE Campus Research Network
Connects campus HPC centers to facilitate resource aggregation
Compute clusters scheduled for connectivity
Facilitates secure network build outs
Expands access to regional and national compute resources
Cyberinfrastructure Elements
A Continuum of Identity: lower assurance facilitates collaboration, higher assurance facilitates trust
Maximized Network Bandwidth
Pools of Execution Resources
A Common Data Framework
Reliability and Performance Monitoring
Harnessing CI with the Grid
Interconnects and coordinates resources across administrative domains
Uses standard, open, and general purpose interfaces and protocols
Allows resource combination to deliver high quality services built on the core utility
The “grid” is the Fabric of Interconnected Resources
About UABgrid
Leverages Local, Regional and National Cyberinfrastructure Components
Identity, Execution, Data, Status, and Networking
Integrated Technology Infrastructure to Facilitate and Encourage Collaboration
Remember: It's All About the Data. Sharing Information is the Motivation for Collaboration
UABgrid Overview
UABgrid Pilot launched at campus HPC Boot-Camp September 2007
User-driven collaboration environment supports web and grid applications
Leverages InCommon for user identification
SSO for web applications and VO management
Self-service certificate generation for Globus users
Provides meta-cluster to harness on- and off-campus compute power using GridWay
Cyberinfrastructure
[Diagram: Applications 1 to 4 built on the UABgrid layer, which rests on the core CI services: IdM, Exec, Data, NetInfo]
Building Standard Service Interfaces
Infrastructure to Support Application Domains
Cyberinfrastructure
[Diagram: Research, User, Admin, and Education domains built on the UABgrid layer over the core CI services: IdM, Exec, Data, NetInfo]
UABgrid Provides Services to Research Applications
Cyberinfrastructure
[Diagram: research applications drawing on UABgrid services (Users, Stats, Files, Processes, Groups, Comm) over the core CI services: IdM, Exec, Data, NetInfo]
UABgrid Applications and Services
Collaboration Support: VO tools for VO management, mailing lists, wikis, project management, portals...
Research Applications Support: compute expansion goals and a generic model; current focus is workflow migration
Science Domains: Microbiology (DynamicBLAST), Statistical Genetics (R statistical package), Cancer Center (caBIG)
UABgrid VO Management: User Attributes to Apps
[Diagram: identity providers IdP1 through IdPn supply user attributes; the myVocs system adds VO attributes; applications App1 through Appn receive both]
Collaboration Support
The myVocs box forms the core of the VO collaboration infrastructure
VO resources like mailing lists, wikis, and Trac are intrinsic to the VO and can access common authorization information
Additional web collaboration tools instantiated as needed (e.g. GridSphere)
VO resources hosted in a VM cloud
dev.uabgrid is a working VO model for the construction and management of UABgrid
Compute Expansion
Meta-scheduling: Grid as Cluster
Cluster Upgrades and Acquisitions
Resource Aggregation: state resources; regional resources via SURAgrid; national and international resources via TeraGrid and Open Science Grid
UABgrid Pilot Meta-Cluster Specifications
Today: 2 campus clusters + ASA resource, 912 processing cores, >5 TFlops
2009 Targets: add all shared campus clusters, 1,156 more processing cores and 10 TFlops of additional capacity
Ongoing: local expansion through campus HPC investments; engage SURAgrid, OSG, TeraGrid, and other grid compute suppliers for more compute power
Generic Grid Application Model
[Diagram: layered application stack]
User interfaces: command line, custom client, web portal
Application workflow logic
Meta-scheduling: GridWay, DRMAA, Swift, Pegasus, Avalon
Globus client tools
Globus services on each cluster
Local resource managers (SGE, LSF, PBS) with application code and data on Cluster 1 through Cluster n
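The meta-scheduling layer in the stack above can be caricatured in a few lines: for each job, pick the least-loaded cluster. This is only an illustrative Python sketch; the cluster names and free-slot counts are invented, and a real deployment would obtain this state from GridWay and the local resource managers rather than a static table.

```python
# Toy illustration of the meta-scheduling layer: greedily place each job
# on the cluster with the most free slots. All names and counts are
# hypothetical, not actual UABgrid configuration.

def pick_cluster(clusters):
    """Return the name of the cluster with the most free slots."""
    return max(clusters, key=lambda name: clusters[name])

def dispatch(jobs, clusters):
    """Assign each job to the least-loaded cluster, consuming a slot."""
    placement = {}
    for job in jobs:
        target = pick_cluster(clusters)
        placement[job] = target
        clusters[target] -= 1  # one fewer free slot on that cluster
    return placement

free_slots = {"cluster1": 2, "cluster2": 1, "cluster3": 3}
plan = dispatch(["job-a", "job-b", "job-c", "job-d"], free_slots)
```

Real meta-schedulers also weigh queue wait times, data locality, and failures, which this greedy sketch ignores.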
Grid Migration Goals
Eliminate need for user-level grid technology awareness
Build on grid middleware, tools, and standards to maximize portability and resource utilization
Manage and leverage variable resource availability and dynamic load balancing
Efficiently and transparently handle issues like application availability, fault tolerance, and interoperability
Application Containers Simplify Administration
Types of containers: user accounts, Java boxes, virtual machines
Account containers are the initial target because they are the most common and address R application configuration
Allows for library-dependency and site-dependency configuration
Full continuum of deployment options, from fully staged for each job to statically cached on resources
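The staged-versus-cached continuum can be sketched as a simple lookup: reuse an application already cached on a resource, otherwise stage it for this job. The resource and application names here are illustrative only, not actual UABgrid configuration.

```python
# Illustrative sketch of the deployment continuum for account containers:
# "cached" means the application is already installed on the resource,
# "stage" means it must be shipped with the job. Names are hypothetical.

def deployment_plan(resource_cache, app, resources):
    """Map each resource to 'cached' or 'stage' for the given application."""
    plan = {}
    for r in resources:
        plan[r] = "cached" if app in resource_cache.get(r, set()) else "stage"
    return plan

cache = {"cheaha": {"R-2.7"}, "olympus": set()}
plan = deployment_plan(cache, "R-2.7", ["cheaha", "olympus", "everest"])
```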
Migrating Workflows to Grid
Statistical Genetics: R statistical package; methodological analysis workflow; many isolated computations; work in progress with promising results; led by John-Paul Robinson in UAB HPC Services
Microbiology: DynamicBLAST, a grid version of BLAST; a master-worker type application; maximizes throughput and minimizes job turnaround; the leading model for migrations; led by Enis Afgan and Dr. Puri Bangalore in CIS
MIG Workflow Powered by the Grid
[Chart: MIG Workflow Performance, 10,000 iterations; minutes per chunk (y-axis, 0 to 500) vs. job granularity in chunks (x-axis, 10 to 1000)]
Manual job control constrains performance to the human scale (~10)
Automating job control enables managing scale that significantly improves job performance and resource utilization
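The chunking arithmetic behind that observation is simple ceiling division. This sketch takes the 10,000-iteration count from the slide; the chunk sizes match the plot's x-axis, and the human-scale limit of roughly 10 jobs is the point being illustrated, not measured here.

```python
# Sketch of the MIG chunking arithmetic: 10,000 iterations split into
# grid jobs of a given granularity (iterations per job).

import math

TOTAL_ITERATIONS = 10_000

def jobs_needed(chunk_size):
    """Number of grid jobs when each job runs chunk_size iterations."""
    return math.ceil(TOTAL_ITERATIONS / chunk_size)

# Manual job control tops out around 10 jobs, forcing coarse chunks;
# a meta-scheduler can track hundreds, enabling much finer granularity.
manual = jobs_needed(1000)    # coarse chunks a human can track
automated = jobs_needed(100)  # fine chunks only automation can manage
```

Finer chunks mean more jobs running in parallel and better use of idle slots, which is what drives the performance gain in the plot.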
Dynamic BLAST Grid Workflow
BLAST is a gene sequence search algorithm
Dynamic BLAST breaks application steps and search apart and spreads effort across the grid
Good example of component and data parallelization
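As an illustration of the data parallelization, a master might partition the query set round-robin across its workers, each of which then runs a standard BLAST search on its share. This sketch is hypothetical (sequence IDs and worker count are invented) and is not the actual Dynamic BLAST code.

```python
# Hedged sketch of the data-parallel split in a Dynamic-BLAST-style
# master-worker workflow: the master partitions queries into nearly
# equal chunks, one per worker.

def split_queries(queries, n_workers):
    """Partition queries into n_workers nearly equal chunks (round-robin)."""
    chunks = [[] for _ in range(n_workers)]
    for i, q in enumerate(queries):
        chunks[i % n_workers].append(q)
    return chunks

queries = [f"seq{i}" for i in range(10)]
chunks = split_queries(queries, 3)  # three hypothetical workers
```

Because each query is independent, the workers need no coordination beyond returning results to the master, which is what makes the search step easy to spread across the grid.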
SCOOP – Coastal Ocean Observation and Prediction
SURA program to advance the sciences of prediction and hazard planning for coastal populations
Harvests cycles around the grid
Working with MCNC/RENCI to use Cheaha via SURAgrid
Research Initiative Support
caBIG: UAB Comprehensive Cancer Center funded to connect to caBIG
Contributed to completion of the Self-Assessment and Implementation Plan
Deploying the Life Sciences Distribution to support research workflows
caBIG provides a very good model for service and infrastructure abstractions
caGrid: bring the BlazerID system to NIST Level 2
Exploring integration of the caGrid GAARDS AuthX infrastructure (GridGrouper)
Education and Training
UAB 2007 HPC Boot Camp included sessions on grid computing and UABgrid Pilot launch
2008 HPC Boot Camp: September 22, 2008
UAB's 1st Annual CI Day, in conjunction with the ASA campus visit
CIS has taught graduate-level grid computing courses since fall 2003
Active participation in grid technology communities MardiGras08, OGF22, SURAgrid All-Hands, Internet2, caBIG
Open Development Model
UABgrid development work is done openly
Outside groups are actively engaged in the development of infrastructure (CIS, ENG, ASA, etc.)
The development group relies on the same services available to all users (we eat our own dog food)
Virtual organizations build on the infrastructure and are free to engage to their level of interest
Collaborative Development
Engaging User Groups and Service Providers to leverage Infrastructure
We are building our own solutions to depend on the grid
In order to build a grid, you need carrots – there has to be a benefit, even if it's long term
Grid services and development environment built on virtual machine foundation – key to expectation of “running from the cloud”
Engagement in a Regional Infrastructure Construction
Involved in SURAgrid since its inception as a voluntary extension of the NSF Middleware Initiative Testbed
Have helped mold an organization that provides broad engagement across organizations in the development of infrastructure
SURAgrid Governance Committee just completed strategic plan to guide the next 4 years
Technology in Service of Research
IT expresses institutional initiatives: IT doesn't necessarily do it, but should help make it possible
To have leading research, you need leading infrastructure
IT supports a leading-edge infrastructure and services framework
IT provides transparent interfaces to services and operations
Implement grid interfaces and conventions for our own services – “eat our own dog food”
Trust is the Foundation for Collaboration
People use technology they trust
Open Communication Channels: researchers and infrastructure communicate as peers; intra-organizational communication is fluid
Control Over Implementation: application requirements lead acquisitions
Service Partnership: researchers and infrastructure work together to satisfy organizational commitments
Important Issues are Guaranteed Service: researchers have authorized influence over infrastructure because they are part of the same organization
On The Horizon
Data Services: UABgrid Backup
Implement using technologies that satisfy the needs of the user community (e.g. GridFTP, REDDnet)
Focus on backup of VMs: putting our valuable data online... just like users would be expected to do
Data Stores: DSpace, Fedora, Alfresco, Subversion
Metrics increase reliability confidence and maintain a pulse on the impact of our solutions
Resource Integration Guidelines
High Speed to the Desktop
Acknowledgments
UAB Office of the Vice President for Information Technology
Collaborators at UAB in Computer and Information Sciences, the School of Engineering, the School of Public Health Section on Statistical Genetics, Comprehensive Cancer Center
Collaborators within SURAgrid, Internet2, and other organizations