Building Grids: If Everybody Else Is Doing It, Why Shouldn’t You?
Jay Boisseau, Texas Advanced Computing Center
SURA Grid Application Planning & Implementation Workshop
December 6-8, 2005
Outline
• Welcome!
• Overview of TACC (with Grid Computing Context)
• Some Perspectives on Grid Computing
• Closing Thoughts
• More
Overview of TACC (with Grid Computing Context)
TACC Mission
To enhance knowledge discovery & education and to improve society through the application of advanced computing technologies.
To accomplish this mission, TACC:
– Evaluates, acquires & operates advanced computing systems and software
– Provides documentation, consulting, and training to users of advanced computing resources
– Conducts R&D to produce new computational technologies & techniques that enhance advanced computing systems
– Collaborates with users to apply advanced computing techniques in their research, development, occupations, etc.
– Educates the community to broaden and deepen the pipeline of talented persons choosing careers in advanced computing
– Informs society about the value of advanced computing technologies in improving knowledge and quality of life
TACC Strategic Approach
Resources & Services
PR & EOT
Research & Development
TACC Advanced Computing Technology Areas
• High Performance Computing (HPC)
• Visualization & Data Analysis (VDA)
• Data & Information Systems (DIS)
• Distributed & Grid Computing (DGC)
  – newest area of R&D, resources, services at TACC
  – “tying it all together”
TACC Advanced Computing Applications Focus Areas
• Computational Geosciences
  – World-class expertise, programs at UT Austin
  – Strategic to state of Texas
• Computational Life Sciences
  – Broad & deep expertise in Texas higher ed institutions
  – Important to society
• Emergency Situation Assessment & Response
  – Crucial to life, property
  – Leverages TACC expertise, resources, and applications
• Each needs resource sharing & coordination, workflow, data/instrument integration: grid computing
TACC HPC & Storage Systems
• IBM Power4 System: 224 CPUs (1.16 Tflops), ½ TB memory, 7.1 TB disk
• Dell Xeon EM64T Linux Cluster: 656 CPUs (4.2 Tflops), 1.3 TB memory, ~4 TB disk
• Cray-Dell Xeon Linux Cluster: 1028 CPUs (6.3 Tflops), 1+ TB memory, 40+ TB disk
• Mac Xserve G5 Cluster: 46 CPUs (368 Gflops), 52 GB memory, 3.7 TB disk
• STK PowderHorns (2): 2.8 PB max capacity, managed by Cray DMF
• Sun SANs and Data Direct Disk: > 50 TB
[Diagram labels: LONGHORN, WRANGLER, ARCHIVE, LONESTAR, GLOBAL DISK, STAMPEDE]
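As a rough tally, the CPU counts and Tflops figures quoted for the four compute systems on this slide can be summed. This is only a sketch: the dictionary keys are descriptive labels chosen for the example, and the Tflops values are taken as quoted, without knowing how each was measured.

```python
# (CPU count, quoted Tflops) copied from the slide.
# The keys are descriptive labels chosen for this sketch.
systems = {
    "IBM Power4 System": (224, 1.16),
    "Dell Xeon EM64T Linux Cluster": (656, 4.2),
    "Cray-Dell Xeon Linux Cluster": (1028, 6.3),
    "Mac Xserve G5 Cluster": (46, 0.368),  # quoted as 368 Gflops
}

total_cpus = sum(cpus for cpus, _ in systems.values())
total_tflops = sum(tflops for _, tflops in systems.values())

print(f"{total_cpus} CPUs, about {total_tflops:.1f} Tflops in total")
```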
ACES VisLab
• Front and Rear Projection Systems
  – 3x1 semi-cylinder immersive environment, 24’ diameter
  – 5x2 large-screen, 16:9 panel tiled display
  – Matrix switch between systems, projectors, rooms
• Full immersive capabilities with head/motion tracking
TACC Advanced Visualization Systems
• Sun Terascale Visualization System
  – 128 UltraSparc 4 cores, ½ TB memory
  – 16 commodity graphics cards, > 3 Gpoly/sec
  – Remote to VisLab; very remote to TeraGrid!
• SGI Onyx2
  – 24 CPUs, 6 Infinite Reality 2 Graphics Pipes
  – 25 GB Memory, 356 GB Disk
TACC Network Connectivity
• Intercampus bandwidth
  – Force10 switch/routers with 1.2 Tbps backplane in TACC machine room and ACES building
  – 10 Gbps between TACC machine room and ACES provided by Nortel DWDM (waiting for 10GigE cards)
• WAN network upgrades:
  – UT Internet2 at OC-12
  – TeraGrid connection at 10 Gbps
  – New Lonestar Education And Research Network (LEARN) being built for Texas universities
  – Texas Joining National Lambda Rail (10 Gbps waves)
• High bandwidth networks (local and national) to facilitate resource sharing, coordination, data flow…
TACC R&D – Distributed & Grid Computing
• Web-based grid portals
  – GridPort, TeraGrid User Portal, SURA portal, TIGRE portal
• Grid resource data collection & information services
  – GPIR
• Overall grid deployment and integration
  – UT Grid, TeraGrid, TIGRE, OSG, SURA
• Grid scheduling and workflow tools
  – GridShell, MyCluster, Metascheduling Prediction Services
• Remote and collaborative grid-enabled visualization
  – For TeraGrid, UT Grid
• Network performance for moving terascale data
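TACC Activities & Scope
[Diagram: activities (Research, Development, Services, Resources, EOT) across technology areas (HPC, Vis, Data, Grid), with regions marked “Since 1986” and “Since 2001”; grid projects shown: UT Grid, TIGRE, TeraGrid, OSG, SURAgrid, GridPort, GridShell, etc.]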
TACC Today
TACC Tomorrow
Summary
• TACC has grown into a leading center since June 2001
  – 4x staff, 6x external funding
  – 100x compute power
  – New R&D in HPC, Vis, Data, and especially Grid Computing
  – New EOT, international, industrial partners programs
• Grid computing projects have played a major role in TACC’s growth and success so far
  – Leadership in software including GridPort, GridShell, MyCluster, Metascheduling Prediction Services
  – Partnership in grids at campus, state, regional, national, and international scales
Some Perspectives on Grid Computing
Researchers Already Use Distributed Computing: Case Is Already Made!
• Researchers already use distributed systems:
  – Local workstations for some development, small simulations
  – HPC at big centers
  – Visualization back in their lab or in a Vislab
  – Archival storage to SANs, NASes, tape silos, etc.
• Researchers already collaborate with peers at other institutions
  – science is collaborative!
• Grids should enable resource sharing, collaboration, etc. with
  – Greater ease
  – More flexibility
  – More capability
Or in English…
“There are talented people everywhere in the world focused on solving the most challenging problems, and there are companies everywhere determined to provide the best products as efficiently as possible… people WILL collaborate and learn to share resources, as well as ideas and data, in order to ‘be first’ … people have been using distributed resources for decades, and this is only increasing… Grid computing to me is the subset of distributed computing that makes it easier… So, ‘Grid computing’ is here today and will remain important, by whatever name you want to call it.” -- me in GRIDtoday 12/05/05
Grid Computing: My View
• Grid computing is a standard, ‘complete’ set of distributed computing software capabilities
• Grid computing must provide some basic functions
  – resource discovery and information collection & publishing
  – data management on and between resources
  – process management on and between resources
  – common security mechanism underlying the above
• No grid computing package provides everything
• Example: ‘Open Grid Services Architecture’ (OGSA) (e.g., as implemented in Globus v4) makes it possible to build the components and make them work together
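The four basic functions listed above can be pictured with a toy, in-memory model. This is a minimal sketch under stated assumptions, not how OGSA or Globus works: `Resource`, `ToyGrid`, and every method name are hypothetical, and "security" is reduced to a set of trusted user names checked before each operation.

```python
from dataclasses import dataclass, field


@dataclass
class Resource:
    """Hypothetical description of one grid resource."""
    name: str
    kind: str                                   # e.g. "hpc", "vis", "storage"
    files: dict = field(default_factory=dict)   # filename -> contents
    jobs: list = field(default_factory=list)    # commands submitted so far


class ToyGrid:
    """In-memory stand-in for the four functions; not a real grid stack."""

    def __init__(self, trusted_users):
        self._resources = {}
        self._trusted = set(trusted_users)

    # Common security mechanism underlying the other functions.
    def _authorize(self, user):
        if user not in self._trusted:
            raise PermissionError(f"{user} is not authorized")

    # Resource information publishing and discovery.
    def publish(self, resource):
        self._resources[resource.name] = resource

    def discover(self, user, kind):
        self._authorize(user)
        return sorted(r.name for r in self._resources.values() if r.kind == kind)

    # Data management on and between resources.
    def transfer(self, user, filename, src, dst):
        self._authorize(user)
        self._resources[dst].files[filename] = self._resources[src].files[filename]

    # Process management on and between resources.
    def submit(self, user, resource_name, command):
        self._authorize(user)
        jobs = self._resources[resource_name].jobs
        jobs.append(command)
        return len(jobs) - 1  # index of the job on that resource


# Usage: stage data from a cluster to a vis system, then start a job there.
grid = ToyGrid(trusted_users={"alice"})
grid.publish(Resource("cluster-a", "hpc", files={"input.dat": b"42"}))
grid.publish(Resource("vis-b", "vis"))
grid.transfer("alice", "input.dat", src="cluster-a", dst="vis-b")
job_id = grid.submit("alice", "vis-b", "render input.dat")
```

The point of the sketch is the layering: every discovery, data, and process call goes through the same authorization check, which is what "common security mechanism underlying the above" asks for.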
Grid Computing: My View
• TACC focuses on Grid computing to
  – enhance our HPC, SciVis, and massive data storage
  – integrate researchers’ local computing systems with ours
  – eventually, integrate research instruments for research that also requires HPC, SciVis, massive data storage
So TACC Drank The Grid Kool-Aid
• What grids are we participating in?
  – UT Grid: campus-scale
  – TIGRE: state
  – SURA Grid: regional
  – TeraGrid: national
  – Open Science Grid: international
  – And we’re building grid tools to provide capabilities for/in these grids
• Why are we participating in these grids? Some examples will answer that question….
UT Grid: Enable Campus-Wide Terascale Distributed Computing
• Why Build It? To move from ‘island’ of high-end resources to ‘hub’ of campus computing continuum
  – provide models for local resources (clusters, vislabs, etc.), training, and documentation
  – develop procedures for integrating local systems to UT Grid
    • single sign-on, data space, compute space
    • leverage every PC, cluster, NAS, etc. on campus!
  – integrate digital assets into UT Grid
  – integrate UT instruments & sensors into UT Grid
  – provide user portals and login nodes to access and use all campus resources!
UT Grid: Resources Distributed Across Two Campuses
[Diagram: network layout linking the research campus and the main campus. Labels: TACC Vis, TACC PWR4, TACC Storage, TACC Cluster, ACES, ICES Cluster, ICES Data, PGE, PGE Cluster, PGE Data, CMS, NOC, GAATN, Ext nets, Switch.]
UT Grid Status
• First 20 Months:
  – Deployed production United Devices ‘grid’ (Roundup)
  – Deployed production Condor pool, integrated with other pools (Rodeo)
  – Developed GridPort v4, GridShell v1
  – Building user portal, downloadable client software stack
  – More to come… (see tomorrow’s talk)
TIGRE: Texas Internet Grid for Research & Education
• Why Build It?: Help Texas universities & medical centers work together to share resources and advance Texas research, education, economy
• 2 year project, $2.5M
  – But took 2+ years to get funding!
• 5 funded participants
  – Rice University
  – Texas Tech University
  – Texas A&M
  – University of Houston
  – University of Texas
TIGRE: Texas Internet Grid for Research & Education
• Develop, document, and deploy a grid across the 5 participants
  – Supporting driving applications
• Enable other LEARN members to join TIGRE
  – Package grid software so that others can easily install it
  – Provide good documentation
  – Ensure that it’s easy, lightweight
  – Make it modular: enable institutions to provide just what they can offer
• NOTE: Companion project (LEARN) will provide a high-bandwidth network for use by TIGRE and other Texas institutions
TIGRE Deliverables: Quick Build!
YEAR 1
• Q1
  – Project plan
  – Web site
  – Certificate Authority
  – Minimum testbed requirements
  – Select 3 driving applications
• Q2
  – Alpha portal
• Q3
  – Define software stack
  – Distribution Mechanism
  – Simple demo of 1 TIGRE app
• Q4
  – Alpha client software package distributed
YEAR 2
• Q1
  – Alpha customer management services system deployed & demonstrated
• Q2
  – Global grid scheduler deployed
• Q3
  – Stable software available (only bug fixes after this)
  – Services required to be part of TIGRE specified
• Q4
  – Complete hardening of software
  – Complete documentation
  – Finalized procedures and policies to join TIGRE & document
  – Demonstrate TIGRE at SC
NSF TeraGrid: National Cyberinfrastructure for Computational Science
• Why Build It? Provide terascale computational capabilities that go beyond just HPC to facilitate 21st century research!
• Includes NCSA, SDSC, PSC, Indiana, Purdue, Argonne, and Oak Ridge
• Anointed as NSF production cyberinfrastructure for 5 years
- TACC is providing terascale computing, storage, and visualization resources
- UT is providing terascale geosciences data sets
Closing Thoughts
So Should You Or Shouldn’t You?
• Grid computing is here to stay, by one name or another…
  – The possibilities are too great
  – The needs are too great
• But it’s not always needed
  – Simple solutions, powerful tools, sharp minds get answers
  – Can maximize collaboration, but can also inhibit people from working on the real problem
• Get user requirements and THINK!
  – What is needed?
  – What is overkill?
  – Use mature technologies unless doing grid R&D
  – Use the minimum subset to meet requirements, build on successes incrementally
To Build Useful Grids, Software Must Be:
• Easier
  – No more difficult than CLIs for ‘power users’
  – No more difficult than the Web/PC apps for the other 99% of (potential) users (portals, desktop apps, etc.)
  – No more difficult than configuring office network for admins
• Smarter
  – Smart scheduling, data transfers, workflow
  – Built-in help/advice, like PC apps and portals
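One way to read “smart scheduling” is: send each job to the resource where it is predicted to finish soonest, counting queue wait and data transfer as well as runtime. The sketch below assumes such predictions are available from somewhere; `Estimate`, `pick_resource`, the resource names, and all the numbers are hypothetical and do not describe any particular TACC tool.

```python
from dataclasses import dataclass


@dataclass
class Estimate:
    """Hypothetical per-resource predictions for one job, in seconds."""
    resource: str
    queue_wait: float
    transfer: float
    runtime: float

    @property
    def turnaround(self) -> float:
        # Predicted time from submission to completion.
        return self.queue_wait + self.transfer + self.runtime


def pick_resource(estimates):
    """Return the estimate with the smallest predicted turnaround."""
    if not estimates:
        raise ValueError("no candidate resources")
    return min(estimates, key=lambda e: e.turnaround)


candidates = [
    Estimate("big-cluster", queue_wait=7200, transfer=600, runtime=1800),
    Estimate("small-cluster", queue_wait=300, transfer=600, runtime=5400),
]
best = pick_resource(candidates)
print(best.resource, best.turnaround)
```

With these made-up numbers the slower machine wins because its queue is short, which is the kind of decision users currently make by hand.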
To Build Useful Grids, Software Must Be:
• More robust
  – Must not break more than the individual resources
  – Opportunity is to break less than any individual resource (but only partially successful so far)
• And standards-based & interoperable
  – Web services, etc.
• So lots of opportunities for us geeks!
  – But let’s not lose sight of the forest for the trees!
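The “more robust” bullets above can be made concrete with a little probability. This is a sketch under a strong assumption: each resource is up independently with a fixed availability, and the 0.95 figure is hypothetical. A grid job that needs every resource at once breaks more often than any single resource; a job that can run on whichever resource is up breaks far less often.

```python
from math import prod


def all_available(availabilities):
    """Probability that every resource is up (independent failures assumed)."""
    return prod(availabilities)


def any_available(availabilities):
    """Probability that at least one resource is up (independent failures assumed)."""
    return 1 - prod(1 - a for a in availabilities)


sites = [0.95, 0.95, 0.95]  # hypothetical per-site availability
print(f"job needs all sites:     {all_available(sites):.4f}")
print(f"job can run on any site: {any_available(sites):.6f}")
```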
Finally, Enjoy Your Time Here While You Learn
• Austin is Fun, Cool, Weird, & Wonderful
  – Mix of hippies, slackers, academics, geeks, politicos, musicians, filmmakers, artists, and even a few cowboys
  – “Keep Austin Weird” is the official slogan
  – Live Music Capital of the World (seriously)
• Also great restaurants, cafes, clubs, bars, theaters, galleries, museums, etc.
  – http://www.austinchronicle.com/
  – http://www.austin360.com/xl/content/xl/index.html
  – http://www.research.ibm.com/arl/austin/index.html (!)
Your Austin To-Do List
• Eat barbecue at Rudy’s, Stubb’s, Iron Works, Green Mesquite, etc.
• Eat Tex-Mex at Chuy’s, Trudy’s, Maudie’s, etc.
• Have a cold Shiner Bock (but not Lone Star)
• Visit 6th Street and Warehouse District at night
• Go to at least one live music show
• Learn to two-step at The Broken Spoke
• Visit the Texas State History Museum
• Walk/jog/bike around Town Lake
• Visit the UT main campus and the ACES VisLab
• See a movie at Alamo Drafthouse Cinema (arrive early, order beer & food)
• Eat Amy’s Ice Cream
• Listen to and buy local music at Waterloo Records
• Buy a bottle each of Rudy’s Barbecue ‘Sause’ and Tito’s Vodka
• Drive into the Hill Country, visit small towns and wineries
• See sketch comedy at Esther’s Follies
• See a million bats emerge from Congress Ave. bridge at sunset
Welcome to TACC and Austin, Y’all!