High Performance Cyberinfrastructure Enables Data-Driven Science
in the Globally Networked World
Keynote Presentation
Sequencing Data Storage and Management Meeting at
The X-GEN Congress and Expo
San Diego, CA
March 14, 2011
Dr. Larry Smarr
Director, California Institute for Telecommunications and Information Technology
Harry E. Gruber Professor,
Dept. of Computer Science and Engineering
Jacobs School of Engineering, UCSD
Follow me on Twitter: lsmarr
Abstract
High performance cyberinfrastructure (10Gbps dedicated optical channels end-to-end) enables new levels of discovery for data-intensive research projects, such as next generation sequencing. In addition to international and national optical fiber infrastructure, we need local campus high performance research cyberinfrastructure (HPCI) to provide “on-ramps,” as well as scalable visualization walls and compute and storage clouds, to augment the emerging remote commercial clouds. I will review how UCSD has built out just such an HPCI and is in the process of connecting it to a variety of high throughput biomedical devices. I will show how high performance collaboration technologies allow distributed interdisciplinary teams to analyze these large data sets in real time.
Two Calit2 Buildings Provide Laboratories for “Living in the Future”
• “Convergence” Laboratory Facilities
– Nanotech, BioMEMS, Chips, Radio, Photonics
– Virtual Reality, Digital Cinema, HDTV, Gaming
• Over 1000 Researchers in Two Buildings
– Linked via Dedicated Optical Networks
UC San Diego
www.calit2.net
Over 400 Federal Grants, 200 Companies
UC Irvine
The Required Components of High Performance Cyberinfrastructure
• High Performance Optical Networks
• Scalable Visualization and Analysis
• Multi-Site Collaborative Systems
• End-to-End Wide Area CI
• Data-Intensive Campus Research CI
The OptIPuter Project: Creating High Resolution Portals Over Dedicated Optical Channels to Global Science Data
Picture Source: Mark Ellisman, David Lee, Jason Leigh
Calit2 (UCSD, UCI), SDSC, and UIC Leads; Larry Smarr, PI
Univ. Partners: NCSA, USC, SDSU, NW, TA&M, UvA, SARA, KISTI, AIST
Industry: IBM, Sun, Telcordia, Chiaro, Calient, Glimmerglass, Lucent
Scalable Adaptive Graphics Environment (SAGE)
OptIPortal
Visual Analytics--Use of Tiled Display Wall OptIPortal to Interactively View Microbial Genome (5 Million Bases)
Acidobacteria bacterium Ellin345 Soil Bacterium 5.6 Mb; ~5000 Genes
Source: Raj Singh, UCSD
Large Data Challenge: Average Throughput to End User on Shared Internet is 10-100 Mbps
http://ensight.eos.nasa.gov/Missions/terra/index.shtml
Transferring 1 TB:
– 50 Mbps = 2 Days
– 10 Gbps = 15 Minutes
Tested January 2011
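The transfer times above follow from dividing the payload in bits by the line rate. A minimal sketch, assuming decimal units (1 TB = 10^12 bytes) and an ideal, fully utilized link with no protocol overhead:

```python
def transfer_time_seconds(size_bytes: float, rate_bits_per_s: float) -> float:
    """Ideal transfer time: payload size in bits divided by the line rate.

    Ignores protocol overhead, TCP ramp-up, and disk limits, so real
    transfers take longer than this lower bound.
    """
    return size_bytes * 8 / rate_bits_per_s


ONE_TB = 10**12  # decimal terabyte, in bytes (assumption)

for label, rate in [("50 Mbps", 50e6), ("10 Gbps", 10e9)]:
    seconds = transfer_time_seconds(ONE_TB, rate)
    print(f"{label}: {seconds / 3600:.1f} hours ({seconds / 60:.0f} minutes)")
```

This gives about 44.4 hours (roughly 2 days) at 50 Mbps and about 13 minutes at 10 Gbps, in line with the rounded figures on the slide.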
Solution: Give Dedicated Optical Channels to Data-Intensive Users
(WDM)
Source: Steve Wallach, Chiaro Networks
“Lambdas”
Parallel Lambdas Are Driving Optical Networking the Way Parallel Processors Drove 1990s Computing
10 Gbps per User ~ 100-1000x Shared Internet Throughput
Dedicated 10Gbps Lightpaths Tie Together State and Regional Fiber Infrastructure
NLR’s 40 x 10Gb Wavelengths Interconnect Two Dozen State and Regional Optical Networks
Internet2 Dynamic Circuit Network Is Now Available
Visualization courtesy of Bob Patterson, NCSA.
www.glif.is
Created in Reykjavik, Iceland, 2003
The Global Lambda Integrated Facility--Creating a Planetary-Scale High Bandwidth Collaboratory
Research Innovation Labs Linked by 10G Dedicated Lambdas
Launch of the 100 Megapixel OzIPortal Kicked Off a Rapid Build Out of Australian OptIPortals
Covise: Phil Weber, Jurgen Schulze, Calit2
CGLX: Kai-Uwe Doerr, Calit2
http://www.calit2.net/newsroom/release.php?id=1421
January 15, 2008
No Calit2 Person Physically Flew to Australia to Bring This Up!
“Blueprint for the Digital University”--Report of the UCSD Research Cyberinfrastructure Design Team
• Focus on Data-Intensive Cyberinfrastructure
research.ucsd.edu/documents/rcidt/RCIDTReportFinal2009.pdf
No Data Bottlenecks--Design for Gigabit/s Data Flows
April 2009
Source: Jim Dolgonas, CENIC
Campus Preparations Needed to Accept CENIC CalREN Handoff to Campus
Current UCSD Prototype Optical Core: Bridging End-Users to CENIC L1, L2, L3 Services
Source: Phil Papadopoulos, SDSC/Calit2 (Quartzite PI, OptIPuter co-PI)
Quartzite Network MRI #CNS-0421555; OptIPuter #ANI-0225642
Lucent
Glimmerglass
Force10
Endpoints:
>= 60 endpoints at 10 GigE
>= 32 Packet switched
>= 32 Switched wavelengths
>= 300 Connected endpoints
Approximately 0.5 Tbit/s Arrives at the “Optical” Center of Campus
Switching Is a Hybrid of Packet, Lambda, and Circuit: OOO (All-Optical) and Packet Switches
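A quick order-of-magnitude check: 60 endpoints at 10 Gb/s each give a line-rate ceiling of 600 Gb/s, the same order as the roughly 0.5 Tbit/s quoted for the optical core. The sketch assumes every port runs at full rate simultaneously, which real traffic rarely does; the slide does not say how its own figure was derived.

```python
def aggregate_line_rate_gbps(endpoints: int, gbps_per_port: float = 10.0) -> float:
    """Upper bound on inbound traffic if every endpoint sends at full line rate."""
    return endpoints * gbps_per_port


# ">= 60 endpoints at 10 GigE" comes from the slide; full utilization is an assumption.
bound = aggregate_line_rate_gbps(60)
print(f"{bound:.0f} Gb/s = {bound / 1000:.1f} Tb/s")
```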
Calit2 Sunlight Optical Exchange Contains Quartzite
Maxine Brown, EVL, UIC, OptIPuter Project Manager
UCSD Planned Optical Networked Biomedical Researchers and Instruments
Cellular & Molecular Medicine West
National Center for Microscopy & Imaging
Biomedical Research
Center for Molecular Genetics
Pharmaceutical Sciences Building
Cellular & Molecular Medicine East
CryoElectron Microscopy Facility
Radiology Imaging Lab
Bioengineering
Calit2@UCSD
San Diego Supercomputer Center
• Connects at 10 Gbps:
– Microarrays
– Genome Sequencers
– Mass Spectrometry
– Light and Electron Microscopes
– Whole Body Imagers
– Computing
– Storage
UCSD Campus Investment in Fiber Enables Consolidation of Energy Efficient Computing & Storage
Source: Philip Papadopoulos, SDSC, UCSD
Diagram: N x 10Gb/s campus links connect an OptIPortal tiled display wall, campus lab clusters, digital data collections, Triton (petascale data analysis), Gordon (HPD system), a cluster condo, scientific instruments, DataOasis (central) storage, and the GreenLight data center, with 10Gb WAN links to CENIC, NLR, and I2.
Community Cyberinfrastructure for Advanced Microbial Ecology Research and Analysis
http://camera.calit2.net/
Calit2 Microbial Metagenomics Cluster: Next Generation Optically Linked Science Data Server
• 512 Processors, ~5 Teraflops
• ~200 Terabytes Sun X4500 Storage
• 1GbE and 10GbE Switched/Routed Core
Source: Phil Papadopoulos, SDSC, Calit2
4000 Users From 90 Countries
OptIPuter Persistent Infrastructure Enables Calit2 and U Washington CAMERA Collaboratory
Ginger Armbrust’s Diatoms:
Micrographs, Chromosomes,
Genetic Assembly
Photo Credit: Alan Decker Feb. 29, 2008
iHDTV: 1500 Mbits/sec Calit2 to UW Research Channel Over NLR
Creating CAMERA 2.0: Advanced Cyberinfrastructure Service-Oriented Architecture
Source: CAMERA CTO Mark Ellisman
The GreenLight Project: Instrumenting the Energy Cost of Computational Science
• Focus on 5 Communities with At-Scale Computing Needs:
– Metagenomics
– Ocean Observing
– Microscopy
– Bioinformatics
– Digital Media
• Measure, Monitor, & Web Publish Real-Time Sensor Outputs
– Via Service-Oriented Architectures
– Allow Researchers Anywhere to Study Computing Energy Cost
– Enable Scientists to Explore Tactics for Maximizing Work/Watt
• Develop Middleware that Automates Optimal Choice of Compute/RAM Power Strategies for Desired Greenness
• Data Center for School of Medicine Illumina Next Gen Sequencer Storage and Processing
Source: Tom DeFanti, Calit2; GreenLight PI
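At its core, the middleware goal of choosing the power strategy that maximizes work/watt reduces to ranking measured configurations. A minimal sketch, assuming each candidate strategy has already been measured for throughput and average power. The class, function names, and numbers are hypothetical and are not GreenLight's actual software:

```python
from dataclasses import dataclass


@dataclass
class PowerStrategy:
    name: str               # hypothetical label for a compute/RAM power setting
    work_per_second: float  # measured throughput, in any consistent unit of work
    watts: float            # measured average power draw

    def work_per_joule(self) -> float:
        # (work / second) / (joules / second) = work per joule, i.e. work/watt
        return self.work_per_second / self.watts


def greenest(strategies: list[PowerStrategy]) -> PowerStrategy:
    """Return the strategy with the highest work per unit of energy."""
    if not strategies:
        raise ValueError("need at least one measured strategy")
    return max(strategies, key=PowerStrategy.work_per_joule)


# Hypothetical measurements, for illustration only.
candidates = [
    PowerStrategy("all cores, max clock", work_per_second=100.0, watts=400.0),
    PowerStrategy("half cores, low clock", work_per_second=60.0, watts=200.0),
]
print(greenest(candidates).name)
```

Here the slower configuration wins (0.30 vs 0.25 units of work per joule), which is the kind of trade-off a work/watt metric is meant to surface.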
http://tritonresource.sdsc.edu
SDSC Large Memory Nodes (x28)
• 256/512 GB/sys
• 8TB Total
• 128 GB/sec
• ~9 TF
SDSC Shared Resource Cluster (x256)
• 24 GB/Node
• 6TB Total
• 256 GB/sec
• ~20 TF
UCSD Research Labs
SDSC Data Oasis Large Scale Storage
• 2 PB
• 50 GB/sec
• 3000–6000 disks
• Phase 0: 1/3 PB, 8GB/s
Moving to Shared Enterprise Data Storage & Analysis Resources: SDSC Triton Resource & Calit2 GreenLight
Campus Research Network
Calit2 GreenLight
N x 10Gb/s
Source: Philip Papadopoulos, SDSC, UCSD
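Simple division can sanity-check the Triton and Data Oasis capacity and bandwidth figures. A sketch, assuming decimal units and that Data Oasis spreads capacity and bandwidth evenly over its disks:

```python
def per_disk_share(total_bytes: float, total_bytes_per_s: float,
                   disks: int) -> tuple[float, float]:
    """Capacity (GB) and bandwidth (MB/s) each disk contributes, assuming an even spread."""
    return total_bytes / disks / 1e9, total_bytes_per_s / disks / 1e6


# Shared Resource Cluster: 256 nodes x 24 GB/node
cluster_memory_gb = 256 * 24
print(f"cluster memory: {cluster_memory_gb} GB")  # roughly the slide's 6TB total

# Data Oasis: 2 PB and 50 GB/s across 3000-6000 disks
for disks in (3000, 6000):
    cap_gb, bw_mb_s = per_disk_share(2e15, 50e9, disks)
    print(f"{disks} disks: {cap_gb:.0f} GB and {bw_mb_s:.1f} MB/s per disk")
```

Under these assumptions each disk needs to sustain only about 8 to 17 MB/s, well below the sequential streaming rate of a single hard drive of that era, so the aggregate bandwidth target leaves generous per-disk headroom.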
NSF Funds a Data-Intensive Track 2 Supercomputer: SDSC’s Gordon, Coming Summer 2011
• Data-Intensive Supercomputer Based on SSD Flash Memory and Virtual Shared Memory SW
– Emphasizes MEM and IOPS over FLOPS
– Supernode has Virtual Shared Memory:
  – 2 TB RAM Aggregate
  – 8 TB SSD Aggregate
– Total Machine = 32 Supernodes
– 4 PB Disk Parallel File System >100 GB/s I/O
• System Designed to Accelerate Access to Massive Data Bases being Generated in Many Fields of Science, Engineering, Medicine, and Social Science
Source: Mike Norman, Allan Snavely SDSC
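Multiplying the per-supernode figures gives whole-machine totals. A sketch, assuming all 32 supernodes are identical, decimal units, and a parallel file system running at exactly its 100 GB/s floor:

```python
SUPERNODES = 32
RAM_TB_PER_SUPERNODE = 2
SSD_TB_PER_SUPERNODE = 8
DISK_PB = 4
FS_GB_PER_S = 100  # slide says ">100 GB/s", so the scan time below is an upper bound

total_ram_tb = SUPERNODES * RAM_TB_PER_SUPERNODE  # aggregate DRAM across the machine
total_ssd_tb = SUPERNODES * SSD_TB_PER_SUPERNODE  # aggregate flash across the machine
# Hours to stream the whole 4 PB file system once at 100 GB/s
full_scan_hours = DISK_PB * 1e15 / (FS_GB_PER_S * 1e9) / 3600

print(total_ram_tb, total_ssd_tb, round(full_scan_hours, 1))
```

The result is 64 TB of RAM and 256 TB of flash machine-wide, and about 11 hours to read the full 4 PB file system once.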
Data Mining Applications Will Benefit from Gordon
• De Novo Genome Assembly from Sequencer Reads & Analysis of Galaxies from Cosmological Simulations & Observations
– Will Benefit from Large Shared Memory
• Federations of Databases & Interaction Network Analysis for Drug Discovery, Social Science, Biology, Epidemiology, Etc.
– Will Benefit from Low Latency I/O from Flash
Source: Mike Norman, SDSC
Rapid Evolution of 10GbE Port Prices Makes Campus-Scale 10Gbps CI Affordable
Price per 10GbE port:
– 2005: $80K/port, Chiaro (60 max)
– 2007: $5K/port, Force 10 (40 max)
– 2009: $500/port, Arista (48 ports); ~$1000/port (300+ max)
– 2010: $400/port, Arista (48 ports)
• Port Pricing Is Falling
• Density Is Rising, Dramatically
• Cost of 10GbE Approaching Cluster HPC Interconnects
Source: Philip Papadopoulos, SDSC/Calit2
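Fitting a constant yearly rate to the two ends of the price series shows how steep the decline is. A sketch, assuming a smooth geometric decline between the 2005 Chiaro price ($80K/port) and the 2010 Arista price ($400/port), even though the slide compares different vendors and port densities:

```python
def annual_decline(start_price: float, end_price: float, years: float) -> float:
    """Constant fractional drop per year that takes start_price to end_price."""
    yearly_factor = (end_price / start_price) ** (1 / years)
    return 1 - yearly_factor


drop = annual_decline(80_000, 400, 2010 - 2005)
print(f"{80_000 / 400:.0f}x cheaper overall, about {drop:.0%} per year")
```

A 200x drop over five years corresponds to prices falling by roughly 65% every year.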
10G Switched Data Analysis Resource: SDSC’s Data Oasis
Diagram: an Arista 7508 10G switch connects OptIPuter, Co-Lo, UCSD RCI, CENIC/NLR, Trestles (100 TF), Dash, Gordon, Triton, existing commodity storage (1/3 PB), and the Oasis storage procurement (2000 TB, > 50 GB/s) over 10Gbps links.
Oasis Procurement (RFP)
• Phase 0: > 8 GB/s Sustained Today
• Phase I: > 50 GB/s for Lustre (May 2011)
• Phase II: > 100 GB/s (Feb 2012)
Radical Change Enabled by Arista 7508 10G Switch
• 384 10G-Capable Ports
Source: Philip Papadopoulos, SDSC/Calit2
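Converting the Oasis bandwidth targets into 10GbE link counts shows why a high-density switch matters. A sketch, assuming links run at full line rate with no protocol overhead; real deployments would need more links than this floor:

```python
import math

LINK_GBPS = 10  # line rate of one 10GbE port, in gigabits per second


def min_links(target_gbytes_per_s: float, link_gbps: float = LINK_GBPS) -> int:
    """Fewest links whose combined line rate meets the target (1 byte = 8 bits)."""
    return math.ceil(target_gbytes_per_s * 8 / link_gbps)


for phase, rate in [("Phase 0", 8), ("Phase I", 50), ("Phase II", 100)]:
    print(f"{phase}: >= {min_links(rate)} x 10GbE")
```

Even Phase II's floor of 80 links fits well within the 384 10G-capable ports of the Arista 7508.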
Calit2 CAMERA Automatic Overflows into SDSC Triton
Diagram: CAMERA at Calit2 transparently sends jobs over 10Gbps to a CAMERA-managed job submit portal (VM) on the Triton Resource at SDSC; Triton directly mounts the CAMERA data at Calit2.
Direct Mount == No Data Staging
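The overflow pattern on this slide can be sketched as a scheduler that fills local slots first and forwards the rest, passing only a path because the remote side mounts the same data. This is a hypothetical illustration; the class, method, slot count, and file paths are assumptions, not CAMERA's actual implementation:

```python
from dataclasses import dataclass


@dataclass
class Job:
    job_id: str
    input_path: str  # path on storage that both sites mount directly (assumption)


class OverflowScheduler:
    """Hypothetical: run jobs locally until slots are full, then overflow to a remote portal."""

    def __init__(self, local_slots: int) -> None:
        self.local_slots = local_slots
        self.local: list[Job] = []
        self.remote: list[Job] = []

    def submit(self, job: Job) -> str:
        if len(self.local) < self.local_slots:
            self.local.append(job)
            return "local"
        # Only the job description travels; input_path resolves on the remote
        # side through the shared mount, so no data is staged or copied.
        self.remote.append(job)
        return "remote"


sched = OverflowScheduler(local_slots=2)
placements = [sched.submit(Job(f"job{i}", f"/data/sample{i}.fa")) for i in range(3)]
print(placements)
```

A real scheduler would also release local slots as jobs finish; the sketch omits that to keep the routing decision visible.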
California and Washington Universities Are Testing a 10Gbps Connected Commercial Data Cloud
• Amazon Experiment for Big Data
– Only Available Through CENIC & Pacific NW GigaPOP
– Private 10Gbps Peering Paths
– Includes Amazon EC2 Computing & S3 Storage Services
• Early Experiments Underway
– Robert Grossman, Open Cloud Consortium
– Phil Papadopoulos, Calit2/SDSC Rocks
Academic Research OptIPlanet Collaboratory: A 10Gbps “End-to-End” Lightpath Cloud
Diagram: 10G lightpaths over National LambdaRail and a campus optical switch link an end-user OptIPortal to data repositories and clusters, HPC, HD/4k video repositories, HD/4k live video, and local or remote instruments.
You Can Download This Presentation at lsmarr.calit2.net