“A Campus-Scale High Performance Cyberinfrastructure is Required for Data-Intensive Research” Keynote Presentation CENIC 2013 Held at Calit2@UCSD March 11, 2013 Dr. Larry Smarr Director, California Institute for Telecommunications and Information Technology Harry E. Gruber Professor, Dept. of Computer Science and Engineering Jacobs School of Engineering, UCSD http://lsmarr.calit2.net

Page 1

“A Campus-Scale High Performance Cyberinfrastructure is Required for Data-Intensive Research”

Keynote Presentation

CENIC 2013

Held at Calit2@UCSD

March 11, 2013

Dr. Larry Smarr

Director, California Institute for Telecommunications and Information Technology

Harry E. Gruber Professor,

Dept. of Computer Science and Engineering

Jacobs School of Engineering, UCSD

http://lsmarr.calit2.net

Page 2

“Blueprint for the Digital University”--Report of the UCSD Research Cyberinfrastructure Design Team

• A Five Year Process Begins Pilot Deployment This Year

research.ucsd.edu/documents/rcidt/RCIDTReportFinal2009.pdf

No Data Bottlenecks--Design for Gigabit/s Data Flows

April 2009

See talk on RCI by Richard Moore Today at 4pm

Page 3

Calit2 Sunlight OptIPuter Exchange Connects 60 Campus Sites Each Dedicated at 10Gbps

Maxine Brown, EVL, UIC, OptIPuter Project Manager

Page 4

Rapid Evolution of 10GbE Port Prices Makes Campus-Scale 10Gbps CI Affordable

2005 2007 2009 2010 2011 2013

$80K/port, Chiaro (60 max)

$5K, Force 10 (40 max)

$500, Arista, 48 ports

$400 (48 ports, today); 576 ports (2013)

• Port Pricing is Falling
• Density is Rising, Dramatically
• Cost of 10GbE Approaching Cluster HPC Interconnects

Source: Philip Papadopoulos, SDSC/Calit2
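
The size of the price drop can be checked with a few lines of arithmetic. The sketch below uses only the per-port prices listed on this slide; the list name and helper functions are illustrative, and no years are attached to the points because the extracted slide text does not pair them explicitly.

```python
# Per-port 10GbE prices as listed on the slide, in USD, oldest first.
# Labels are copied from the slide; years are intentionally left out.
PORT_PRICES_USD = [
    ("Chiaro", 80_000),
    ("Force 10", 5_000),
    ("Arista", 500),
    ("today (48 ports)", 400),
]


def step_ratios(prices):
    """Price-drop factor between each consecutive pair of entries."""
    return [prev[1] / cur[1] for prev, cur in zip(prices, prices[1:])]


def overall_ratio(prices):
    """Price-drop factor from the first listed price to the last."""
    return prices[0][1] / prices[-1][1]


if __name__ == "__main__":
    print(step_ratios(PORT_PRICES_USD))    # [16.0, 10.0, 1.25]
    print(overall_ratio(PORT_PRICES_USD))  # 200.0
```

On these figures the per-port price falls by a factor of 200 from the first point to the last.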

Page 5

Arista Enables SDSC’s Massively Parallel 10G Switched Data Analysis Resource

Page 6

Partnering Opportunities with NSF: SDSC’s Gordon, Dedicated Dec. 5, 2011

• Data-Intensive Supercomputer Based on SSD Flash Memory and Virtual Shared Memory SW
  – Emphasizes MEM and IOPS over FLOPS
  – Supernode has Virtual Shared Memory:
    – 2 TB RAM Aggregate
    – 8 TB SSD Aggregate
  – Total Machine = 32 Supernodes
  – 4 PB Disk Parallel File System >100 GB/s I/O

• System Designed to Accelerate Access to Massive Datasets being Generated in Many Fields of Science, Engineering, Medicine, and Social Science

Source: Mike Norman, Allan Snavely SDSC
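
A rough sketch of what the per-supernode figures imply for the whole machine, using only the numbers on this slide. The function names are illustrative, decimal units are assumed, and the totals are simple multiplication rather than official Gordon specifications.

```python
# Figures taken from the slide.
SUPERNODES = 32
RAM_TB_PER_SUPERNODE = 2
SSD_TB_PER_SUPERNODE = 8
DISK_PB = 4
DISK_IO_GB_PER_S = 100  # the slide says ">100 GB/s", so this is a lower bound


def machine_totals(supernodes, ram_tb, ssd_tb):
    """Aggregate RAM and SSD across all supernodes, in TB."""
    return {"ram_tb": supernodes * ram_tb, "ssd_tb": supernodes * ssd_tb}


def full_scan_seconds(disk_pb, io_gb_per_s):
    """Time to read the whole file system once at the given I/O rate."""
    return disk_pb * 1e15 / (io_gb_per_s * 1e9)


if __name__ == "__main__":
    print(machine_totals(SUPERNODES, RAM_TB_PER_SUPERNODE, SSD_TB_PER_SUPERNODE))
    # {'ram_tb': 64, 'ssd_tb': 256}
    print(full_scan_seconds(DISK_PB, DISK_IO_GB_PER_S) / 3600)  # about 11.1 hours
```

Because the quoted I/O rate is a lower bound, the full-scan time is an upper bound.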

Page 7

Gordon Bests Previous Mega I/O per Second by 25x

Page 8

Creating a “Big Data Freeway” System Connecting Instruments, Computers, & Storage

Phil Papadopoulos, PI; Larry Smarr, co-PI

PRISM@UCSD

Start Date: 1/1/13

See talk on PRISM by Phil P. Tomorrow at 9am

Page 9

Many Disciplines Beginning to Need Dedicated High Bandwidth on Campus

• Remote Analysis of Large Data Sets
  – Particle Physics

• Connection to Remote Campus Compute & Storage Clusters
  – Ocean Observatory
  – Microscopy and Next Gen Sequencers

• Providing Remote Access to Campus Data Repositories
  – Protein Data Bank and Mass Spectrometry

• Enabling Remote Collaborations
  – National and International

How to Terminate a CENIC 100G Campus Connection

Page 10

PRISM@UCSD Enables Remote Analysis of Large Data Sets

Page 11

CERN’s CMS Detector is One of the World’s Most Complex Scientific Instruments

See talk on LHC 100G Networks by Azher Mughal, Caltech, Today at 10am

Page 12

CERN’s CMS Experiment Generates Massive Amounts of Data

Page 13

UCSD is a Tier-2 LHC Data Center

Source: Frank Wuerthwein, Physics UCSD

Page 14

Flow Out of CERN for CMS Detector Peaks at 32 Gbps!

Source: Frank Wuerthwein, Physics UCSD

Page 15

CMS Flow Into Fermi Lab Peaks at 10Gbps

Source: Frank Wuerthwein, Physics UCSD

Page 16

CMS Flow into UCSD Physics Peaks at 2.4 Gbps

Source: Frank Wuerthwein, Physics UCSD
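
To put the three peak rates on the preceding slides into storage terms, the sketch below converts each one to the volume it would move if held for a full day. Real traffic does not stay at peak, so these are ceilings rather than measured volumes; the dictionary and function names are illustrative.

```python
# Peak CMS flow rates quoted on the slides, in gigabits per second.
PEAK_GBPS = {
    "Out of CERN": 32.0,
    "Into Fermi Lab": 10.0,
    "Into UCSD Physics": 2.4,
}


def gbps_to_tb_per_day(gbps):
    """Decimal terabytes moved if the rate were sustained for 24 hours."""
    bytes_per_second = gbps * 1e9 / 8
    return bytes_per_second * 86_400 / 1e12


if __name__ == "__main__":
    for name, rate in PEAK_GBPS.items():
        print(f"{name}: {rate} Gbps peak, up to {gbps_to_tb_per_day(rate):.1f} TB/day")
```

Each 1 Gbps sustained for a day corresponds to 10.8 TB, so the 2.4 Gbps peak into UCSD Physics would amount to roughly 26 TB per day if it were held continuously.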

Page 17

Open for all of science, including biology, chemistry, computer science, engineering, mathematics, medicine, and physics

The Open Science Grid: A Consortium of Universities and National Labs

Source: Frank Wuerthwein, Physics UCSD

Page 18

Dan Cayan, USGS Water Resources Discipline

Scripps Institution of Oceanography, UC San Diego

much support from Mary Tyree, Mike Dettinger, Guido Franco and other colleagues

Sponsors: California Energy Commission, NOAA RISA program, California DWR, DOE, NSF

Planning for climate change in California: substantial shifts on top of already high climate variability

Page 19

Greenhouse Gas Emissions and Concentration

CMIP3 GCMs

UCSD Campus Climate Researchers Need to Download Results from Remote Supercomputer Simulations

Source: Dan Cayan, SIO UCSD

Page 20

GCMs ~150 km downscaled to regional models ~12 km

Many simulations: IPCC AR4 and IPCC AR5 have been downscaled using statistical methods

INCREASING VOLUME OF CLIMATE SIMULATIONS

In comparison to 4th IPCC (CMIP3) GCMs, Latest Generation CMIP5 Models Provide:

• More Simulations
• Higher Spatial Resolution
• More Developed Process Representation
• More Available Daily Output

Global to Regional Downscaling

Source: Dan Cayan, SIO UCSD
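
One reason downscaling inflates data volume is the growth in grid-cell count. The sketch below estimates that growth for the ~150 km to ~12 km step quoted on this slide. It assumes square cells over the same horizontal domain and ignores vertical levels, time steps, and variable counts; the function name is illustrative.

```python
def horizontal_cell_ratio(coarse_km, fine_km):
    """Factor by which the number of horizontal grid cells grows when
    the grid spacing shrinks from coarse_km to fine_km (square cells)."""
    return (coarse_km / fine_km) ** 2


if __name__ == "__main__":
    print(horizontal_cell_ratio(150, 12))  # 156.25
```

Under these assumptions, each downscaled field carries about 156 times as many horizontal cells as the GCM field it came from.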

Page 21

average summer afternoon temperature

GFDL A2 1km downscaled to 1km

Hugo Hidalgo, Tapash Das, Mike Dettinger

Page 22

HOW MUCH CALIFORNIA SNOW LOSS?

Initial projections indicate substantial reduction in snow water for Sierra Nevada+

declining Apr 1 SWE:

2050 median SWE ~ 2/3 historical median

2100 median SWE ~ 1/3 historical median

Page 23

PRISM@UCSD Enables Connection to Remote Campus Compute & Storage Clusters

Page 24

The OOI CI is Built on Dedicated 10GE and Serves Researchers, Education, and Public

Source: Matthew Arrott, John Orcutt OOI CI

Page 25

Reused Undersea Optical Cables Form a Part of the Ocean Observatories

Source: John Delaney UWash OOI

Page 26

Source: John Orcutt, Matthew Arrott, SIO/Calit2

OOI CI is Built on Dedicated Optical Networks and Federal Agency & Commercial Clouds

Page 27

OOI CI Team at Scripps Institution of Oceanography Needs Connection to Its Server Complex in Calit2

Page 28

Ultra High Resolution Microscopy Images Created at the National Center for Microscopy & Imaging Research

Page 29

Zeiss Merlin 3View w/ 32k x 32k Scanning and Automated Mosaicing:

Current = 1-2 TB/week, soon 12 TB/week

JEOL-4000EX w/ 8k x 8k CCD, Automated Mosaicing, and Serial Tomography:

Current = 1 TB/week

FEI Titan w/ 4k x 4k STEM, EELS, 4k x 3.5k DDD, 4k x 4k CCD, Automated Mosaicing, and Multi-tilt Tomography:

Current = 1 TB/week

200-500 TB/year Raw; >2 PB/year Aggregate

Microscopes Are Big Data Generators – Driving Software & Cyberinfrastructure Development

Source: Mark Ellisman, School of Medicine, UCSD
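
The weekly volumes on this slide can be translated into network terms. The sketch below converts a TB/week figure into the average bit rate needed to keep up, and estimates how long a bulk transfer would take on a dedicated link. It assumes decimal units and an ideal link with no protocol overhead; the 10 Gbps link speed and the function names are illustrative choices.

```python
SECONDS_PER_WEEK = 7 * 24 * 3600


def tb_per_week_to_mbps(tb_per_week):
    """Average rate in megabits per second needed to move the weekly
    volume continuously, with no idle time."""
    bits = tb_per_week * 1e12 * 8
    return bits / SECONDS_PER_WEEK / 1e6


def transfer_seconds(terabytes, link_gbps):
    """Ideal time to move a dataset over a link of the given speed."""
    return terabytes * 1e12 * 8 / (link_gbps * 1e9)


if __name__ == "__main__":
    print(round(tb_per_week_to_mbps(1), 1))   # about 13.2 Mbps
    print(round(tb_per_week_to_mbps(12), 1))  # about 158.7 Mbps
    print(transfer_seconds(1, 10))            # 800.0 seconds at 10 Gbps
```

The averages look modest, but instruments produce data in bursts, and moving a week of output in one batch is where a dedicated high-bandwidth path matters: 12 TB takes 9,600 seconds (under three hours) at an ideal 10 Gbps, and ten times as long at 1 Gbps.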

Page 30

NIH National Center for Microscopy & Imaging Research Integrated Infrastructure of Shared Resources

Source: Steve Peltier, Mark Ellisman, NCMIR

Local SOM Infrastructure

Scientific Instruments

End User Workstations

Shared Infrastructure

Page 31

Agile System that Spans Resource Classes

Page 32

SDSC Gordon Supercomputer Analysis of LS Gut Microbiome Displayed on Calit2 VROOM

Calit2 VROOM-FuturePatient Expedition

See Live Demo on Calit2 to CICESE 10G, Weds at 8:30am

Page 33

PRISM@UCSD Enables Providing Remote Access to Campus Data Repositories

Page 34

Protein Data Bank (PDB) Needs Bandwidth to Connect Resources and Users

• Archive of experimentally determined 3D structures of proteins, nucleic acids, complex assemblies

• One of the largest scientific resources in life sciences

Source: Phil Bourne and Andreas Prlić, PDB

Hemoglobin

Virus

Page 35

PDB Usage Is Growing Over Time

• More than 300,000 Unique Visitors per Month
• Up to 300 Concurrent Users
• ~10 Structures are Downloaded per Second, 24/7/365
• Increasingly Popular Web Services Traffic

Source: Phil Bourne and Andreas Prlić, PDB
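
As a quick scale check, the sketch below turns the "~10 structures per second" figure into daily and yearly counts. It assumes the rate holds constantly, which the slide only approximates; the names are illustrative.

```python
DOWNLOADS_PER_SECOND = 10  # approximate figure from the slide


def downloads_per_day(rate_per_second):
    """Structures downloaded in 24 hours at a constant rate."""
    return rate_per_second * 24 * 3600


def downloads_per_year(rate_per_second, days=365):
    """Structures downloaded in a year at a constant rate."""
    return downloads_per_day(rate_per_second) * days


if __name__ == "__main__":
    print(downloads_per_day(DOWNLOADS_PER_SECOND))   # 864000
    print(downloads_per_year(DOWNLOADS_PER_SECOND))  # 315360000
```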

Page 36

RCSB PDB: 159 million entry downloads

PDBe: 34 million entry downloads

PDBj: 16 million entry downloads

2010 FTP Traffic

Source: Phil Bourne and Andreas Prlić, PDB

Page 37

• Why is it Important?
  – Enables PDB to Better Serve Its Users by Providing Increased Reliability and Quicker Results

• How Will it be Done?
  – By More Evenly Allocating PDB Resources at Rutgers and UCSD
  – By Directing Users to the Closest Site

• Need High Bandwidth Between Rutgers & UCSD Facilities

PDB Plans to Establish Global Load Balancing

Source: Phil Bourne and Andreas Prlić, PDB
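
"Directing users to the closest site" can be illustrated with a minimal sketch. This is not PDB's actual load-balancing mechanism, which the slide does not describe. It picks the site with the smallest great-circle distance to a user, as a stand-in for network proximity. The site coordinates are approximate values assumed for illustration, and all names are hypothetical.

```python
import math

# Approximate (latitude, longitude) of the two sites named on the slide.
# These coordinates are assumptions for illustration only.
SITES = {
    "Rutgers": (40.50, -74.45),
    "UCSD": (32.88, -117.23),
}


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points on Earth, in kilometres."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def closest_site(user_lat, user_lon, sites=SITES):
    """Name of the site geographically nearest to the user."""
    return min(sites, key=lambda name: haversine_km(user_lat, user_lon, *sites[name]))


if __name__ == "__main__":
    print(closest_site(40.71, -74.01))   # a user near New York City
    print(closest_site(34.05, -118.24))  # a user near Los Angeles
```

Geographic distance is only a proxy. A production system would more likely route on measured latency or site load, and keeping the two sites' data in sync is what creates the need for high bandwidth between them.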

Page 38

UCSD Center for Computational Mass Spectrometry Becoming Global MS Repository

ProteoSAFe: Compute-intensive discovery MS at the click of a button

MassIVE: repository and identification platform for all MS data in the world

Source: Nuno Bandeira, Vineet Bafna, Pavel Pevzner, Ingolf Krueger, UCSD

proteomics.ucsd.edu

Page 39

Automation: Do it Billions of Times

• Large Volumes of Data from Many Sources--Must Automate
  – Thousands of Users, Tens of Thousands of Searches
  – Multi-Omics: Proteomics, Metabolomics, Proteogenomics, Natural Products, Glycomics, etc.

• CCMS ProteoSAFe
  – Scalable: Distributed Computation over 1000s of CPUs
  – Accessible: Intuitive Web-Based User Interfaces
  – Flexible: Easy Integration of New Analysis Workflows

• Already Analyzed >1B Spectra in >26,000 Searches from >2,200 Users

Page 40

PRISM@UCSD Enables Remote National and International Collaborations

Page 41

Tele-Collaboration for Audio Post-Production: Realtime Picture & Sound Editing Synchronized Over IP

Skywalker Sound@Marin, Calit2@San Diego

Page 42

Tele-Collaboration for Cinema Post-Production

Disney + Skywalker Sound + Digital Domain + Laser Pacific + NTT Labs + UCSD/Calit2 + UIC/EVL + Pacific Interface

Page 43

Collaboration Between EVL’s CAVE2 and Calit2’s VROOM Over 10Gb Wavelength

EVL

Calit2

Source: NTT Sponsored ON*VECTOR Workshop at Calit2 March 6, 2013

Page 44

Calit2 is Linked to CICESE at 10G, Coupling OptIPortals at Each Site

See Live Demo on Calit2 to CICESE 10G, Weds at 8:30am

Page 45

PRAGMA: A Practical Collaboration Framework

Build and Sustain Collaborations

Advance & Improve Cyberinfrastructure Through Applications

Source: Peter Arzberger, Calit2 UCSD