The Initiative in Innovative Computing at Harvard Alyssa A. Goodman IIC Director & Prof. of Astronomy

Page 1: PPT

The Initiative in Innovative Computing at Harvard

Alyssa A. Goodman, IIC Director & Prof. of Astronomy

Page 2: PPT

Agenda

What is IIC? (“Filling the Gap”)

Where did it come from? (A Story)

What have we done so far? (Startup Mode)

What are we about to do? (Projects, Hiring Plans)

What do we hope to do? (Long-term Goals)

Page 3: PPT

Filling the “Gap” between Science and Computer Science

Scientific disciplines

Increasingly, core problems in science require computational solutions

Typically hire or “home grow” computationalists, but often lack the expertise or funding to go beyond the immediate pressing need

Computer Science departments

Focused on finding elegant solutions to basic computer science challenges

Often see specific, “applied” problems as outside their interests

Page 4: PPT

Where did IIC come from?

Short Version: Response to Harvard’s “expansion” in Science, and into Allston. See IIC Whitepaper (2004) & Task Force on Science & Technology report (2005) for more.

Long Version…

Page 5: PPT

Computational challenges are common across scientific disciplines

How to:

Acquire, transmit, organize, and query new kinds of data?

Apply distributed computing resources to solve complex problems?

Derive meaningful insight from large datasets?

Share, integrate and analyze knowledge across geographically dispersed researchers?

Visually represent scientific results so as to maximize understanding?

Opportunity to collaborate and apply insights from one field to another

Page 6: PPT

Workflow and WORKFLOW: Examples from Astronomy and Public Health

“Collect”: Telescope (Astronomy); Microscope, Stethoscope, Survey (Public Health)

COLLECT: “National Virtual Observatory”/COMPLETE (Astronomy); CDC Wonder (Public Health)

“Analyze”: Study the density structure of a star-forming glob of gas (Astronomy); Find a link between one factory’s chlorine runoff & disease (Public Health)

ANALYZE: Study the density structure of all star-forming gas in… (Astronomy); Study the toxic effects of chlorine runoff in the U.S. (Public Health)

“Collaborate”: Work with your student

COLLABORATE: Work with 20 people in 5 countries, in real-time

“Respond”: Write a paper for a Journal.

RESPOND: Write a paper, the quantitative results of which are shared globally, digitally.

Page 7: PPT

Real World Workflow

e.g. Emergency Medicine in the Age of High-Speed Networks, Fast Processors, Mass Storage, and Miniature Devices

IIC/Harvard contact: Matt Welsh, DEAS

Page 8: PPT

Continuum

“Pure” Discipline Science

(e.g. Galileo)

“Pure” Computer Science

(e.g. Turing)

“Computational Science”: Missing at Most Universities

Page 9: PPT

Filling the “computational science” gap: IIC

Problem-driven approach… focusing effort on solving problems that will have greatest impact & educational value

Collaborative projects… combining disciplinary knowledge with computer science expertise

Interdisciplinary effort… to ensure that best practices are shared across fields and that new tools and methodologies will be broadly applicable

Links with industry… to draw on and learn from experience in applied computation

Institutional funding… to ensure effort is directed towards key needs and not driven solely by narrow priorities of funding agencies

Page 10: PPT

Where are the optimal “IIC” problems?

[Diagram: horizontal axis is Computer Science Payoff (Low to High); vertical axis is Domain Science Payoff (Low to High). Labeled regions: “Never Mind”, Science Departments, Computer Science Departments.]

What is the right shape for that boundary?

Page 11: PPT

IIC Research Branches (and Projects Draw upon >1)

Visualization (V): Physically meaningful combination of diverse data types.

Distributed Computing (DC): e-Science aspects of large collaborations. Sharing of data and computational resources and tools in real-time.

Databases/Provenance (DB/P): Management, and rapid retrieval, of data. “Research reproducibility”… where did the data come from? How?

Analysis & Simulations (AS): Development of efficient algorithms. Cross-disciplinary comparative tools (e.g. statistical).

Instrumentation (I): Improved data acquisition. Novel hardware approaches (e.g. GPUs, sensors).

Plus…Educational Programs that bring IIC Science to Harvard students, and to the public at large.

Page 12: PPT

Education is central to IIC’s mission

At Harvard:

Undergraduate & graduate courses focused on “data-intensive science”

New graduate certificate program, within existing Ph.D. programs

Research opportunities at undergraduate, graduate, and postdoctoral levels

Beyond Harvard:

New museum, highlighting the kind of science done at the IIC

Page 13: PPT

IIC’s First Activities (2005-)

Image & Meaning Collaboration

IIC Seminar Series at Harvard

Astronomical Medicine (IIC/CfA/HMS/MGH/BWH-SPL)

1st Call for Ideas (deadline was 3/15/06)

[Each activity was tagged on the slide with the research branches it draws on: V, DC, DB/P, AS, I.]

Page 14: PPT

“Image and Meaning”

“I-M”=Working group of scientists, computer scientists, graphic artists, writers, publishers, designers organized and led by Felice Frankel, now at IIC!

Goal: To increase both scientists’ understanding of their own data and the public’s understanding of scientists’ findings, through graphical display.

Activities:

Large conferences at MIT in 2001 and Getty Center in 2005.

Smaller “IM2.x” local workshops throughout 2006-7, including @ IIC.

Upcoming IM/SIGGRAPH, in conjunction with SIGGRAPH 2007.

Online community to be hosted by IIC, beginning later this year. (Social Network model.)

Page 15: PPT

Seminar Sampler (Fall 2005-Spring 2006)

Jim Reese How to Build Google in Your Spare Time

Ian Foster Service-Oriented Science

Volker Springel/Nick Holliman Numerical Cosmology & 3D Viz

Tim Kaxiras Multi-Scale Modeling

Anne Trefethen UK e-Science

Carl Kesselman Emergence of Cyberinfrastructure

Panel on CS & Visual Depiction (Frankel, Rheingans, Durand, Pfister)

Jim Hendler Science & the Semantic Web

Mark Green Building a Grid-enabled Gateway for Science & Engineering

Roy Williams Virtual Observatory as a Model for Information Sharing

Andy van Dam/Anne Spalter Digital Visual Literacy

Pete Eltgroth Profiles in Supercomputing

Luc Moreau Provenance

Curtis Wong Interactive Media

Eric Klopfer Games, Simulation & Learning

Jim Myers yesterday! Cyberenvironments

Phil Campbell Future of e-Publishing

And more… Grid, Agile Methods, Array-based Databases, Bio & Neuro informatics, Clinical Applications in Autism Research, Astronomical Medicine…

Page 16: PPT

Responses to 1st IIC Call for Ideas

Atomistic Modeling of Biomolecular Function

Multiscale Hemodynamics

Gene Pattern + The Virtual Data Center

Medical Treatment Outcomes Online

Enhanced Viz/Analysis Tools for Archaeo/Geo/Seismology

Spatial Ontology Mapping (Community-based)

Knowledge Ecology of Science (Peer-to-Peer Collaboration Networks)

Framework for Multimodal Studies in Genetics, Biology & the Mind

Connectional Analysis of Synaptic Circuitry in the Mammalian Nervous System (The “Connectome”)

LHC/LSST/MWA Consortium for Data-Intensive Science

A Portal for the National Virtual Observatory

Time-Series Research Collaborative

[Each response was tagged on the slide with the research branches it draws on: V, DC, DB/P, AS, I.]

Page 17: PPT

Building the Best (Startup) Program

[Diagram: the five research branches, Visualization (V), Distributed Computing (DC), Databases/Provenance (DB/P), Analysis & Simulations (AS), and Instrumentation (I), with Project 1 overlaid.]

Page 18: PPT

Building the Best (Startup) Program

[Diagram: the five research branches, Visualization (V), Distributed Computing (DC), Databases/Provenance (DB/P), Analysis & Simulations (AS), and Instrumentation (I), with Projects 1, 2, and 3 overlaid.]

Page 19: PPT

Now…

Atomistic Modeling of Biomolecular Function

Multiscale Hemodynamics

Gene Pattern + The Virtual Data Center

Medical Treatment Outcomes Online

Enhanced Viz/Analysis Tools for Archaeo/Geo/Seismology

Spatial Ontology Mapping (Community-based)

Knowledge Ecology of Science (Peer-to-Peer Collaboration Networks)

Framework for Multimodal Studies in Genetics, Biology & the Mind

Connectional Analysis of Synaptic Circuitry in the Mammalian Nervous System (The “Connectome”)

LHC/LSST/MWA Consortium for Data-Intensive Science

A Portal for the National Virtual Observatory

Time-Series Research Collaborative

[Each response was tagged on the slide with the research branches it draws on: V, DC, DB/P, AS, I.]

Status notes shown beside some projects: planning grant (two projects); looking into collaboration around 3D displays; ongoing discussion with Harvard librarians; integrated into Data-Intensive Project

Page 20: PPT

2006-7 Project Portfolio

The Connectome

+Astronomical Medicine

Computational Framework for Neuroinformatics and Genetics

Data-Intensive Science and High Capacity Scientific Databases

Genepattern and the Virtual Data Center (VDC)

National Virtual Observatory Portal

+Envisioning Science Program

Page 21: PPT

“Astronomical Medicine”

Brigham & Women’s Hospital, Surgical Planning Lab

Massachusetts General Hospital, Martinos Center

Harvard-Smithsonian Center for Astrophysics

IIC

Present Team:

Alyssa Goodman (IIC & CfA, Co-I)

Michael Halle (IIC & BWH, Co-I)

Douglas Alan (IIC, Sen. Scientific S/W Engineer)

Michelle Borkin (IIC, Res. Assoc.)

Jens Kauffmann (IIC & CfA, postdoc)

Demo Movie


Page 22: PPT

The “Connectome”: Wiring Diagram for a Complete Brain Circuit

(Connectional Analysis of Synaptic Circuitry in the Mammalian Nervous System)

3D images from electron-microscope images of serial sections (slices)

– Large volume studies: up to 500 µm cubes

– High resolution: 5 nm x-y; 50 nm in z (10^5 x 10^5 x 10^4 = 10^14 voxels)

– Large datasets: 10-100 TB

Potentially intractable computationally w/o a hierarchical approach

Start with the large, dominant pathways: The biggest wires and the biggest excitatory connections.

Use this as scaffolding to then solve other pathways: inhibition, lateral connections, feedback.

V

DC DB/P AS I
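
The voxel count quoted on this slide can be checked with a little arithmetic. The following is a minimal sketch, assuming the imaged volume is a cube 500 micrometres on a side (the reading consistent with the stated resolution and voxel total) and assuming 1 byte stored per voxel, which the slide does not state.

```python
# Back-of-envelope check of the Connectome data volume.
# Assumptions (not stated on the slide): cube side of 500 micrometres,
# 1 byte stored per voxel, decimal terabytes (1 TB = 1e12 bytes).

NM_PER_UM = 1_000
BYTES_PER_VOXEL = 1              # assumption for illustration only

side_nm = 500 * NM_PER_UM        # cube side in nanometres
xy_res_nm = 5                    # in-plane (x-y) resolution from the slide
z_res_nm = 50                    # section (z) resolution from the slide

nx = side_nm // xy_res_nm        # voxels along x
ny = side_nm // xy_res_nm        # voxels along y
nz = side_nm // z_res_nm         # voxels along z (one per serial section step)

voxels = nx * ny * nz
terabytes = voxels * BYTES_PER_VOXEL / 1e12

print(f"{nx:,} x {ny:,} x {nz:,} = {voxels:.0e} voxels")
print(f"about {terabytes:.0f} TB at {BYTES_PER_VOXEL} byte per voxel")
```

Under these assumptions the total comes to 10^14 voxels and roughly 100 TB, the upper end of the 10-100 TB range given on the slide.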

Page 23: PPT

Virtual Observatory Portal

V

DC

Page 24: PPT

Virtual Observatory Portal?

Page 25: PPT

Virtual Observatory Portal?

Page 26: PPT

Virtual Observatory Portal

Default values are shown in green

Data on: One object / One region / A list of objects / A list of regions

I want: Spectra / Images / Catalogs (click all that apply)

I want to: Use VO tools to browse data / Download data to local computer

Would you like help writing a script to do your query? Yes or No

Continue


V

DC
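
The query form on this slide amounts to a small structured request. One way to sketch it, purely as an illustration: the names below (`Target`, `Product`, `Action`, `PortalQuery`) are hypothetical and are not taken from any National Virtual Observatory software. The transcript does not preserve which options were the green defaults, so none are assumed here.

```python
from dataclasses import dataclass
from enum import Enum


class Target(Enum):
    ONE_OBJECT = "One object"
    ONE_REGION = "One region"
    OBJECT_LIST = "A list of objects"
    REGION_LIST = "A list of regions"


class Product(Enum):
    SPECTRA = "Spectra"
    IMAGES = "Images"
    CATALOGS = "Catalogs"


class Action(Enum):
    BROWSE = "Use VO tools to browse data"
    DOWNLOAD = "Download data to local computer"


@dataclass(frozen=True)
class PortalQuery:
    """Hypothetical container for the choices made on the portal form."""
    target: Target
    products: frozenset          # "click all that apply"
    action: Action
    script_help: bool            # "help writing a script": Yes or No

    def __post_init__(self):
        # Assumption: a query with no data product selected is not useful.
        if not self.products:
            raise ValueError("select at least one data product")

    def summary(self) -> str:
        wanted = ", ".join(sorted(p.value for p in self.products))
        return f"{self.action.value}: {wanted} for {self.target.value.lower()}"


query = PortalQuery(
    target=Target.ONE_OBJECT,
    products=frozenset({Product.IMAGES, Product.SPECTRA}),
    action=Action.DOWNLOAD,
    script_help=False,
)
print(query.summary())
```

Using enumerations keeps each field restricted to the options the form actually offers, and the set of products mirrors the "click all that apply" checkbox group.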

Page 27: PPT

A Computational Framework for Neuroinformatics and Genetics

Goal: Create an integrated framework for simultaneous analysis and reproducible retrieval of multimodal data in structural & functional brain imaging and genetics.

Collaboration Amongst Several HMS Hospitals & Departments:

Neurology

Radiology

Psychology

Molecular Genetics

IIC

Page 28: PPT

A Computational Framework for Multimodal Studies in GENETICS, BIOLOGY, AND THE MIND

[Figure panels: Core Imaging Methodologies; Cortical Thickness, AD vs. Controls; Family history pedigree software toolkit; Histological Correlates of AD; Topology differences in cocaine addiction. Image labels: Patients; Normal Controls; Anterior; Posterior; Superior; Right Amygdala; Common.]

[Diagram: Labs 1 through 5 and a Computational Framework.]

V

DC DB/P AS I

Page 29: PPT

A Computational Framework for Multimodal Studies in GENETICS, BIOLOGY, AND THE MIND

Computational Framework

“An Entire Disease or Condition of the Brain”

V

DC DB/P AS I

Page 30: PPT

Data-Intensive Science

Collaboration Amongst:

Physics Department

DEAS

Harvard-Smithsonian Center for Astrophysics

IIC & Harvard CIO’s

CERN & NSF

Goal: Create powerful HPC / Grid capabilities in data-intensive science, advanced analytical algorithms in Astronomy & Physics, & advanced research in scientific VLDBMS


Page 31: PPT

Data-Intensive Science

Of interest to ABCD:

Tier 2 Grid Node & Staff will come to Harvard

HPC Real-Time Computing capability for MWA project

Large, fast storage for Pan-STARRS Project

Seeds of a “Center” for Time-Series Analysis

Page 32: PPT

Gene Pattern - Virtual Data Center

Collaboration Amongst:

The Broad Institute of Harvard & MIT

Harvard Institute for Quantitative Social Science (IQSS)

IIC

Goal: Integrate biomedical computational workflow engines with a statistical framework and canonical data repository originally developed for social science research

Page 33: PPT

Multiscale Hemodynamics

Goal: Build an accurate multiscale simulation of hemodynamics to enable significant advances in fundamental knowledge of blood flow and treatment of related diseases.

Collaboration Amongst:

DEAS/Applied Mathematics

Physics

DEAS/Computer Science

HMS/Cardiology

DEAS/Chemical Engineering

IIC

Page 34: PPT

The ‘Envisioning Science’ Program

Mission: To enable scientists, computer scientists, graphic designers, journalists, and editors to co-develop new methods of scientific communication and education focused on scientific images.

Collaboration Amongst:

Faculty of Arts and Sciences (Felice Frankel!!)

Harvard Medical School

MIT, NSF, Apple…

IIC


Ongoing:

Image and Meaning Conference series

Picturing to Learn program

Page 35: PPT

Agenda

What is IIC? (“Filling the Gap”)

Where did it come from? (A Story)

What have we done so far? (Startup Mode)

What are we about to do? (Projects, Hiring Plans)

What do we hope to do? (Long-term Goals)

Page 36: PPT

IIC will evolve over three phases

Timing: Phase I (2005-08), Phase II (2008-10), Phase III (2011+)

IIC staffing level (combination of new faculty, senior scientists, admin staff): total growing from ~25 to ~100

Number of projects: from ~5-7 to ~15-18

Educational mission (new courses offered, outreach programs): from new courses to a museum

Other key milestones: evaluation schedule (internal, external committees)

Page 37: PPT

Organizational Challenges

Result of “Allston” Science & Technology Task Force

IIC intended to be a “University” (not a single school) initiative

FAS (Faculty of Arts & Sciences) Constraints

Faculty Appointments

Non-Faculty Appointments

Startup Space

“Chicken-and-Egg” Problem with Recruiting

Good, but not certain, Funding Prospects

Role of DEAS Computer Science

Page 38: PPT

Challenges for Discussion

• How will IIC research enable better collaboration amongst geographically distributed researchers?

• What are the best technologies for visualizing enormous data sets?

• How can "human-in-the-loop" software, where we admit that humans are better than computers at many (particularly graphical) tasks, best be created and used?

• Are "mashups," where many software packages are "mashed up" together, the way of the future, or is it reasonable to strive for "perfect" standalone software packages?

• If mashups ultimately prevail, what is the business model for developing them?

• How do we best appoint new IIC faculty at Harvard, given that their work often does not fit within existing departmental boundaries?

• How can IIC best partner with industry to accomplish goals of mutual interest, and which of those goals are paramount?

Page 39: PPT

“Challenges”

[Diagram: horizontal axis is Computer Science Payoff (Low to High); vertical axis is Domain Science Payoff (Low to High). Labeled regions: “Never Mind”, Science Departments, Computer Science Departments. The questions below annotate the overlaps between regions.]

Will CS/DEAS use slots for these people? How big is that overlap?

Will departments hire “computationalists” with regular slots? How big is this overlap?

How do we give senior non-faculty similar stature to faculty? (e.g. P.I. rights, job security)

Page 40: PPT

IIC: Mission

The Initiative in Innovative Computing (IIC) will make Harvard a world leader in the innovative and creative use of computational resources to address forefront scientific problems.

We will focus on developing capabilities that are applicable to multiple disciplines, by undertaking specific, well-defined projects, thereby developing tools and approaches that can be generalized and shared.

We will foster the flow of ideas and inventions along the continuum from basic science to scientific computation to computational science to computer science.

We will train a next generation of creative and computationally capable scientists, build linkages to industry, and communicate with the public at large.

Page 41: PPT

The Initiative in Innovative Computing at Harvard

Alyssa A. Goodman, IIC Director & Prof. of Astronomy

Page 42: PPT

Sample Long Term Goal: “3D Data Desk”

(Demo, using data from http://www.electoral-vote.com/2004/info/president.csv)

Perseus file