A new collaborative scientific initiative at Harvard.
One-Slide IIC
“Projects” focus on areas where computers are key to new science; widely applicable results
Proposal-driven, from within Harvard
Technical focus “Branches”: Instrumentation; Databases & Provenance; Analysis & Simulations; Visualization; Distributed Computing (e.g. GRID, Semantic Web)
Matrix organization: “Projects” by “Branches”
Education: Train Future Consumers & Producers of Computational Science
Goal: Fill the void in, highly value, and learn from,
the emerging field of “computational science.”
“Astronomical Medicine”
A joint venture of FAS-Astronomy & HMS/BWH-Surgical Planning Lab; Work shown here is from the 2005 Junior Thesis of Michelle Borkin, Harvard College.
Filling the “Gap” between Science and Computer Science
Scientific disciplines
• Increasingly, core problems in science require computational solution
• Typically hire/“home grow” computationalists, but often lack the expertise or funding to go beyond the immediate pressing need
Computer Science departments
• Focused on finding elegant solutions to basic computer science challenges
• Often see specific, “applied” problems as outside their interests
“Workflow” & “Continuum”
Workflow | Example: Astronomy | Example: Public Health
“Collect” | Telescope | Microscope, Stethoscope, Survey
COLLECT | “National Virtual Observatory”/COMPLETE | CDC Wonder
“Analyze” | Study the density structure of a star-forming glob of gas | Find a link between one factory’s chlorine runoff & disease
ANALYZE | Study the density structure of all star-forming gas in… | Study the toxic effects of chlorine runoff in the U.S.
“Collaborate” | Work with your student
COLLABORATE | Work with 20 people in 5 countries, in real-time
“Respond” | Write a paper for a Journal.
RESPOND | Write a paper, the quantitative results of which are shared globally, digitally.
IIC branches address shared “workflow” challenges
Challenges common to data-intensive science
• Data acquisition
• Data processing, storage, and access
• Deriving meaningful insight from large datasets
• Maximizing understanding through visual representation
• Sharing knowledge and computing resources across geographically dispersed researchers
IIC branches
• Instrumentation
• Analysis & Simulations
• Databases/Provenance
• Distributed Computing
• Visualization
Continuum
“Pure” Discipline Science (e.g. Einstein)
“Pure” Computer Science (e.g. Turing)
“Computational Science”: Missing at Most Universities
IIC Organization: Research and Education
Provost
Dean, Physical Sciences
IIC Director
Assoc Provost
Dir of Admin & Operations
Dir of Research
Dir of Education & Outreach
Assoc Dir, Instrumentation
Assoc Dir, Visualization
Assoc Dir, Analysis & Simulation
Assoc Dir, Databases/Data Provenance
Assoc Dir, Distributed Computing
Project 1 (Proj Mgr 1)
Project 2 (Proj Mgr 2)
Project 3 (Proj Mgr 3)
Etc.
CIO (systems)
Knowledge mgmt
Education & Outreach staff
Barnard’s Perseus
COMPLETE/IRAS Ndust
IRAS Ndust
Hα emission, WHAM/SHASSA Surveys (see Finkbeiner 2003)
2MASS/NICER Extinction
Numerical Simulation of Star Formation
Bate, Bonnell & Bromm 2002 (UKAFF)
•MHD turbulence gives “t=0” conditions; Jeans mass=1 Msun
•50 Msun, 0.38 pc, navg=3 x 10^5 ptcls/cc
•forms ~50 objects
•T=10 K
•SPH, no B or
•movie=1.4 free-fall times
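The "free-fall time" quoted for the movie can be estimated from the cloud mass and size on the slide. The sketch below is illustrative only: it assumes a uniform-density sphere, treats the 0.38 pc as the cloud diameter (the slide does not say whether it is a radius or a diameter), and uses the standard formula t_ff = sqrt(3π / (32 G ρ)). The function name and constants are not from the original talk.

```python
import math

# Physical constants in SI units (rounded values).
G = 6.674e-11        # gravitational constant, m^3 kg^-1 s^-2
M_SUN = 1.989e30     # solar mass, kg
PARSEC = 3.0857e16   # parsec, m
YEAR = 3.156e7       # year, s


def free_fall_time_years(mass_msun, diameter_pc):
    """Free-fall time of a uniform sphere, in years.

    Assumption: `diameter_pc` is the full diameter of the cloud.
    """
    radius_m = 0.5 * diameter_pc * PARSEC
    volume_m3 = (4.0 / 3.0) * math.pi * radius_m ** 3
    density = mass_msun * M_SUN / volume_m3           # kg / m^3
    t_ff_s = math.sqrt(3.0 * math.pi / (32.0 * G * density))
    return t_ff_s / YEAR


t_ff = free_fall_time_years(50.0, 0.38)   # slide values: 50 Msun, 0.38 pc
movie_span = 1.4 * t_ff                   # "movie=1.4 free-fall times"
print(f"t_ff ~ {t_ff:.3g} yr, movie spans ~ {movie_span:.3g} yr")
```

Under these assumptions the free-fall time comes out at a few times 10^5 years or less, so the movie covers a comparable span of simulated time.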
Goal: Statistical Comparison of “Real” and “Synthesized” Star Formation
Figure based on work of Padoan, Nordlund, Juvela, et al.
Excerpt from realization used in Padoan & Goodman 2002.
Spectral Line Observations
Measuring Motions: Molecular Line Maps
Alves, Lada & Lada 1999
Radio Spectral-Line Survey
Radio Spectral-line Observations of Interstellar Clouds
Velocity from Spectroscopy
[Figure: Observed Spectrum; Intensity vs. "Velocity"]
All thanks to Doppler
Telescope Spectrometer
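"All thanks to Doppler" refers to converting the observed frequency of a spectral line into a line-of-sight velocity. A minimal sketch follows, using the radio velocity convention v = c (ν_rest − ν_obs) / ν_rest. The function name is hypothetical, and the 13CO J=1-0 rest frequency (about 110.201 GHz) is supplied here as an illustrative input rather than taken from the slides.

```python
C_KM_S = 299792.458  # speed of light, km/s


def radio_velocity_km_s(observed_freq, rest_freq):
    """Line-of-sight velocity in the radio convention.

    Positive values mean the source is receding (line shifted to lower
    frequency). Both frequencies must be in the same units.
    """
    return C_KM_S * (rest_freq - observed_freq) / rest_freq


# Illustrative input: 13CO J=1-0 rest frequency in GHz (assumed value).
REST_13CO_GHZ = 110.201354

# A line observed slightly below the rest frequency implies recession.
observed = REST_13CO_GHZ * (1.0 - 10.0 / C_KM_S)
print(f"v = {radio_velocity_km_s(observed, REST_13CO_GHZ):.3f} km/s")
```

Applying this to every channel of the spectrometer output turns the frequency axis of an observed spectrum into the "velocity" axis shown in the figure.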
Barnard’s Perseus
COMPLETE/FCRAO W(13CO)
“Astronomical Medicine”
Excerpts from Junior Thesis of Michelle Borkin (Harvard College); IIC Contacts: AG (FAS) & Michael Halle (HMS/BWH/SPL)
IC 348
Before “Medical Treatment”
After “Medical Treatment”
3D Slicer Demo
IIC contacts: Michael Halle & Ron Kikinis
IIC Research Branches
Instrumentation
• Improved data acquisition.
• Novel hardware approaches (e.g. GPUs, sensors).
Analysis & Simulations
• Development of efficient algorithms.
• Cross-disciplinary comparative tools (e.g. statistical).
Databases/Provenance
• Management, and rapid retrieval, of data.
• “Research reproducibility” …where did the data come from? How?
Distributed Computing
• e-Science aspects of large collaborations.
• Sharing of data and computational resources and tools in real-time.
Visualization
• Physically meaningful combination of diverse data types.
IIC projects will bring together IIC experts from relevant branches with discipline scientists to address a pressing computing challenge that faces the discipline and has broad application.
3D Slicer
Distributed Computing & Large Databases: Large Synoptic Survey Telescope
Optimized for time domain
• scan mode
• deep mode
7 square degree field
6.5m effective aperture
24th mag in 20 sec
> 5 Tbyte/night
Real-time analysis
Simultaneous multiple science goals
IIC contact: Christopher Stubbs (FAS)
Relative optical survey power
[Figure: Figure of Merit (0 to 160) for LSST, SNAP, Pan-STARRS, Subaru, CFHT, SDSS, MMT; series: Time (x10), Stellar, Galactic (x2); based on A= 270 LSST design]
Astronomy: LSST, SDSS, 2MASS, MACHO, DLS
High Energy Physics: BaBar, Atlas, RHIC

Parameter | LSST | SDSS | 2MASS | MACHO | DLS | BaBar | Atlas | RHIC
First year of operation | 2011 | 1998 | 2001 | 1992 | 1999 | 1998 | 2007 | 1999
Run-time data rate to storage (MB/sec) | 5000 Peak, 500 Avg | 8.3 | 1 | 1 | 2.7 | 60 (zero-suppressed), 6* | 540* | 120* (’03), 250* (’04)
Daily average data rate (TB/day) | 20 | 0.02 | 0.016 | 0.008 | 0.012 | 0.6 | 60.0 | 3 (’03), 10 (’04)
Annual data store (TB) | 2000 | 3.6 | 6 | 1 | 0.25 | 300 | 7000 | 200 (’03), 500 (’04)
Total data store capacity (TB) | 20,000 (10 yrs) | 200 | 24.5 | 8 | 2 | 10,000 | 100,000 (10 yrs) | 10,000 (10 yrs)
Peak computational load (GFLOPS) | 140,000 | 100 | 11 | 1.00 | 0.600 | 2,000 | 100,000 | 3,000
Average computational load (GFLOPS) | 140,000 | 10 | 2 | 0.700 | 0.030 | 2,000 | 100,000 | 3,000
Data release delay acceptable | 1 day moving, 3 months static | 2 months | 6 months | 1 year | 6 hrs (trans), 1 yr (static) | 1 day (max), <1 hr (typ) | Few days | 100 days
Real-time alert of event | 30 sec | none | none | <1 hour | 1 hr | none | none | none
Type/number of processors | TBD | 1GHz Xeon / 18 | 450MHz Sparc / 28 | 60-70MHz Sparc / 10 | 500MHz Pentium / 5 | Mixed / 5000 | 20GHz / 10,000 | Pentium / 2500
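The run-time rates (MB/sec) and daily volumes (TB/day) in the table are linked by simple arithmetic, which makes a quick consistency check possible. The sketch below assumes decimal units (1 MB = 10^6 bytes, 1 TB = 10^12 bytes); the helper names are hypothetical, and the idea of an "implied number of recording hours" is an interpretation added here, not something stated on the slide.

```python
def tb_per_day(rate_mb_per_s, hours_per_day=24.0):
    """Data volume in TB accumulated at a sustained rate in MB/s."""
    return rate_mb_per_s * 1e6 * hours_per_day * 3600.0 / 1e12


def implied_hours(rate_mb_per_s, daily_tb):
    """Hours of recording per day needed to reach `daily_tb` at the given rate."""
    return daily_tb * 1e12 / (rate_mb_per_s * 1e6 * 3600.0)


# LSST row: 500 MB/s average, 20 TB/day daily average.
lsst_full_day = tb_per_day(500.0)          # if data were recorded around the clock
lsst_hours = implied_hours(500.0, 20.0)    # recording time implied by the table
print(f"24 h at 500 MB/s: {lsst_full_day:.1f} TB; "
      f"20 TB/day implies about {lsst_hours:.1f} h of recording")
```

Read this way, the LSST daily figure is consistent with the average rate being sustained for roughly one night of observing rather than a full 24 hours.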
Challenges at the LHC
For each experiment (4 total):
10’s of Petabytes/year of data logged
2000 + Collaborators
40 Countries
160 Institutions (Universities, National Laboratories)
CPU intensive
Global distribution of data
Test with « Data Challenges »
CPU vs. Collaboration Size
[Figure: CPU (log scale, 10 to 100,000) vs. Collaboration Size (0 to 2500); points: Earth Simulator, Atmospheric Chemistry Group, LHC Exp., Astronomy, Grav. Wave, Nuclear Exp., Current accelerator Exp.]
Data Handling and Computation for Physics Analysis
[Diagram; labels: detector, raw data, event filter (selection & reconstruction), event reprocessing, event simulation, processed data, event summary data, batch physics analysis, analysis objects (extracted by physics topic), interactive physics analysis]
les.robertson@cern.ch
CERN
Workflow
a.k.a. The Scientific Method (in the Age of High-Speed Networks, Fast Processors, Mass Storage, and Miniature Devices)
IIC contact: Matt Welsh, FAS