
Exploring Space in Cyberspace: Cyber-Enabled …depts.washington.edu/amtas/events/nsf-cdi/presentations/...Galactic Center Region (a tiny portion) 2MASS NIR Image Digital Sky Surveys



Exploring Space in Cyberspace: Cyber-Enabled Research and Discovery in Astronomy

S. George Djorgovski (Caltech)

CDI Workshop, Seattle, Nov. 2007

Overview

• Astronomy in the era of information abundance

– Exponential growth in data volume and complexity

• The Virtual Observatory concept and status

– A domain-specific, community-wide, distributed, open framework for science with massive and complex data sets

– Technology-enabled, but science-driven

• Examples of VO science drivers

– Exploration of parameter spaces

– Exploration of the time domain: distributed analysis and mining of massive data streams

• Some general comments and musings on cyber-science

– What is really new here? What are the important trends?

– The enhanced science and technology synergies


Digital Sky Surveys

[Image: Galactic Center region (a tiny portion), 2MASS NIR image]

• The dominant source of data in astronomy today: typically several tens of TB each, ~ 10⁸ - 10⁹ sources detected, ~ 10² - 10³ attributes per source; all wavelengths, radio to γ-ray

– Examples: SDSS, Palomar surveys, 2MASS, …

• A single survey feeds a broad range of science, from statistical studies of the major constituents of the universe to the discovery of rare types of objects

• Federated in the Virtual Observatory framework

• Current surveys are mainly single-snapshot; the next generation will be synoptic (multi-pass), opening up time-domain astronomy (cosmic cinematography), with peta-scale data sets

– Examples: PanSTARRS, LSST; SKA; etc.
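A rough size check on the catalog numbers above: multiplying the number of sources by the attributes per source gives the size of the measured-parameter catalog alone. This is a minimal sketch that assumes 4 bytes (one single-precision float) per attribute; the byte count is an assumption, not a figure from the talk.

```python
def catalog_size_tb(n_sources: float, n_attributes: float,
                    bytes_per_attribute: int = 4) -> float:
    """Raw catalog size in TB (1 TB = 1e12 bytes).

    bytes_per_attribute=4 is an assumption (single-precision floats)."""
    return n_sources * n_attributes * bytes_per_attribute / 1e12

# The low and high ends of the ranges quoted above.
for n_src, n_attr in [(1e8, 1e2), (1e9, 1e3)]:
    size = catalog_size_tb(n_src, n_attr)
    print(f"{n_src:.0e} sources x {n_attr:.0e} attributes -> {size:.2f} TB")
```

Under this assumption the catalogs come to roughly 0.04 to 4 TB, so most of a survey's tens of TB presumably lie in other products, such as the pixel data.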


Panchromatic Views of the Universe

[Figure: multiwavelength images (Visible + X-ray; Radio + IR; Radio, Far-Infrared, Visible) of the Crab and of a star-forming complex]

Understanding of complex phenomena requires complex data

Data Fusion → A More Complete, Less Biased Picture
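Data fusion across surveys starts with associating the same object in different catalogs, typically by a positional cross-match. The sketch below is a brute-force version that assumes (RA, Dec) in degrees and a fixed 1-arcsec match radius; both are illustrative choices. Production tools typically replace the O(N·M) loop with a spatial index such as HTM or HEALPix.

```python
import math

def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation between two sky positions (all angles in degrees),
    computed with the haversine formula, which is well behaved at small separations."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))

def cross_match(cat_a, cat_b, radius_arcsec=1.0):
    """For each (ra, dec) in cat_a, return the index of the nearest cat_b source
    within radius_arcsec, or None if there is none. Brute force: O(len(a) * len(b))."""
    radius_deg = radius_arcsec / 3600.0
    matches = []
    for ra, dec in cat_a:
        best, best_sep = None, radius_deg
        for j, (rb, db) in enumerate(cat_b):
            sep = angular_separation_deg(ra, dec, rb, db)
            if sep <= best_sep:
                best, best_sep = j, sep
        matches.append(best)
    return matches

# Hypothetical toy catalogs of (RA, Dec) in degrees, e.g. optical and infrared.
optical = [(10.0, 20.0), (150.0, -30.0)]
infrared = [(150.0001, -30.0001), (10.0001, 20.0), (200.0, 5.0)]
print(cross_match(optical, infrared))  # [1, 0]
```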

Theoretical Simulations Are Also Becoming More Complex and Generate Many TBs of Data

[Figure panels: Structure formation in the Universe; Supernova explosions]

Numerical simulations are not just a weak substitute for analytical theory: they are an indispensable methodology for the theoretical study of many complex phenomena, e.g., star or galaxy formation.


An Example of Where We Are Heading

The exponential growth of data volume (and also complexity and quality) is driven by the exponential growth in detector and computing technology

[Figure (from A. Szalay): growth from 1970 to 2000 on a logarithmic scale (0.1 to 1000), glass vs. CCDs; doubling time t ~ 1.5 yrs]

… but our understanding of the universe increases much more slowly!

Data → Knowledge ?

• Large digital sky surveys are becoming the dominant source of data in astronomy: ~ 10-100 TB/survey (soon PB), ~ 10⁶ - 10⁹ sources/survey, many wavelengths…

• Data sets many orders of magnitude larger, more complex, and more homogeneous than in the past
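A doubling time of ~1.5 years compounds quickly. A minimal sketch of the arithmetic, assuming a constant doubling time over the whole 1970 - 2000 span of the plot:

```python
import math

def growth_factor(years: float, doubling_time_yr: float = 1.5) -> float:
    """Factor by which a quantity grows in `years` if it doubles every `doubling_time_yr`."""
    return 2.0 ** (years / doubling_time_yr)

def years_to_grow(factor: float, doubling_time_yr: float = 1.5) -> float:
    """Time needed to grow by `factor` at the given doubling time."""
    return doubling_time_yr * math.log2(factor)

print(f"1970 -> 2000: x{growth_factor(30):.3g}")       # 2**20, about a million
print(f"x1000 takes {years_to_grow(1000):.1f} years")  # about 15 years
```

Thirty years at this rate is twenty doublings, i.e. about a millionfold increase.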


The Virtual Observatory Concept

• The astronomy community’s response to the scientific and technological challenges posed by the exponential growth of data sets and data complexity

– Technology-enabled, but science-driven: harness the IT advances in service of astronomy

• A complete, dynamical, distributed, web-based, open research environment for astronomy with massive and complex data sets

– Provide content (data, metadata) services, standards, and analysis/compute services

– Federate the existing + forthcoming digital sky surveys and archives, facilitate data inclusion and distribution

– Develop and provide data exploration and discovery tools

From Traditional to Survey to VO-Based Science

– Traditional: Telescope → Data Analysis → Results

– Survey-based: Survey Telescope → Archive (+ another survey/archive?) → Data Mining → Target Selection → Follow-Up Telescope → Results

Survey-based science is highly successful and increasingly prominent, but inherently limited by the information content of individual surveys … What comes next, beyond survey science, is VO science


Nat’l Virtual Observatory: A Systemic View

– Primary data providers: surveys, observatories, missions

– Secondary data providers: survey and mission archives

– Follow-up telescopes and missions

– NVO data services: data discovery, warehousing, federation, standards…

– NVO compute services: data mining and analysis, statistics, visualization…

– Also: networking, digital libraries, numerical sim’s, the user community, international VO’s

Virtual Observatory Is Real!

http://ivoa.net

http://www.euro-vo.org

http://us-vo.org
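Federation across the data providers listed above relies on shared IVOA protocols. One of the simplest is Simple Cone Search: a client asks a service for all sources within radius SR of a position (RA, DEC), all in decimal degrees, through an HTTP GET, and the service replies with a VOTable. The sketch below only builds the query URL; the base URL is a hypothetical placeholder rather than a real service, and the range checks are illustrative.

```python
from urllib.parse import urlencode

def cone_search_url(base_url: str, ra_deg: float, dec_deg: float, sr_deg: float) -> str:
    """Build a Simple Cone Search GET URL: RA, DEC, SR in decimal degrees."""
    if not (0.0 <= ra_deg < 360.0 and -90.0 <= dec_deg <= 90.0):
        raise ValueError("RA must be in [0, 360) and DEC in [-90, 90] degrees")
    if sr_deg <= 0.0:
        raise ValueError("search radius SR must be positive")
    separator = "&" if "?" in base_url else "?"
    return base_url + separator + urlencode({"RA": ra_deg, "DEC": dec_deg, "SR": sr_deg})

# Hypothetical service endpoint; a position near the Galactic Center, 0.05 deg radius.
print(cone_search_url("https://example.org/scs", 266.4168, -29.0078, 0.05))
```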


VO as a New Research Environment

• The VO is not yet another data center, archive, mission, or traditional project. It does not fit into any of the usual structures today

– It is inherently distributed and web-centric

– It is based on a rapidly developing technology (IT/CS)

– It transcends the traditional boundaries between different wavelength regimes and agency domains

– It has an unusually broad range of constituents and interfaces

– It is inherently multidisciplinary

– It is inherently trans-national in its reach

• The VO represents a novel type of scientific organization for the era of information abundance

– Many other fields are building VOs of their own

– They are always discipline-based, not institution-based

Virtual Observatory Enabled Science

• Statistical astronomy done right

– Precision cosmology, Galactic structure, stellar astrophysics …

– Discovery of significant patterns and multivariate correlations

– Poissonian errors unimportant

• Systematic exploration of the observable parameter spaces

– Searches for rare or unknown types of objects and phenomena

– Low surface brightness universe, the time domain …

• Multi-wavelength data fusion to disentangle complex processes and superpositions

– e.g., interpretation of the precision CMBR measurements

• Confronting massive numerical simulations with massive data sets

+ things we have not thought of yet …


Understanding the CMBR Foregrounds

[Figure: CMB signal and foreground components: integrated SZ; grav. lensing; integrated Sachs-Wolfe; galaxies (SF); radio sources; Galactic thermal; Galactic nonthermal]

Exploration of Parameter Spaces

• How many different types of objects are there?

– Which ones are identifiable with known, physically distinct types (e.g., stars, galaxies, quasars, etc.)?

• Are there rare and/or previously unknown classes, seen as outliers?

– Are there intermediate or transition types?

– Are there negative clusters?

– Anomalies possibly indicative of problems with the data?

• Are there new multivariate correlations?
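One simple way to pose the outlier question quantitatively is to measure each source's Mahalanobis distance from the bulk of the population in a two-color plane and flag large values. This is a minimal sketch, not the method used in the surveys discussed here: it assumes two colors per source, a single roughly Gaussian population, and a 3σ-equivalent threshold, and it uses the plain sample covariance (robust estimators are preferable when outliers are numerous). The "stellar locus" data are synthetic.

```python
def mahalanobis_outliers(points, threshold=3.0):
    """Return indices of (x, y) points whose Mahalanobis distance from the sample
    mean exceeds `threshold`, using the 2x2 sample covariance."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    det = sxx * syy - sxy ** 2  # must be > 0 (points not all collinear)
    flagged = []
    for i, (x, y) in enumerate(points):
        dx, dy = x - mx, y - my
        # d^2 = [dx, dy] C^-1 [dx, dy]^T, written out with the closed-form 2x2 inverse
        d2 = (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det
        if d2 > threshold ** 2:
            flagged.append(i)
    return flagged

# Synthetic "stellar locus": two colors that track each other with small scatter...
locus = [(0.1 * i, 0.1 * i + 0.05 * (-1) ** i) for i in range(20)]
# ...plus one source far off the locus.
points = locus + [(1.0, 2.5)]
print(mahalanobis_outliers(points))  # [20]
```

The off-locus source is flagged even though its individual colors are unremarkable; it is the combination that is anomalous, which is the point of looking in a multivariate parameter space.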


An Example: Discoveries of High-Redshift Quasars and Type-2 Quasars in DPOSS

Known, astrophysically interesting but rare types of objects, with a known or predictable parameter space signature

[Figure panels: color-color parameter space used for selection, showing high-z QSOs, type-2 QSOs, and normal stars; spectroscopic identification]

Rare Types of Objects Discovered as Outliers in a Color Parameter Space

– Peculiar types of quasars

– Peculiar types of stars: DQ white dwarf, highly unusual CV

(Fan et al., SDSS; Djorgovski et al., DPOSS)


Dwarf Planets, Flying Rocks, and Snowballs

[Figure panels: dwarf planets and KBOs (Sedna, Xena, Quaoar, …; M. Brown et al.); NEAT, Catalina, etc.; Tunguska; killer asteroids]

A Rich Variety of Time-Domain Phenomena

[Figure panels: flaring stars; novae, cataclysmic variables; supernovae; gravitational microlensing; accretion to SMBHs; gamma-ray bursts]


Blazars: Accelerators in the Sky

[Figure: TeV γ-ray detections]

They are quasars where we are looking straight down the relativistic jet (Γ ~ 10 - 100). Instabilities and shocks produce strong variability

• Known sources of γ-rays (up to a few TeV), and probable sources of ultra-high energy cosmic rays (up to ~ 10²¹ eV ~ 10⁸ × LHC!)

– The future of particle (astro)physics?

• Probes of relativistic physics, AGN, and cosmic star formation history
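Two of the numbers above can be checked with a few lines of arithmetic. For a jet viewed at angle θ, the relativistic Doppler factor is δ = 1/[Γ(1 - β cos θ)], which approaches 2Γ when looking straight down the jet. The energy comparison assumes the LHC design collision energy of 14 TeV; it compares lab-frame energies, as the slide does, and also shows the center-of-mass energy of an ultra-high-energy proton striking a proton at rest, which is a much smaller multiple of the LHC.

```python
import math

def doppler_factor(gamma: float, theta_rad: float = 0.0) -> float:
    """Relativistic Doppler factor delta = 1 / (Gamma * (1 - beta * cos(theta)))."""
    beta = math.sqrt(1.0 - 1.0 / gamma ** 2)
    return 1.0 / (gamma * (1.0 - beta * math.cos(theta_rad)))

for gamma in (10, 100):  # the range of Lorentz factors quoted above
    print(f"Gamma = {gamma}: delta at theta = 0 is {doppler_factor(gamma):.1f} (~2 Gamma)")

E_UHECR_EV = 1e21      # upper end of cosmic-ray energies, as quoted above
E_LHC_EV = 14e12       # assumption: LHC design proton-proton collision energy, 14 TeV
M_P_C2_EV = 938.272e6  # proton rest energy, m_p c^2, in eV

print(f"lab-frame energy ratio: {E_UHECR_EV / E_LHC_EV:.1e}")
# Center-of-mass energy for a UHE proton striking a proton at rest: ~sqrt(2 E m_p c^2).
e_cm = math.sqrt(2.0 * E_UHECR_EV * M_P_C2_EV)
print(f"center-of-mass energy: {e_cm:.1e} eV (~{e_cm / E_LHC_EV:.0f}x the LHC)")
```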

Donald Rumsfeld’s Epistemology

There are known knowns,

There are known unknowns, and

There are unknown unknowns


And Some Mysteries…

Megaflares in normal stars: an example from DPOSS. A normal, main-sequence star underwent an outburst by a factor of > 300 (orders of magnitude more than solar flares). The cause, duration, and frequency of these bursts are currently unknown
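In the magnitude units astronomers use, a flux increase by a factor R is a brightening of Δm = 2.5 log₁₀ R magnitudes, so the factor of > 300 above corresponds to more than about 6 magnitudes:

```python
import math

def flux_ratio_to_delta_mag(ratio: float) -> float:
    """Brightening in magnitudes for a given flux ratio: dm = 2.5 * log10(ratio)."""
    return 2.5 * math.log10(ratio)

print(f"{flux_ratio_to_delta_mag(300):.2f} mag")  # 6.19 mag
```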

Archival optical transients: seen in many surveys (DPOSS, DLS, PQ, SN surveys, …). Their physical nature is unknown

The Palomar-Quest Event Factory

[Figure: R and I images, tonight vs. baseline]

• Detect ~ 1 - 2 × 10⁶ sources per half-night scan

• Compare with the baseline sky → find ~ 10³ apparent transients (in the data)

• Remove instrumental artifacts → identify ~ 2 - 4 × 10² real transients (on the sky)

• Remove asteroids → identify ~ 1 - 10 possible astrophysical transients

• Classification and follow-up
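The cascade above is essentially a sequence of filters, each removing one class of false positives. A minimal sketch with a hypothetical `Detection` record; the field names and the thresholds (a 0.5 mag change from the baseline, 1 arcsec of motion between passes) are illustrative assumptions, not the actual Palomar-Quest pipeline parameters.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Detection:
    source_id: int
    mag: float                     # magnitude in tonight's scan
    baseline_mag: Optional[float]  # magnitude in the baseline sky; None if not detected there
    artifact: bool                 # result of (hypothetical) instrumental-artifact checks
    motion_arcsec: float           # position shift between repeated passes in the night

def event_factory(detections, min_dmag=0.5, max_motion_arcsec=1.0):
    """Run detections through the filtering stages; thresholds are illustrative."""
    # Compare with the baseline sky: new sources or significant brightness changes.
    apparent = [d for d in detections
                if d.baseline_mag is None or abs(d.mag - d.baseline_mag) > min_dmag]
    # Remove instrumental artifacts.
    real = [d for d in apparent if not d.artifact]
    # Remove asteroids: anything that moved between passes.
    astrophysical = [d for d in real if d.motion_arcsec < max_motion_arcsec]
    return {"detected": len(detections), "apparent": len(apparent),
            "real": len(real), "astrophysical": [d.source_id for d in astrophysical]}

night = [
    Detection(1, 17.2, 17.3, False, 0.1),   # steady star: no significant change
    Detection(2, 18.0, None, True, 0.0),    # new, but an artifact
    Detection(3, 19.5, None, False, 12.0),  # new and moving: an asteroid
    Detection(4, 16.1, 19.0, False, 0.2),   # brightened by ~3 mag: a candidate
]
print(event_factory(night))
```

Ordering the cheap, high-rejection filters first is what keeps such a cascade tractable at ~10⁶ sources per scan.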


The VOEventNet Project

• A telescope sensor network with feedback

• Scientific measurements spawning other measurements and data analysis in real time (time scales ~ minutes/hours/days)

• Immediate web-based dissemination and publishing

• Please see http://voeventnet.org

[Diagram components: P48 / PQ Event Factory; VOEN Engine; robotic telescope network (P60, Raptor, Paritel) for follow-up obs.; Web Event Archive; external archives; compute resources]

PI: R. Williams; Spons. NSF/DDDAS
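The feedback loop, in which a published event triggers follow-up observations whose results are published in turn, maps naturally onto publish/subscribe. The toy broker below is hypothetical: it does not implement the IVOA VOEvent XML format or any VOEventNet interface, and the event types and identifiers are invented for illustration.

```python
from collections import defaultdict
from typing import Callable, Dict, List

class EventBroker:
    """Toy publish/subscribe hub (hypothetical; not the VOEvent schema or a VOEventNet API).

    Subscribers register a handler for an event type; every published event is
    archived and then delivered to the handlers for its type."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Callable[[dict], None]]] = defaultdict(list)
        self.archive: List[dict] = []  # stands in for a web event archive

    def subscribe(self, event_type: str, handler: Callable[[dict], None]) -> None:
        self._subscribers[event_type].append(handler)

    def publish(self, event: dict) -> None:
        self.archive.append(event)
        for handler in list(self._subscribers[event["type"]]):
            handler(event)

broker = EventBroker()
classified: List[str] = []

def robotic_telescope(event: dict) -> None:
    # Hypothetical follow-up: "observe" the candidate and publish the result,
    # closing the feedback loop.
    broker.publish({"type": "followup", "id": event["id"], "mag": 18.4})

broker.subscribe("candidate", robotic_telescope)
broker.subscribe("followup", lambda e: classified.append(e["id"]))

broker.publish({"type": "candidate", "id": "PQ-0001"})
print(len(broker.archive), classified)  # 2 ['PQ-0001']
```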

Broader and Societal Benefits of a VO

• Professional Empowerment: Scientists and students anywhere with an internet connection would be able to do first-rate science. A broadening of the talent pool in astronomy, democratization of the field

• Interdisciplinary Exchanges:

– The challenges facing the VO are common to most sciences and other fields of modern human endeavor

– Intellectual cross-fertilization, feedback to IT/CS

• Education and Public Outreach:

– Unprecedented opportunities in terms of the content, broad geographical and societal range, at all levels

– Astronomy as a magnet for CS/IT education

“Weapons of Mass Instruction”


VO Education and Public Outreach

Google Sky: uses DSS, SDSS, HST data, etc., for easy sky browsing

Soon also: Microsoft’s World Wide Telescope

Transformation and Synergy

• We are entering the second phase of the IT revolution: the rise of information/data-driven computing

– The impact is like that of the industrial revolution and the invention of the printing press, combined

• All science in the 21st century is becoming cyber-science (aka e-science), and with this change comes the need for a new scientific methodology

• The challenges we are tackling:

– Management of large, complex, distributed data sets

– Effective exploration of such data → new knowledge

– These challenges are universal

• There is a great emerging synergy between computationally enabled science and science-driven IT


Information Technology → New Science

• The information volume grows exponentially

– Most data will never be seen by humans!

– The need for data storage, network, database-related technologies, standards, etc.

• Information complexity is also increasing greatly

– Most data (and data constructs) cannot be comprehended by humans directly!

– The need for data mining, KDD, data understanding technologies, hyperdimensional visualization, AI/machine-assisted discovery …

• We need to create a new scientific methodology on the basis of applied CS and IT

• VO is the framework to effect this for astronomy

A Modern Scientific Discovery Process

• Data Gathering (e.g., from sensor networks, telescopes…)

• Data Farming: storage/archiving; indexing, searchability; data fusion, interoperability

– Database technologies: the key technical challenges

• Data Mining (or Knowledge Discovery in Databases): pattern or correlation search; clustering analysis, automated classification; outlier/anomaly searches; hyperdimensional visualization

– The key methodological challenges

• Data Understanding

• New Knowledge

(+ feedback)
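As one concrete instance of "clustering analysis, automated classification" in the data mining step, here is plain k-means (Lloyd's algorithm). It is one standard method chosen for illustration, not a technique prescribed in the talk; the initialization (first k points) is deliberately simplistic, and the feature vectors are hypothetical.

```python
import math

def kmeans(points, k, n_iter=20):
    """Lloyd's k-means on a list of equal-length numeric tuples.

    Initial centroids are simply the first k points (a simplistic, deterministic
    choice; real applications use better seeding such as k-means++)."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest centroid (Euclidean).
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(vals) / len(members) for vals in zip(*members)]
    return labels, centroids

# Hypothetical 2-D feature vectors (e.g., two colors per source), two groups.
points = [(0.0, 0.0), (5.0, 5.0), (0.2, 0.1), (4.8, 5.1), (0.1, -0.1), (5.2, 4.9)]
labels, centroids = kmeans(points, k=2)
print(labels)  # [0, 1, 0, 1, 0, 1]
```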


The key role of data analysis is to replace the raw complexity seen in the data with a reduced set of patterns, regularities, and correlations, leading to their theoretical understanding

However, the complexity of data sets, and of the interesting, meaningful constructs in them, is starting to exceed the cognitive capacity of the human brain

Universal Challenges: Towards the New Scientific Methodology

• Data farming and harvesting

– Semantic webs, computational and data grids, universal or trans-disciplinary standards and ontologies …

– Digital scholarly publishing and curation (libraries)… data, metadata, virtual data, hierarchical data products; legacy vs. dynamical; open vs. proprietary; data, knowledge, and codes; persistency; peer review; web samizdat vs. officially blessed and supported; mandates; etc., etc.

• Data mining and understanding, knowledge extraction

– Scalable DM algorithms

– Hyperdimensional visualization

– Empirical validation of numerical models

– Computer science as the “new mathematics”

• The art and science of scientific software systems

– Architecture, design, implementation, validation …
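Scalable data mining algorithms often have to work in a single pass over data too large to hold in memory. A classic small example is Welford's one-pass algorithm for the mean and variance, sketched here as a minimal illustration (the flux values are hypothetical):

```python
import math

class RunningStats:
    """One-pass mean and variance (Welford's algorithm): O(1) memory, numerically stable."""

    def __init__(self) -> None:
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the current mean

    def update(self, x: float) -> None:
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self) -> float:
        """Sample variance (n - 1 denominator); NaN until two values have been seen."""
        return self._m2 / (self.n - 1) if self.n > 1 else math.nan

stats = RunningStats()
for flux in (10.0, 12.0, 11.0, 13.0, 9.0):  # hypothetical stream of flux measurements
    stats.update(flux)
print(stats.n, stats.mean, stats.variance)  # 5 11.0 2.5
```

Because each update touches only three numbers, the same object can sit on a data stream indefinitely, and per-source instances can be maintained in parallel across distributed resources.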


Some Distinguishing Characteristics of Data/Comp. Enabled (e-)Science

• Data-intensive: massive (TB-scale and beyond) data sets

– Poissonian errors not important, systematics dominate

• Data complexity: multi-wavelength and/or multi-scale and/or multi-epoch data sets, 100s or 1000s of parameters per source, combining imaging, spectroscopy, etc.

– Heterogeneity and visualization are key issues

• Computationally intensive

• Traditional solutions do not scale to the scope of new problems

– Need new tools and scalable algorithms

• Data and computing resources (and expertise) are generally geographically distributed

• Inherently cross-cutting in many ways (CS/Astro, multi-wavelength …)
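Why Poissonian errors stop mattering: the relative shot-noise error on N counts is 1/√N, which falls below any fixed systematic floor once N is large enough. The 1% systematic level below is a hypothetical value chosen for illustration.

```python
import math

def poisson_fractional_error(n_counts: float) -> float:
    """Relative Poisson (shot-noise) uncertainty on N counts: sqrt(N) / N = 1 / sqrt(N)."""
    return 1.0 / math.sqrt(n_counts)

SYSTEMATIC = 0.01  # hypothetical 1% calibration systematic

for n in (1e2, 1e3, 1e6, 1e9):
    stat = poisson_fractional_error(n)
    dominant = "statistical" if stat > SYSTEMATIC else "systematic"
    print(f"N = {n:.0e}: Poisson {stat:.1e} vs systematic {SYSTEMATIC:.0e} -> {dominant} dominates")
```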

Some Thoughts on CyberScience

• Enables a broad spectrum of users and contributors

– From large teams, to small teams, to individuals

– Data volume ~ team size, but scientific returns ≠ f (team size)

– Human talent is distributed very broadly geographically

Open, distributed, web-based nature of new science is a key feature

• Transition from data-poor to data-rich science

– Chaotic → Organized … regulation vs. creative freedom

– Can we learn to ask new kinds of questions?

• Information is cheap, but expertise is expensive

– Just like the hardware/software situation

• Computer science as the “new mathematics”

– It plays the role in relation to other sciences that mathematics did in the ~ 17th - 20th centuries


Summary Comments

• Cyber-enabled (computationally and data-enabled) science is a practical necessity

– Complex problems → simulations, complex and massive data sets

– Distributed resources (data, facilities, people…) → virtual scientific organizations (VO is an example)

• It is IT-enabled, and has the potential to drive transformative scientific and practical advances

– The key challenge is to think differently (computationally)

– Remember the origins of the WWW; now grid, semantic web, knowledge extraction tools, MP/HPC design and apps…

• There is a great deal of methodological commonality between different fields

– … and this commonality can lubricate some genuine multi- or interdisciplinary research, with a great discovery potential

– Let’s avoid wasteful replication of efforts, share the tools, methods